Incorrect Handling of ROR Identifiers/Unique Identifiers in Dataset Metadata #11149

ofahimIQSS · 2025-01-10T20:01:06Z

Description
While testing pull request #11118, we found a problem with how the Dataverse system handles ROR (Research Organization Registry) identifiers. Specifically, the system generates incorrect links when ROR identifiers are saved and displayed in dataset metadata.

This raises the broader question of how all identifiers should be handled. Is it sufficient to ask for just the unique identifier, or do we need to request the full URL for many of these identifier types? Factors to consider include data consistency and accuracy, ease of implementation, system compatibility, and user experience. Accepting only the unique identifier could streamline data entry and reduce redundancy, but it may cause problems wherever context or the full URL is needed for processing. Requiring the full URL could ensure completeness and ease integration with systems that rely on full URLs, but it adds complexity and more room for errors during data entry. A clear guideline would keep handling uniform and efficient across all identifier types.

Steps to Reproduce
1. Follow the steps outlined in PR #11118.
2. Create a new dataset and proceed to the "Author" section.
3. Under Identifier Type, select ROR.
4. Enter a valid ROR URL, e.g., https://ror.org/03vek6s52.
5. Save the dataset.
6. Navigate to the Metadata tab and click on the displayed ROR URL.
Observed Behavior
The displayed ROR link points to an invalid URL:
https://ror.org/https://ror.org/03vek6s52.
This results in a 404 Page Not Found error because the https://ror.org/ prefix is duplicated.

Test with only the ROR Identifier (e.g., 03vek6s52):

Enter just the identifier (without the full URL).
Save the dataset and navigate to the Metadata tab.
The ROR Identifier is displayed as plain text and is not hyperlinked.
Expected Behavior
When a valid ROR URL is entered, the metadata tab should display and link to the correct ROR page, e.g., https://ror.org/03vek6s52.
When only the ROR Identifier is provided, the system should construct a valid URL (https://ror.org/{identifier}) and display it as a clickable hyperlink in the metadata tab.
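The expected behavior can be sketched as a small normalizer that accepts either input form and always yields https://ror.org/{identifier}. This is a hypothetical helper, not Dataverse code: the class and method names are illustrative, the ID pattern is adapted from the existing ROR entry in ExternalIdentifier.java, and, like that pattern, it is case-sensitive and does not verify the two-digit checksum.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Dataverse code): turns a stored ROR value, in either
// the bare-ID or the full-URL form, into the link the Metadata tab should show.
final class RorLinkSketch {

    // Assumed canonical prefix for ROR links.
    private static final String ROR_PREFIX = "https://ror.org/";

    // Optional "https://ror.org/" prefix, then the bare ID: a leading '0',
    // six characters from the allowed alphabet, and two checksum digits.
    // The ID itself is captured in group 1.
    private static final Pattern ROR = Pattern.compile(
            "^(?:https://ror\\.org/)?(0[a-hj-km-np-tv-z0-9]{6}[0-9]{2})$");

    private RorLinkSketch() {}

    /** Returns the canonical ROR URL, or empty if the value is not a ROR ID. */
    static Optional<String> toRorUrl(String value) {
        if (value == null) {
            return Optional.empty();
        }
        Matcher m = ROR.matcher(value.trim());
        return m.matches() ? Optional.of(ROR_PREFIX + m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(toRorUrl("https://ror.org/03vek6s52")); // Optional[https://ror.org/03vek6s52]
        System.out.println(toRorUrl("03vek6s52"));                 // Optional[https://ror.org/03vek6s52]
        System.out.println(toRorUrl("not-a-ror"));                 // Optional.empty
    }
}
```

Because the prefix is stripped before the canonical URL is rebuilt, both entry forms produce the same link, and a doubled value such as https://ror.org/https://ror.org/03vek6s52 is rejected instead of being linked.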

Screen.Recording.2025-01-10.at.3.00.12.PM.mov

qqmyers · 2025-01-10T21:49:29Z

FWIW:

dataverse/src/main/java/edu/harvard/iq/dataverse/ExternalIdentifier.java, line 17 at commit 4373753:

    ROR("ROR", "https://ror.org/%s", "^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");

This line has a bug: https://ror.org/ shouldn't be in both the template and the pattern.
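A minimal reproduction of the doubled link, assuming the stored value is substituted into the template with String.format. The template and pattern strings are copied from the ROR entry quoted above; the class and the formatLink method are hypothetical stand-ins for how Dataverse applies them.

```java
import java.util.regex.Pattern;

// Reproduces the duplicated prefix. TEMPLATE and PATTERN are copied from the
// ROR entry in ExternalIdentifier.java; formatLink is a hypothetical stand-in
// that assumes the stored value is substituted with String.format.
final class RorTemplateBug {

    static final String TEMPLATE = "https://ror.org/%s";
    static final Pattern PATTERN =
            Pattern.compile("^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");

    private RorTemplateBug() {}

    static String formatLink(String storedValue) {
        return String.format(TEMPLATE, storedValue);
    }

    public static void main(String[] args) {
        String urlForm = "https://ror.org/03vek6s52";
        String bareId = "03vek6s52";

        // The pattern recognizes only the URL form...
        System.out.println(PATTERN.matcher(urlForm).matches()); // true
        System.out.println(PATTERN.matcher(bareId).matches());  // false

        // ...but the template then adds the prefix a second time.
        System.out.println(formatLink(urlForm)); // https://ror.org/https://ror.org/03vek6s52
    }
}
```

This also shows why a bare ID is never linked: it fails the pattern, so only the URL form reaches the template, and that form already carries the prefix.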

If https://ror.org/ is removed from the pattern (which is what's needed to recognize the non-URL form), we should be aware of the impact on fields such as author affiliation or funding agency, where we don't have a separate ID type field, in code like:

dataverse/src/main/java/edu/harvard/iq/dataverse/pidproviders/doi/XmlMetadataTemplate.java, lines 1531 to 1533 at commit 4373753:

    ExternalIdentifier externalIdentifier = ExternalIdentifier.ROR;
    if (externalIdentifier.isValidIdentifier(funder)) {
        isROR = true;

The external vocabulary script stores the URL form and expects it to be recognized there. If there's a need to recognize ROR without the URL, it might be easiest to have lax and strict ROR recognizers, but being able to recognize both forms at the same time might be a nice upgrade (for ROR and other IDs).
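The lax/strict idea could look roughly like this. The strict recognizer accepts only the URL form that the external vocabulary script stores, so a caller such as the funder check in XmlMetadataTemplate would keep its current behavior; the lax recognizer also accepts a bare ID. The class and method names are hypothetical, not existing Dataverse API. The sketch also escapes the dot in ror.org and drops the '|' from the character class: inside [...] a '|' is a literal character rather than alternation, so the current pattern accepts it as part of an ID.

```java
import java.util.regex.Pattern;

// Hypothetical lax/strict ROR recognizers (names are illustrative only).
final class RorRecognizers {

    // Bare ROR ID: '0', six allowed characters, two checksum digits.
    private static final String ROR_ID = "0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}";

    // Strict: URL form only, as stored by the external vocabulary script.
    private static final Pattern STRICT = Pattern.compile("^https://ror\\.org/" + ROR_ID + "$");
    // Lax: URL form or bare ID.
    private static final Pattern LAX = Pattern.compile("^(?:https://ror\\.org/)?" + ROR_ID + "$");

    private RorRecognizers() {}

    static boolean isValidStrict(String value) {
        return value != null && STRICT.matcher(value).matches();
    }

    static boolean isValidLax(String value) {
        return value != null && LAX.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"https://ror.org/03vek6s52", "03vek6s52", "Harvard University"};
        for (String v : samples) {
            System.out.printf("%-28s strict=%b lax=%b%n", v, isValidStrict(v), isValidLax(v));
        }
    }
}
```

Keeping two recognizers means free-text fields like funding agency can stay on the strict check, while the author identifier field, which has an explicit ROR type, could use the lax one.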

Not sure what the best approach is, especially given that we're trying to get external vocabulary scripts in place to avoid typos and variations in identifiers. Just removing https://ror.org/ from the template instead might be a quick compromise: it would fix the URL form and leave any non-URL entry as a plain string, which would be OK if/when we update to use the script for ROR entry for an author.
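A sketch of that compromise: only the template changes (to a bare "%s"), while the pattern is left exactly as it is today. The linkTarget method is a hypothetical stand-in for the display logic, assuming that a value matching the pattern is linked via the template and anything else is shown as plain text.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Quick compromise: drop the prefix from the template only. linkTarget is a
// hypothetical stand-in for the display logic; an empty result means the value
// would be shown as plain text.
final class RorTemplateCompromise {

    static final String TEMPLATE = "%s"; // was "https://ror.org/%s"
    static final Pattern PATTERN =       // unchanged: still requires the URL form
            Pattern.compile("^(https:\\/\\/ror.org\\/)0[a-hj-km-np-tv-z|0-9]{6}[0-9]{2}$");

    private RorTemplateCompromise() {}

    static Optional<String> linkTarget(String value) {
        if (value != null && PATTERN.matcher(value).matches()) {
            return Optional.of(String.format(TEMPLATE, value));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(linkTarget("https://ror.org/03vek6s52")); // Optional[https://ror.org/03vek6s52]
        System.out.println(linkTarget("03vek6s52"));                 // Optional.empty
    }
}
```

The URL form now links correctly, and a bare ID stays unlinked, which matches the current plain-text behavior and the strict check that the funder code relies on.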

jggautier · 2025-01-13T21:42:20Z

In this issue's PR, starting at #11118 (comment), there's more conversation about how this should work.

ofahimIQSS added the Type: Bug label Jan 10, 2025

ofahimIQSS assigned pdurbin Jan 10, 2025

ofahimIQSS mentioned this issue in "add ROR as an Author Identifier Type" #11118 (Open), Jan 10, 2025

pdurbin removed their assignment Jan 10, 2025

ofahimIQSS changed the title from "Incorrect Handling of ROR Identifiers in Dataset Metadata" to "Incorrect Handling of ROR Identifiers/Unique Identifiers in Dataset Metadata" Jan 10, 2025
