The essentials of DUO codes
European Genome/Phenome Archive (EGA)
#VEISris3cat #FEDERrecerca #FonsEUCat
Aina Jené
• The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health.
What is GA4GH?
WORK STREAMS
FOUNDATIONAL TECHNICAL
● Data Security
● Regulatory & Ethics
● Clinical & Phenotypic Data Capture
● Cloud
● Data Use & Researcher Identities (DURI)
● Discovery
● Genomic Knowledge Standards
● Large Scale Genomics
Data Use & Researcher Identities
The DURI work stream aims to create the standards required to facilitate researcher identity and data use.
• Proposed solutions:
• Establish researcher identities (e.g. ORCID)
• Specify a data use ontology
• Approved deliverables:
• Data Use Ontology
• Machine-readable Consents
• GA4GH Passports
What is the DURI work stream?
• The GA4GH DUO includes terms that describe data use conditions, particularly for research data in health, clinical, and biomedical domains.
• DUO terms allow datasets to be semantically tagged with restrictions on their usage, making them automatically discoverable based on the authorization level of users or their intended usage.
• Projects that have implemented DUO codes:
• the Broad Institute’s Data Use Oversight System, DUOS
• the EGA
• the Data Information System, DAISY
What are DUO codes?
• DUO grew out of three evolving efforts to standardize data use restrictions in the biomedical and genomics research domains:
• NIH’s dbGaP data use categories
• Consent Codes
• Automated Data Access Matrix (ADA-M)
How did DUO codes come about?
List of DUO codes
• General Research Use (GRU)
• Health / Medical / Biomedical (HMB)
• Disease-specific (DS)
• Populations, Origins, and Ancestry (POA)
List of DUO codes - permissions
• General Research Use (GRU): use is allowed for any research purpose.
• Health / Medical / Biomedical (HMB): use is allowed for health/medical/biomedical purposes; this does not include the study of population origins or ancestry.
• Disease-specific (DS) + MONDO term: use is allowed provided it is related to the specified disease.
• Populations, Origins, and Ancestry (POA): use of the data is limited to the study of population origins or ancestry.
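The four permissions above can be read as a simple compatibility rule between a dataset's permission and a researcher's intended use. The following is a minimal sketch of that reading, not an official DUO matching algorithm. The `Purpose` and `Permission` classes, the purpose categories, and the `MONDO:example-*` identifiers are hypothetical placeholders. It also assumes that disease research counts as a health/medical/biomedical purpose, and it compares disease terms by exact match rather than walking the MONDO hierarchy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Purpose:
    """Hypothetical description of what a researcher intends to do."""
    category: str  # one of: "general", "health", "disease", "ancestry"
    disease: Optional[str] = None  # placeholder disease identifier


@dataclass(frozen=True)
class Permission:
    """Permission attached to a dataset, using the abbreviations above."""
    code: str  # one of: "GRU", "HMB", "DS", "POA"
    disease: Optional[str] = None  # required when code == "DS"


def is_permitted(permission: Permission, purpose: Purpose) -> bool:
    """Return True if the intended purpose fits the dataset permission."""
    if permission.code == "GRU":
        # General research use: any research purpose is allowed.
        return True
    if permission.code == "HMB":
        # Health/medical/biomedical only; ancestry studies are excluded.
        # Assumption: disease research counts as a health purpose.
        return purpose.category in {"health", "disease"}
    if permission.code == "DS":
        # Simplification: exact match on the disease identifier.
        return (purpose.category == "disease"
                and purpose.disease == permission.disease)
    if permission.code == "POA":
        # Limited to population origins or ancestry studies.
        return purpose.category == "ancestry"
    raise ValueError(f"Unknown permission code: {permission.code}")


ds_dataset = Permission("DS", disease="MONDO:example-a")
print(is_permitted(ds_dataset, Purpose("disease", "MONDO:example-a")))
print(is_permitted(Permission("HMB"), Purpose("ancestry")))
```

A real implementation would likely treat a request for a more specific disease as compatible with a dataset tagged with a broader one, which requires the ontology hierarchy and is left out here.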
DUO step by step
1. Consent form annotation
2. Dataset annotation
3. Dataset discovery
4. Data access request
1. DUO provides a shared understanding of the meaning of data use categories.
2. DUO is distributed as a machine-readable file.
3. DUO can be implemented alongside an advanced search algorithm.
Why are DUO codes useful?
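To illustrate how machine-readable annotations could support dataset discovery, here is a small sketch of a search filter. It is only an illustration under stated assumptions: the catalogue, the dataset names, and the `discover` function are hypothetical, each dataset carries exactly one permission code, and the caller decides which codes are acceptable for the intended use.

```python
from typing import Dict, List, Set


def discover(catalogue: Dict[str, str], usable_codes: Set[str]) -> List[str]:
    """Return the datasets whose permission code is in usable_codes."""
    return sorted(
        dataset for dataset, code in catalogue.items()
        if code in usable_codes
    )


# Hypothetical catalogue: dataset name -> permission code.
catalogue = {
    "dataset-A": "GRU",
    "dataset-B": "HMB",
    "dataset-C": "POA",
}

# A health-related project can use GRU and HMB datasets, but not
# datasets limited to population origins or ancestry studies.
health_hits = discover(catalogue, {"GRU", "HMB"})
print(health_hits)  # ['dataset-A', 'dataset-B']
```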
DUO at EGA
Implementation to the EGA schema
Scenario 1 (all datasets under one policy, hence under the same DUO codes collection)
Scenario 2 (different datasets separated by DUO codes collections, hence under different policies)
How are DUO codes added to the EGA?
EGA Helpdesk
DUO codes?
How can DUO codes be used at EGA?
Programmatic submission:
XML structure
<POLICY_SET>
  <POLICY alias="ena-POLICY-BABRAHAM-23-03-2017-09:47:38:853-62" center_name="BABRAHAM" accession="EGAP00001000615" broker_name="EGA">
    <IDENTIFIERS>
      <PRIMARY_ID>EGAP00001000615</PRIMARY_ID>
      <SUBMITTER_ID namespace="BABRAHAM">ena-POLICY-BABRAHAM-23-03-2017-09:47:38:853-62</SUBMITTER_ID>
    </IDENTIFIERS>
    <TITLE>Data Access Agreement for PCHiC, RNA-Seq, ChIP-Seq</TITLE>
    <DAC_REF accession="EGAC00001000523">
      <IDENTIFIERS>
        <PRIMARY_ID>EGAC00001000523</PRIMARY_ID>
      </IDENTIFIERS>
    </DAC_REF>
    <POLICY_FILE>ftp://ftp.ebi.ac.uk/pub/contrib/pchic/EGA_Data_Access_Request_DIL.docx</POLICY_FILE>
    <DATA_USES>
      <!-- Each DATA_USE names a DUO term; MODIFIER elements qualify it with terms from another ontology (here EFO) -->
      <DATA_USE ontology="DUO" code="0000007" version="17-07-2016">
        <MODIFIER>
          <DB>EFO</DB>
          <ID>0001645</ID>
        </MODIFIER>
        <MODIFIER>
          <DB>EFO</DB>
          <ID>0001655</ID>
        </MODIFIER>
      </DATA_USE>
      <DATA_USE ontology="DUO" code="0000014" version="17-07-2016"/>
    </DATA_USES>
  </POLICY>
</POLICY_SET>
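As a rough sketch of how the DUO annotations in such a policy could be read back programmatically, the snippet below parses a trimmed copy of the XML with Python's standard `xml.etree.ElementTree` module. The function name `extract_data_uses` and the `ontology:code` output format are illustrative choices, not part of the EGA schema or tooling, and the embedded XML keeps only the elements needed for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the policy shown above.
POLICY_XML = """\
<POLICY_SET>
  <POLICY accession="EGAP00001000615">
    <DATA_USES>
      <DATA_USE ontology="DUO" code="0000007" version="17-07-2016">
        <MODIFIER><DB>EFO</DB><ID>0001645</ID></MODIFIER>
        <MODIFIER><DB>EFO</DB><ID>0001655</ID></MODIFIER>
      </DATA_USE>
      <DATA_USE ontology="DUO" code="0000014" version="17-07-2016"/>
    </DATA_USES>
  </POLICY>
</POLICY_SET>
"""


def extract_data_uses(xml_text):
    """Map each policy accession to a list of (term, modifiers) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for policy in root.iter("POLICY"):
        uses = []
        for data_use in policy.iter("DATA_USE"):
            # Join ontology and code into a compact identifier.
            term = f'{data_use.get("ontology")}:{data_use.get("code")}'
            # Modifiers are direct children of DATA_USE.
            modifiers = [
                f'{m.findtext("DB")}:{m.findtext("ID")}'
                for m in data_use.findall("MODIFIER")
            ]
            uses.append((term, modifiers))
        result[policy.get("accession")] = uses
    return result


data_uses = extract_data_uses(POLICY_XML)
for accession, uses in data_uses.items():
    for term, modifiers in uses:
        print(accession, term, modifiers)
```

Note that the parser only accepts well-formed XML, so attributes must be separated by whitespace and `POLICY_SET` must be closed.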