
The essentials of DUO codes


Academic year: 2023


European Genome/Phenome Archive (EGA)

#VEISris3cat #FEDERrecerca #FonsEUCat

Aina Jené

[email protected]


What is GA4GH?

The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed in 2013 to accelerate the potential of research and medicine to advance human health. Its work streams are:



● Data Security

● Regulatory & Ethics

● Clinical & Phenotypic Data Capture

● Cloud

● Data Use & Researcher Identities (DURI)

● Discovery

● Genomic Knowledge Standards

● Large Scale Genomics





What is the DURI work stream?

Data Use & Researcher Identities (DURI)

The DURI work stream aims to create the standards required to facilitate researcher identity and data use.

Proposed solutions:

• Establish researcher identities (e.g. ORCID)

• Specify a data use ontology

Approved deliverables:

• Data Use Ontology

• Machine-readable Consents

• GA4GH Passports


What are DUO codes?

The GA4GH Data Use Ontology (DUO) includes terms that describe data use conditions, particularly for research data in the health, clinical, and biomedical domains.

DUO codes allow datasets to be semantically tagged with restrictions on their usage, making them automatically discoverable based on the authorization level of users or the intended usage.

Projects that have implemented DUO codes:

• the Broad Institute’s Data Use Oversight System (DUOS)

• the EGA

• the Data Information System (DAISY)
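As a rough illustration of tagging and discovery, the sketch below attaches DUO abbreviations to datasets and filters a catalogue by one tag. The `Dataset` class, the `discoverable` function, and the dataset identifiers are hypothetical names invented for this example; they are not part of DUO or of any EGA tool. The match is a plain tag lookup and does not model how one code may imply another.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset record carrying its DUO tags (hypothetical structure)."""
    dataset_id: str
    duo_tags: set[str] = field(default_factory=set)


def discoverable(datasets, duo_tag):
    """Return the IDs of datasets tagged with the given DUO abbreviation."""
    return [d.dataset_id for d in datasets if duo_tag in d.duo_tags]


# Illustrative catalogue; the identifiers are placeholders.
catalogue = [
    Dataset("dataset-A", {"GRU"}),
    Dataset("dataset-B", {"HMB"}),
    Dataset("dataset-C", {"POA"}),
]

print(discoverable(catalogue, "HMB"))  # ['dataset-B']
```

Because the tags are plain data, the same filter could sit behind a search form or an API without anyone reading the consent documents by hand.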


How did DUO codes come about?

DUO builds on three evolving efforts to standardize data use restrictions in the biomedical and genomics research domains:

• NIH’s dbGaP data use categories

• Consent Codes

• Automatable Discovery and Access Matrix (ADA-M)


List of DUO codes - permissions

• General Research Use (GRU): use is allowed for any research purpose.

• Health / Medical / Biomedical (HMB): use is allowed for health/medical/biomedical purposes; does not include the study of population origins or ancestry.

• Disease-specific (DS) + MONDO term: use is allowed provided it is related to the specified disease.

• Populations, Origins, and Ancestry (POA): use of the data is limited to the study of population origins or ancestry.
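The four permissions can be read as simple rules. The following is a minimal sketch of that reading, not an official DUO matching algorithm: the function name `use_permitted`, the purpose labels ("general", "health", "disease", "ancestry"), and the disease identifiers are assumptions made for illustration. It also assumes that disease research counts as a health/medical/biomedical purpose.

```python
def use_permitted(permission, purpose, dataset_disease=None, requested_disease=None):
    """Decide whether a research purpose fits a DUO permission.

    permission: one of "GRU", "HMB", "DS", "POA".
    purpose: hypothetical label: "general", "health", "disease" or "ancestry".
    dataset_disease / requested_disease: disease identifiers (e.g. MONDO terms),
    only relevant for the DS permission.
    """
    if permission == "GRU":
        # General research use: any research purpose is allowed.
        return True
    if permission == "HMB":
        # Health/medical/biomedical only; ancestry studies are excluded.
        return purpose in {"health", "disease"}
    if permission == "DS":
        # Disease-specific: the request must concern the dataset's disease.
        return (
            purpose == "disease"
            and dataset_disease is not None
            and requested_disease == dataset_disease
        )
    if permission == "POA":
        # Limited to population origins or ancestry studies.
        return purpose == "ancestry"
    raise ValueError(f"Unknown permission: {permission}")


# Placeholder disease identifiers, not real MONDO terms.
print(use_permitted("HMB", "ancestry"))
print(use_permitted("DS", "disease", "MONDO:example-1", "MONDO:example-1"))
```

The first call is refused because HMB excludes ancestry studies; the second is allowed because the requested disease matches the one attached to the dataset.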


DUO step by step


1. Consent form annotation

2. Dataset annotation

3. Dataset discovery

4. Data access request


Why are DUO codes useful?

1. DUO provides a shared understanding of the meaning of data use categories.

2. DUO is distributed as a machine-readable file.

3. DUO can be implemented alongside an advanced search algorithm.




Implementation in the EGA schema

Scenario 1: all datasets are under one policy, and hence under the same DUO codes.

Scenario 2: datasets are separated into collections by DUO codes, and hence fall under different policies.
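The two scenarios amount to grouping datasets by their set of DUO codes: each distinct set needs its own policy. The sketch below shows that grouping under assumed names; `group_by_duo_codes` and the dataset identifiers are hypothetical and do not reflect how the EGA stores policies internally.

```python
from collections import defaultdict


def group_by_duo_codes(dataset_codes):
    """Group dataset IDs by their DUO code set; one group per policy needed."""
    groups = defaultdict(list)
    for dataset_id, codes in dataset_codes.items():
        groups[frozenset(codes)].append(dataset_id)
    return dict(groups)


# Scenario 1: every dataset carries the same codes, so one policy suffices.
scenario_1 = {"dataset-A": {"GRU"}, "dataset-B": {"GRU"}}

# Scenario 2: the codes differ, so the datasets fall under different policies.
scenario_2 = {"dataset-A": {"GRU"}, "dataset-B": {"HMB"}, "dataset-C": {"HMB"}}

print(len(group_by_duo_codes(scenario_1)))  # 1
print(len(group_by_duo_codes(scenario_2)))  # 2
```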


How are DUO codes added to the EGA?

EGA Helpdesk

DUO codes?


How can DUO codes be used at EGA?



Programmatic submission: XML structure

alias="ena-POLICY-BABRAHAM-23-03-2017-09:47:38:853-62" center_name="BABRAHAM" accession="EGAP000010006 15" broker_name="EGA">

<TITLE>Data Access Agreement for PCHiC, RNA-Seq, ChIP-Seq</TITLE>

<DAC_REF accession="EGAC00001000523">

<DATA_USE ontology="DUO" code="0000007" version="17-07-2016">

<DATA_USE ontology="DUO" code="0000014" version="17-07-2016"/>
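To show how the `DATA_USE` elements could be read back programmatically, here is a small sketch using Python's standard `xml.etree.ElementTree`. The sample document is a simplified, hypothetical policy: the `POLICY` and `DATA_USES` wrapper elements, the alias, and the title are assumptions made so that the XML is well-formed. Only the `DATA_USE` attributes are taken from the fragment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified policy document. The wrapper elements are
# assumptions; the DATA_USE attributes mirror the fragment in the slides.
POLICY_XML = """
<POLICY alias="example-policy" center_name="EXAMPLE" broker_name="EGA">
  <TITLE>Example Data Access Agreement</TITLE>
  <DATA_USES>
    <DATA_USE ontology="DUO" code="0000007" version="17-07-2016"/>
    <DATA_USE ontology="DUO" code="0000014" version="17-07-2016"/>
  </DATA_USES>
</POLICY>
"""


def duo_codes(policy_xml):
    """Return every DATA_USE entry as an 'ONTOLOGY:code' string."""
    root = ET.fromstring(policy_xml)
    return [
        f'{element.get("ontology")}:{element.get("code")}'
        for element in root.iter("DATA_USE")
    ]


print(duo_codes(POLICY_XML))  # ['DUO:0000007', 'DUO:0000014']
```

`root.iter("DATA_USE")` walks the whole tree, so the function does not depend on the exact nesting, which is useful here because the real schema's nesting is not shown in full.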






More information

Links of interest:

• GitHub repository

• Data Use Ontology approved as a GA4GH technical standard

• EGA DUO documentation

• CINECA, Powering up data discovery and access using the Data Use Ontology

• DUO ontology

• Consent Codes: Upholding Standard Data Use Conditions

• The Data Use Ontology to streamline responsible access to human biomedical datasets

• Responsible sharing of biomedical data and biospecimens via the “Automatable Discovery and Access Matrix” (ADA-M)

GA4GH Webinar:

• Using the GA4GH toolkit: Data Use Ontology for automating access to human genomic data









