Chapter | 1 Curating gene-disease relationships
As the name implies a rare disease affects only very few individuals. However, there are many unique causes of rare diseases, thus many individuals are affected by such a disease. Due to the rarity of each single entity, effective management, surveillance and treatment is challenging. So is finding the correct diagnosis, which is often described as the “diagnostic odyssey”.
Rare diseases often have a genetic cause, making high-throughput sequencing (next-generation sequencing; NGS) a central part of finding the molecular diagnosis.
1.1 Neurodevelopmental disorders
Neurodevelopmental disorders (NDDs) affect about 2% of children. They represent a clinically and genetically extremely heterogeneous disease group comprising, amongst others, developmental delay (DD), intellectual disability (ID) and autism spectrum disorder (ASD) and developmental and epileptic encephalopathies (DEE).
1.2 Genetic heterogeneity
Looking at published gene-disease associations over time reveals significant genetic heterogeneity.
Thus the question arises:
How can we keep track of this fast development and have the information at hand when we need it in the clinic or when analyzing sequencing data?
While the answer to this question is easy:
We need curated databases to catalog and summarize the wealth of published information.
The task at hand is not only laborious but also requires expertise and consistency.
1.3 Expert curation
In our opinion, the curation of gene-disease relationships in rare disease such as NDDs requires clinical and scientific proficiency in the respective field. This implies that clinician scientists involved in counseling, diagnostics and research of NDDs are predestined for this task.
To reduce workload and dependence on single experts, a distributed effort in larger consortia and collaboration between different work groups is needed.
In the course of updating SysID we had the great chance to contribute our data to Orphanet to create a European ID/NDD specific reference list. With support from the „ITHACA Workgroup: intellectual disability“ (id-genes.orphanet.app) in 2019 we started working with the Orphanet team which is part of the Gene Curation Coalition (GenCC).
Additionally, we are able to recruit expert curators from ERN ITHACA to contribute to re-curation of old data and updating new data in SysNDD.
1.4 Technical concepts
In addition to a pool of experts, the right tools are needed.
We defined “gene-inheritance-disease” units as “entities” which represent the central curation effort. The components of these entities are normalized using widely used and standardized ontology terms (e.g. HGNC identifier for genes, OMIM or MONDO for disease and inheritance from HPO). This allows interoperability and linking to other data sources.
Based on this concept we developed a new database scheme, which allows entities to be systematically and reproducibly cataloged. The database is abstracted into a JSON API, which allows structured programmatic access to the underlying data.
Finally, the API feeds the web tool which can be used to easily search, filter, download and visualize the database contents in modern web browsers.
1.5 Outlook
The SysNDD database will improve the understanding and curation of rare NDD entities.
SysNDD will enable systems biology and network analyses.
Our long-term goal is incorporation of the high-quality, manually curated SysNDD data into European and international gene disease relationship databases,
thus, improving diagnostics and care for individuals with rare NDDs.