Standardising access to LSDB data helps researchers provide up-to-date information on sequence variation between individuals. This increases our understanding of the relationship between DNA variations and disease.
Now that the Human Genome Project has been completed, collecting and studying the sequence variation between individuals is increasingly important for understanding the relationship between DNA variations and disease.
Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). While over 1200 of these LSDBs exist online, sharing information between them or combining their data is extremely hard when they are not software-based or when each relies on its own custom software.
VarioML is a data integration solution that creates a layer of meaning above the many variation standards and formats, enabling them to be integrated and made sense of as a whole.
VarioML is not 'Yet Another Data Standard', but instead a way to make sense of the different formats already used in the lab and clinic. There is no need to learn another standard. Whatever mutation-data vocabularies you work with, VarioML will help you resolve the difficulties of merging your data with data in other formats.
A locus-specific database (LSDB) describes the variants discovered in a single gene, or in members of a gene family, together with other related functional elements. LSDBs are curated by experts on their respective loci, and as such are typically the best available sources of this information. However, LSDBs vary widely in format and completeness, making data integration and exchange among them difficult and time-consuming. To address these difficulties, the VarioML format has been developed to cover the full range of variation data use-cases, providing semantically well-defined components that can easily be composed to fit specific needs.
Using VarioML, data owners can now efficiently enable the integration, federation, and exchange of their variant data. The discoverability, extensibility, and quality of variation data are immediately enhanced. Critical new avenues of research and knowledge discovery are opened, as data using the VarioML standard can be integrated with the global library of purely genetic data.
VarioML is a central prerequisite for effective modelling of phenotype data and genotype-to-phenotype relationships. It removes the long-standing technical obstacles to the effective passing of variant data from discovery laboratories into the biomedical database world. Now all that is needed is the broad participation of the genotype-to-phenotype research community.