Data Usage and Release

Data Usage

We request that any use of data obtained from denovo-db be cited in publications.


denovo-db, Seattle, WA (URL: [date (month, yr) accessed].

Public Data Release

denovo-db is a collection of de novo variants assembled from the literature and was created by Dr. Evan Eichler's lab at the University of Washington.

For each study, an input table is generated containing, at minimum, the sample name, chromosome, position (hg19), reference allele, alternate allele, study name, and orthogonal validation status. When an attribute is not readily identifiable in the paper, we email the authors, and any additional information they provide is added to the input file for the study. Each input file is converted to a VCF file, and all of the VCF files are merged into one composite VCF.

Using the composite file, we check for overlaps between sample/variant combinations. If the same combination appears in more than one study, we keep the entry with the best level of validation. For example, if a sample/variant combination is in study1 and study2, and study1 has no validation status while study2 has a validation status of de novo, we list the variant as being in study2. Validation status takes precedence over every other criterion when choosing which entry to list, which helps remove redundancy in the database. Variants that are not valid based on orthogonal testing, or that are shown to be inherited, are removed from the database.

The clean composite VCF is then run through a preliminary annotation pipeline using SnpEff and subsequently annotated in depth using SeattleSeqAnnotation138. Annotations for complex alleles are retained from SnpEff; all other annotations are from SeattleSeqAnnotation138. The data are then uploaded into a PostgreSQL database, which communicates with a JBoss web app to load data to the front-end denovo-db website.
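The overlap check and removal step described above can be illustrated with a minimal Python sketch. It is not the code used to build denovo-db. The record layout, the validation labels ("de novo", "inherited", "invalid", "unknown"), and the numeric ranking are assumptions made for illustration. In particular, the sketch assumes that any orthogonal test result outranks a missing validation status, and that ties keep the first entry seen.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    sample: str
    chrom: str
    pos: int          # hg19 coordinate
    ref: str
    alt: str
    study: str
    validation: str   # hypothetical labels, see RANK below


# Assumed ranking: higher means a better level of validation.
RANK = {"de novo": 2, "inherited": 1, "invalid": 1, "unknown": 0}
# Statuses that cause a variant to be dropped from the final set.
REMOVED = {"inherited", "invalid"}


def deduplicate(variants):
    """Keep one entry per sample/variant combination, then drop failed ones."""
    best = {}
    for v in variants:
        key = (v.sample, v.chrom, v.pos, v.ref, v.alt)
        kept = best.get(key)
        # Replace the stored entry only if the new one is better validated.
        if kept is None or RANK[v.validation] > RANK[kept.validation]:
            best[key] = v
    # Remove variants that failed orthogonal testing or were inherited.
    return [v for v in best.values() if v.validation not in REMOVED]


records = [
    Variant("s1", "1", 100, "A", "G", "study1", "unknown"),
    Variant("s1", "1", 100, "A", "G", "study2", "de novo"),
    Variant("s2", "2", 200, "C", "T", "study1", "inherited"),
]
clean = deduplicate(records)
for v in clean:
    print(v.sample, v.chrom, v.pos, v.study, v.validation)
```

Here the s1 variant is listed under study2 because that entry carries a de novo validation status, and the s2 variant is dropped because it was shown to be inherited.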

bioRxiv data represented in denovo-db are pulled from a specific version of the paper uploaded to bioRxiv. Because the data on the bioRxiv site are sometimes changed or updated after that version, users should use these data with caution.

Terms of Service

This website is designed to serve as a landing place and dissemination point for human de novo variant data. The data are provided free of charge, on the condition that the permission statement below is followed. There may be other information on the site, such as links to other sites, references to other project groups, and federal grants. The University of Washington has no responsibility for these links and information. Note: denovo-db does not report sample IDs that are identifiable in any way and only pulls from the published literature. If no ID exists in the paper, a simple naming scheme is used, such as LastAuthorNameSampleX, where X is a number. If there are identifying links in the published paper, those links are the responsibility of the published manuscript, not this database.


The contents of the denovo-db web site are intended for educational or research purposes. Generally, we place no restrictions on the use of the data available from denovo-db. However, the Simons Foundation has requested additional usage restrictions on data from their collection as follows:

The use of Simons Simplex Collection (SSC) and Simons VIP data sets is limited to projects related to advancing the field of autism and related developmental disorder research. Questions on SSC/VIP consents should be directed to

You may download or copy the content and other downloadable items displayed on the denovo-db portion of the web site, provided that in using the data, you follow the citation format given above.

Privacy Terms