Neandertal Genome Browser

The Ensembl Neandertal Genome Browser was hosted until June 2023, when support was discontinued. The site displayed data from six Neandertal individuals against the human reference assembly NCBI36. Its data is available from our FTP site as described below.

History

The Neandertal Genome Project has sequenced six samples from members of Homo sapiens neanderthalensis. Almost 98% of the sequence comes from three specimens from the Vindija Cave in Croatia; most of the remainder comes from an individual from Mezmaiskaya Cave in the northern Caucasus, Russia, with tiny fractions (0.1% each) from the species type specimen (Neander Valley, Germany) and a fossil found in El Sidron cave in Asturias, Spain. To put the Neandertal sequences in perspective, the project also sequenced five modern humans, Homo sapiens sapiens, from Southern Africa, Western Africa, Papua New Guinea, China and Europe.

The Neandertal sequences were mapped to the human reference genome (NCBI36), the chimpanzee genome, and an ancestral sequence extrapolated from a 4-way EPO alignment between human, chimp, orangutan and macaque, using a custom alignment program that takes into account the characteristics of ancient DNA.

FTP Site

The Neandertal FTP site hosts three types of files:

Data production

The following section describes the steps towards creation of the hosted analysis files.

Neandertal Sequencing reads

Neandertal sequence was generated from six Neandertal fossils. Vi33.16 (54.1% genome coverage), Vi33.25 (46.6%) and Vi33.26 (45.2%) were discovered in the Vindija cave in Croatia. Feld1 (0.1%) is from the Neandertal type specimen from the Neander Valley in Germany, Sid1253 (0.1%) is from El Sidron cave in Asturias, Spain, and Mez1 (2%) is from Mezmaiskaya Cave in the northern Caucasus, Russia.

To increase the fraction of endogenous Neandertal DNA in our sequencing libraries, we used restriction enzymes to deplete libraries of microbial DNA.

Sequencing was carried out on the 454 FLX and Titanium platforms and the Illumina GA.

Neandertal reads were mapped to the human genome (hg18), the chimpanzee genome (panTro2), and the single-copy aligned human and chimpanzee ancestor genome extracted from the 4-way Enredo-Pecan-Ortheus (EPO) alignment of human, chimpanzee, orangutan, and macaque (Paten et al., 2008a, 2008b, 2009). Mapping was done with a custom mapper called ANFO (http://bioinf.eva.mpg.de/anfo).

This custom alignment program was developed to take the characteristics of ancient DNA into account. Following the observation and implementation by Briggs et al. (2009), ANFO uses different substitution matrices for DNA thought to be double stranded versus single stranded, and switches between them if doing so affords a better score.

Neandertal Contigs/Consensus from all individuals combined

For each library, consensus sequences were constructed from multiple reads of the same Neandertal molecule, defined as reads having the same orientation, read length, alignment length, and alignment start coordinates. All such clusters, regardless of their mapping quality, were replaced by their consensus sequence. For each observed base and each possible original base, we calculated the likelihood of the observation from its quality score. The likelihoods were normalized by dividing each by the total likelihood, and the base with the highest resulting score was used as the consensus.
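The consensus call at a single position can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: it assumes Phred-scaled quality scores, an error that is equally likely to produce any of the three other bases, and independent reads. The function names `phred_to_error` and `consensus_base` are hypothetical.

```python
import math

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred quality score to an error probability (assumed scale)."""
    # Cap at 0.75 so a quality of 0 is treated as uninformative, not impossible.
    return min(10 ** (-q / 10), 0.75)

def consensus_base(observations):
    """Pick the consensus base for one position of a read cluster.

    observations: list of (base, phred_quality) pairs, one per read.
    Returns (best_base, normalized_score).
    """
    log_lik = {}
    for original in BASES:
        total = 0.0
        for base, q in observations:
            e = phred_to_error(q)
            # Likelihood of the observed base given the candidate original base.
            p = (1 - e) if base == original else e / 3
            total += math.log(p)
        log_lik[original] = total

    # Normalize: divide each likelihood by the total likelihood.
    shift = max(log_lik.values())
    lik = {b: math.exp(v - shift) for b, v in log_lik.items()}
    norm = sum(lik.values())
    score = {b: lik[b] / norm for b in BASES}

    best = max(BASES, key=lambda b: score[b])
    return best, score[best]

best, score = consensus_base([("A", 30), ("A", 20), ("C", 10)])
```

Working in log space avoids numerical underflow when a cluster contains many reads; the normalization step is unchanged by the shift.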

Modern Human Sequencing reads

To put the divergence of the Neandertal genomes into perspective with regard to present-day humans, we sequenced the genomes of one San from Southern Africa (HGDP01029), one Yoruba from West Africa (HGDP00927), one Papua New Guinean (HGDP00542), one Han Chinese (HGDP00778) and one French (HGDP00521) from Western Europe to 4- to 6-fold coverage on the Illumina GAII platform. These sequences were aligned to the chimpanzee and human reference genomes and analyzed using a similar approach to that used for the Neandertal data.

Selective sweep scan

An approach was devised to detect positive selection in early modern humans by looking for genomic regions where present-day humans share a common ancestor subsequent to their divergence from Neandertals, so that Neandertals lack derived alleles found in present-day humans (except in rare cases of parallel substitutions).

SNPs were identified as positions that vary among the five present-day human genomes of diverse ancestry plus the human reference genome, and the chimpanzee genome was used to determine the ancestral state (SOM 13). SNPs at CpG sites were ignored since these evolve rapidly and may thus be affected by parallel mutations. We identified 5,615,438 such SNPs, at about 10% of which Neandertals carry the derived allele.

As expected, SNPs with higher frequencies of the derived allele in present-day humans were more likely to show the derived allele in Neandertals. We took advantage of this fact to calculate the expected number of Neandertal derived alleles within a given region of the human genome. The observed numbers of derived alleles were then compared to the expected numbers to identify regions where the Neandertal carries fewer derived alleles than expected relative to the human allelic states.
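The comparison of observed and expected counts can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the statistic used in the publication: each SNP is treated as an independent Bernoulli trial, and the probability that the Neandertal carries the derived allele is estimated genome-wide for each derived-allele count among the present-day human chromosomes. The function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict

def derived_rate_by_frequency(snps):
    """Estimate, per human derived-allele count, how often Neandertal is derived.

    snps: iterable of (human_derived_count, neandertal_is_derived) pairs.
    """
    totals = defaultdict(int)
    derived = defaultdict(int)
    for count, is_derived in snps:
        totals[count] += 1
        derived[count] += int(is_derived)
    return {count: derived[count] / totals[count] for count in totals}

def region_z_score(region_snps, rate):
    """Z-score of observed vs expected Neandertal derived alleles in a region.

    Negative values mean fewer derived alleles than expected.
    Independence between SNPs is an assumption of this sketch.
    """
    expected = sum(rate[count] for count, _ in region_snps)
    variance = sum(rate[count] * (1 - rate[count]) for count, _ in region_snps)
    observed = sum(int(is_derived) for _, is_derived in region_snps)
    if variance == 0:
        return 0.0
    return (observed - expected) / math.sqrt(variance)

# Toy genome-wide data: low-frequency SNPs are rarely derived in Neandertal.
genome = [(1, False)] * 9 + [(1, True)] + [(5, True)] * 5 + [(5, False)] * 5
rate = derived_rate_by_frequency(genome)

# A region of four high-frequency SNPs where the Neandertal is never derived.
z = region_z_score([(5, False)] * 4, rate)
```

In the toy region the expected count is 2 and the observed count is 0, so the region scores below zero, which is the signature the scan looks for.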

The 212 regions identified in this way were ranked by their genetic width in centimorgans. The fewer generations a sweep takes to reach fixation, the fewer recombination events occur during the sweep, and so the larger the affected region. Thus, the more intense the selection that drove a putative sweep, the larger the affected region is expected to be.

The regions are color-coded by Z-score value and Z-score error, using the following scales:

Selective sweep screen Z-scores: min -5, mid 0, max 5
Selective sweep screen Z-score error: min 0, mid 5, max 10
Top 5% selective sweep regions: min -10, mid -7, max -4

Catalog of changes

This track displays Neandertal alleles at positions of non-synonymous difference between human and chimpanzee.

We identified, from whole genome alignments, sites where the human genome reference sequence does not match chimpanzee, orangutan and rhesus macaque. These are likely to have changed on the human lineage since the common ancestor with chimpanzee. Where Neandertal fragments overlapped, we constructed consensus sequences and joined them into "minicontigs", which were used to determine the Neandertal state at the positions that changed on the human lineage.

To minimize the effect of alignment errors and erroneous substitutions, we disregarded all substitutions and insertions or deletions (indels) within 5 nucleotides of the ends of minicontigs or within 5 nucleotides of indels.
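The filter described above can be sketched in Python as follows. This is an illustration under assumptions, not the project's code: minicontig coordinates are taken as inclusive, each indel is reduced to a single position, "within 5 nucleotides" is read as a distance of at most 5, and an indel is only discarded for being near a different indel. The function name `filter_variants` and the tuple layout are hypothetical.

```python
MARGIN = 5  # nucleotides, as stated in the text

def filter_variants(contig_start, contig_end, variants, margin=MARGIN):
    """Drop variants near minicontig ends or near indels.

    variants: list of (position, kind) with kind "sub" or "indel".
    Returns the variants that pass both filters, in input order.
    """
    indels = [(i, pos) for i, (pos, kind) in enumerate(variants) if kind == "indel"]
    kept = []
    for i, (pos, kind) in enumerate(variants):
        # Filter 1: too close to either end of the minicontig.
        near_end = (pos - contig_start <= margin) or (contig_end - pos <= margin)
        # Filter 2: too close to an indel other than the variant itself.
        near_indel = any(j != i and abs(p - pos) <= margin for j, p in indels)
        if not near_end and not near_indel:
            kept.append((pos, kind))
    return kept

variants = [(102, "sub"), (150, "sub"), (160, "indel"), (163, "sub"), (198, "sub")]
kept = filter_variants(100, 200, variants)
```

With these inputs, the substitutions at 102 and 198 fall near the minicontig ends and the substitution at 163 falls near the indel at 160, so only the substitution at 150 and the indel at 160 are retained.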

Among 10,535,445 substitutions and 479,863 indels inferred to have occurred on the human lineage, we have information in the Neandertal genome for 3,202,190 and 69,029, i.e. 30% and 14%, respectively.

Publication

Richard E. Green, Johannes Krause, Adrian W. Briggs, Tomislav Maricic, Udo Stenzel, Martin Kircher, Nick Patterson, Heng Li, Weiwei Zhai, Markus Hsi-Yang Fritz, Nancy F. Hansen, Eric Y. Durand, Anna-Sapfo Malaspinas, Jeffrey D. Jensen, Tomas Marques-Bonet, Can Alkan, Kay Prüfer, Matthias Meyer, Hernán A. Burbano, Jeffrey M. Good, Rigo Schultz, Ayinuer Aximu-Petri, Anne Butthof, Barbara Höber, Barbara Höffner, Madlen Siegemund, Antje Weihmann, Chad Nusbaum, Eric S. Lander, Carsten Russ, Nathaniel Novod, Jason Affourtit, Michael Egholm, Christine Verna, Pavao Rudan, Dejana Brajkovic, Željko Kucan, Ivan Gušic, Vladimir B. Doronichev, Liubov V. Golovanova, Carles Lalueza-Fox, Marco de la Rasilla, Javier Fortea, Antonio Rosas, Ralf W. Schmitz, Philip L. F. Johnson, Evan E. Eichler, Daniel Falush, Ewan Birney, James C. Mullikin, Montgomery Slatkin, Rasmus Nielsen, Janet Kelso, Michael Lachmann, David Reich and Svante Pääbo. A Draft Sequence of the Neandertal Genome. Science. 2010 May 7;328(5979):710-722. DOI: 10.1126/science.1188021. PMID: 20448178.

Acknowledgements

The Neandertal Genome Project is based at the Department of Evolutionary Genetics at the Max Planck Institute for Evolutionary Anthropology, in collaboration with the Neandertal Genome Consortium. The Neandertal Genome Browser used code developed by Ensembl.