NextGen
NextGen is a collaborative research project investigating the biodiversity of livestock species. It used whole genome sequencing and genotyping microarrays to catalogue and study variation within populations of cattle, sheep and goats.
NextGen project genomic data sets are publicly available:
- Cattle, sheep and goats sampled
- Whole genome sequencing for hundreds of samples
- Genotyping microarrays for hundreds more samples
- Domesticated and wild-type ancestral species
- Traditional breeds and industrial breeds
- Samples from Iran, Morocco and Uganda
- De novo genome assemblies of wild species
WGS Variation Data
Population | Species | Country | Samples | Variant Discovery | Variants | Genus level SNPs |
---|---|---|---|---|---|---|
IRBT | Bos taurus (cattle) | Iran | 9 | vcf, browse, vep | 20.1 M | vcf, browse, vep |
UGBT | Bos taurus × indicus (cattle) | Uganda | 25 | vcf, browse, vep | 29.3 M | vcf, browse, vep |
IROA | Ovis aries (sheep) | Iran | 20 | vcf, browse, vep | 25.8 M | vcf, browse, vep |
IROO | Ovis orientalis (mouflon) | Iran | 19 | vcf, browse, vep | 29.3 M | vcf, browse, vep |
MODA | Ovis aries (sheep) | Morocco | 160 | vcf, browse, vep | 38.5 M | vcf, browse, vep |
ISGC | Ovis aries (sheep) | (various)¹ | 75 | vcf, browse, vep | | |
IROV | Ovis vignei (urial) | Iran | 4 | vcf, browse, vep | | |
IRCA | Capra aegagrus (bezoar) | Iran | 22 | vcf, vep | 17.4 M | vcf, vep |
IRCH | Capra hircus (goat) | Iran | 20 | vcf, vep | 22.9 M | vcf, vep |
MOCH | Capra hircus (goat) | Morocco | 161 | vcf, vep | 31.8 M | vcf, vep |
AUCH | Capra hircus (goat) | Australia² | 5 | vcf, vep | | |
AUFR | Capra hircus (goat) | France | 4 | vcf, vep | | |
ITCH | Capra hircus (goat) | Italy³ | 5 | vcf, vep | | |
- Variant Discovery: All SNPs and indels discovered from whole genome sequencing that passed the NextGen quality filters.
- Genus level SNPs: Genotypes re-called at all known SNP sites within the genus. Intended for cross-species and cross-population comparisons.
- VEP: Variant call sets were annotated with the Variant Effect Predictor.
1. Seventy-five sheep from many different countries were sequenced by the International Sheep Genomics Consortium (ISGC) and analysed by NextGen.
2. Five Australian Capra hircus were sequenced by CSIRO and analysed by NextGen.
3. Five Italian Capra hircus were sequenced by the project Genhome and were contributed to the NextGen project by PTP and IBBA-CNR.
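The Variant Discovery call sets described above contain both SNPs and indels. As a rough illustration of how the two classes can be told apart in a downloaded VCF file, the following sketch counts alternate alleles by comparing REF and ALT lengths. The function names and the file name in the usage note are hypothetical, and the classification rule is a common simplification, not a description of NextGen's own pipeline.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_variant_types(lines):
    """Count SNP and indel alternate alleles in an iterable of VCF lines.

    Assumption: an allele is a SNP when REF and ALT are both one base long,
    and an indel when their lengths differ. Everything else (symbolic
    alleles, missing alleles, multi-base substitutions) is counted as other.
    """
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):
            if alt in (".", "*") or alt.startswith("<"):
                counts["other"] += 1
            elif len(ref) == 1 and len(alt) == 1:
                counts["snp"] += 1
            elif len(ref) != len(alt):
                counts["indel"] += 1
            else:
                counts["other"] += 1
    return counts
```

With a locally downloaded call set (the name `IRBT.vcf.gz` is only a placeholder), usage might look like `with open_vcf("IRBT.vcf.gz") as handle: print(count_variant_types(handle))`.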
Microarray Data
Species | Country | Array | Samples | Genotypes |
---|---|---|---|---|
Bos taurus × indicus (cattle) | Uganda | bovineHD | 102 | vcf, browse |
Bos taurus × indicus (cattle) | Uganda | bovineSNP50 | 813 | vcf, browse |
Ovis orientalis (mouflon) | Iran | ovineSNP50 | 8 | vcf, browse |
Ovis aries (sheep) | Iran | ovineSNP50 | 18 | vcf, browse |
Ovis aries (sheep) | Morocco | ovineSNP50 | 30 | vcf, browse |
Capra aegagrus (bezoar) | Iran | goatSNP50 | 7 | vcf |
Capra hircus (goat) | Iran | goatSNP50 | 9 | vcf |
Capra hircus (goat) | Morocco | goatSNP50 | 30 | vcf |
De novo assemblies
Property | Ovis orientalis | Capra aegagrus |
---|---|---|
Species | mouflon (wild sheep) | bezoar (wild goat) |
Assembly name | Oori1 | Caeg1 |
ENA project | PRJEB3141 | PRJEB3140 |
Assembled scaffolds | fasta, repeat mask fasta | fasta, repeat mask fasta |
Number of scaffolds | 6,173 | 6,616 |
Total scaffold length | 2.59 Gbp | 2.58 Gbp |
Scaffold N50 | 2.21 Mbp | 1.75 Mbp |
Genomic alignment to domestic species | vs sheep OARv3.1 | vs goat CHIR1.0 |
More assembly information | Readme | Readme |
More genomic alignment information | Readme | Readme |
- Assembled using ALLPATHS-LG
- Genomic alignments to the domestic species used Ensembl pairwise alignment analysis
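Scaffold N50, reported in the assembly table above, is the scaffold length at which scaffolds of that length or longer together contain at least half of the total assembled bases. The sketch below shows one way to recompute it from a scaffold FASTA file; the function names are hypothetical, and this is an illustration of the standard definition, not the tool used to produce the reported figures.

```python
def read_fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half.
        if 2 * running >= total:
            return length
    return 0
```

For a scaffold file on disk, `n50(read_fasta_lengths(open(path)))` gives the value in base pairs, where `path` stands for wherever the fasta file was saved.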
Available Data
The European Nucleotide Archive contains:
- Fastq: Whole genome Illumina sequencing.
- Bam: Reads aligned to the reference assemblies, with indel realignment and marked duplicates.
- Assemblies: De novo genomes of mouflon and bezoar.
The European Variation Archive contains:
- Variant call sets: From WGS and microarray data
- File formats: vcf and ped/map formats, with VEP annotation
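For readers unfamiliar with the ped/map pair, the following sketch reads both files into simple Python structures. It assumes the conventional PLINK text layout: a four-column map file (chromosome, marker ID, genetic distance, base-pair position) and a ped file with six leading columns followed by two alleles per marker. The function names are hypothetical, and the layout should be checked against the NextGen documentation before relying on it.

```python
def parse_map(lines):
    """Return a list of (chromosome, marker_id, position) from map lines."""
    markers = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        # Assumed column order: chromosome, marker ID, genetic distance, bp.
        chrom, marker_id, _distance, position = fields[:4]
        markers.append((chrom, marker_id, int(position)))
    return markers


def parse_ped(lines, n_markers):
    """Return {individual_id: [(allele1, allele2), ...]} from ped lines."""
    samples = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        individual_id = fields[1]
        # The first six columns describe the sample; the rest are alleles.
        alleles = fields[6:]
        if len(alleles) != 2 * n_markers:
            raise ValueError(
                f"{individual_id}: expected {2 * n_markers} alleles, "
                f"found {len(alleles)}"
            )
        samples[individual_id] = [
            (alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)
        ]
    return samples
```

Passing `len(parse_map(...))` as `n_markers` gives a basic consistency check between the two files, since each ped row must carry exactly two alleles per map entry.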
The NextGen ftp site contains a range of supporting data, including:
- Sample indexes: Including animals' location, breed, age.
- Documentation: E.g. how to find data, and NextGen's data processing methods.
- Photographs: Images of sampled animals.
- Links to archived data: Fastq, bam, genome assemblies, variant calls
- Preprocessed data: E.g. unfiltered WGS variant calls, raw microarray data.
Data Reuse
Please observe NextGen's policy on data reuse.
The research leading to these results received funding from the European Union's Seventh Framework Programme (FP7/2010-2014) under grant agreement no 244356 - "NextGen".