European Reference Genome Atlas - Pilot Project


The European Reference Genome Atlas (ERGA) is a pan-European scientific response to current threats to biodiversity. Reference genomes provide the most complete insight into the genetic basis of each species and are a powerful resource for understanding how biodiversity functions. With approximately one fifth of the ~200,000 European species at risk of extinction, we need to act quickly and collectively to generate complete, high-quality genome resources at large scale.

The Pilot Project aims to build a pan-European genomics infrastructure that supports the inclusion and equal participation of every European country at each step of the genome establishment pipeline, from sample collection to publication.

Ensembl is a partner in the ERGA project. We annotate protein-coding and non-coding RNA gene structures using re-engineered versions of our Gene Annotation System (Aken et al., 2017), optimised for vertebrates and for non-vertebrates. When a species lacks transcriptomic data, we instead run BRAKER2 in its default protein mode to generate hint-guided ab initio predictions of protein-coding genes (see the blog post for more information). After quality control (QC), genomes and annotations are released on our FTP site (see the table below) and subsequently made available in the Ensembl Genome Browser.

| Species | Accession | Annotation method | Annotation | Proteins | Transcripts | Softmasked genome | Repeat library | Other data | View in browser | BUSCO completeness | Alternate haplotype |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ammodytes marinus | GCA_949987685.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | BUSCO | |
| Argentina silus | GCA_951799395.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | | |
| Coenonympha glycerion | GCA_963855885.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | | |
| Haliaeetus albicilla | GCA_947461875.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | RepeatModeler | FTP dumps | | BUSCO | |
| Mytilus edulis | GCA_963676685.2 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | | |
| Silurus aristotelis | GCA_946808225.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | RepeatModeler | FTP dumps | | BUSCO | |
| Solea solea | GCA_958295425.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | BUSCO | |
| Thunnus thynnus | GCA_963924715.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | | |
| Trifolium dubium | GCA_951804385.1 | Ensembl Genebuild | GTF, GFF3 | FASTA | FASTA | FASTA | | FTP dumps | | | |
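As a starting point for working with the GTF annotation files listed above, the sketch below tallies features by type (gene, transcript, exon, ...) and genes by biotype. It is a minimal, standard-library-only example, not an Ensembl tool. It assumes the standard nine tab-separated GTF columns with `key "value";` attributes in column 9, that gene biotypes are stored under a `gene_biotype` attribute (adjust the key if a file differs), and that downloaded files may be gzip-compressed. The function names are illustrative.

```python
import gzip
import re
from collections import Counter
from typing import Iterable, Iterator

# Matches key "value" pairs in column 9 of a GTF line.
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field: str) -> dict[str, str]:
    """Parse GTF column 9 into a dict, keeping the first value of repeated keys."""
    attrs: dict[str, str] = {}
    for key, value in ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def iter_gtf(lines: Iterable[str]) -> Iterator[tuple[str, dict[str, str]]]:
    """Yield (feature_type, attributes) for each non-comment GTF line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines
        yield cols[2], parse_attributes(cols[8])


def summarise(lines: Iterable[str]) -> tuple[Counter, Counter]:
    """Count features by type, and gene features by their gene_biotype (assumed key)."""
    features: Counter = Counter()
    biotypes: Counter = Counter()
    for feature, attrs in iter_gtf(lines):
        features[feature] += 1
        if feature == "gene":
            biotypes[attrs.get("gene_biotype", "unknown")] += 1
    return features, biotypes


def summarise_file(path: str) -> tuple[Counter, Counter]:
    """Open a plain or gzip-compressed GTF file and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarise(handle)
```

For example, `features, biotypes = summarise_file("annotation.gtf.gz")`, where the filename is a placeholder for whichever GTF you downloaded, gives `features["gene"]` as the number of gene records and `biotypes` as a breakdown such as protein-coding versus non-coding genes. Streaming line by line keeps memory use low even for large annotations.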