BLUEPRINT
The BLUEPRINT Project is a high-impact FP7 project that aims to produce a blueprint of haemopoietic epigenomes. Our goal is to apply highly sophisticated functional genomics analysis to a clearly defined set of primarily human samples from healthy and diseased individuals, and to provide at least 100 reference epigenomes to the scientific community. This resource-generating activity will be complemented by research into blood-based diseases, including common leukaemias and autoimmune disease (Type 1 Diabetes), by discovery and validation of epigenetic markers for diagnostic use, and by epigenetic target identification. This may eventually lead to the development of novel and more individualised medical treatments.
Data Reuse
The Blueprint consortium expects this data to be valuable to other researchers. In keeping with Fort Lauderdale principles, data users may use the data for many studies, but are expected to allow the data producers to make the first presentations and to publish the first paper with global analyses of the data. Our full data reuse statement can be found on our website.
Data access
Our data are available from the sequence archives, from our own FTP site, and through our data mining and browsing tools.
Requirement | Access |
---|---|
Raw data | Data archives (EGA & ENA) |
Processed data | FTP site |
Data mining | BioMart |
Genome browser | Genomatix browser |
Genome browser | BLUEPRINT Track Hub on the UCSC browser |
Genome browser | BLUEPRINT Track Hub on the Ensembl browser |
Raw Data
The majority of samples sequenced by Blueprint are consented for release via a managed access system. To facilitate this we have archived the data in the EGA. Users can apply to download data. The process for this can be found on our DAC applications page. Data for samples that do not require managed access have been archived with the ENA. In each case, links to the raw data can be found through the experiment grid.
Processed Data
The alignments generated for our sequence data are also available from the EGA. All processed data types are available from our FTP site. The main types we make available are defined here:
Experiment Type | Data Type | File Format | Example |
---|---|---|---|
RNA-Seq | Quantification | GTF | C0010K Monocyte transcript quantification |
RNA-Seq | Alignment Signal | BigWig | C0010K Monocyte plus strand signal |
ChIP-Seq | Peak Calls | BigBed | C0010K Monocyte H3K4me1 peak calls |
ChIP-Seq | Alignment Signal | BigWig | C0010K Monocyte H3K4me1 signal |
DNase1-Seq | Hotspots | BigBed | C0010K Monocyte Dnase hotspots |
DNase1-Seq | Alignment Signal | BigWig | C0010K Monocyte Dnase signal |
WGS Bisulphite Seq | Hypo-methylated Regions | BigBed | C0010K Monocyte hypo methylation calls |
WGS Bisulphite Seq | Hyper-methylated Regions | BigBed | C0010K Monocyte hyper methylation calls |
WGS Bisulphite Seq | Alignment Signal | BigWig | C0010K Monocyte methylation call signal |
Secondary analysis
Secondary analysis results are made available as part of the data release cycle. The methods and how to access the results are listed on the secondary analysis page.
FTP Site
The FTP site has three major sections, listed here and described in more detail below:
- data : This directory contains all the processed data files described in the above table
- release : This directory contains files specific to a particular release, such as metadata and indexes
- reference : This directory contains reference data sets used for our analysis, e.g. a GENCODE gene set or reference assembly
Data
The data directory contains all processed data files as described in the above table. New files will be added each release. Subdirectories are organised by species, tissue type, donor, cell type and data type. The filename format follows this general form:
sample_name.experiment.algorithm_or_pipeline_name.freeze_date.file_extension
e.g.
C002TWH1.H3K4me3.bwa_filtered.20130415.bw
The freeze date in the filename should match the first freeze in which the file was produced.
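As an illustration of the naming scheme above, the following minimal Python sketch splits a filename into its five fields. It is not part of the BLUEPRINT tooling: the `BlueprintFileName` class and `parse_blueprint_filename` function are hypothetical names, and the sketch assumes that the first four fields never contain a dot and that the freeze date is written as YYYYMMDD, as in the example.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class BlueprintFileName:
    """Fields of a processed data filename (hypothetical helper type)."""
    sample_name: str
    experiment: str
    pipeline: str
    freeze_date: date
    extension: str


def parse_blueprint_filename(filename: str) -> BlueprintFileName:
    """Split sample_name.experiment.pipeline.freeze_date.file_extension.

    Assumption: only the extension may contain further dots, so the
    string is split on the first four dots only.
    """
    parts = filename.split(".", 4)
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"unexpected filename layout: {filename!r}")
    sample_name, experiment, pipeline, freeze, extension = parts
    # Assumption: the freeze date is YYYYMMDD; strptime raises
    # ValueError if the field does not match.
    freeze_date = datetime.strptime(freeze, "%Y%m%d").date()
    return BlueprintFileName(sample_name, experiment, pipeline,
                             freeze_date, extension)


parsed = parse_blueprint_filename("C002TWH1.H3K4me3.bwa_filtered.20130415.bw")
print(parsed.sample_name, parsed.experiment, parsed.pipeline,
      parsed.freeze_date.isoformat(), parsed.extension)
```

Parsing the date into a `date` object rather than keeping the raw string makes it straightforward to compare a file's freeze date across releases.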
Release
The most recent release can be found at current_release. Each release directory contains an index file (a list of all files for that release), a description of each analysis pipeline, a Track Hub directory and a readme file describing the current index file.
A description for the data index and each analysis pipeline can be found here:
Reference
The reference directory contains the reference materials used for our analysis pipelines. The subdirectories are also dated to allow us to update our reference files. The analysis pipeline readmes should indicate which reference files were used for a particular analysis pipeline.
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 282510 (BLUEPRINT).
Blueprint is part of the International Human Epigenome Consortium