BLUEPRINT

The BLUEPRINT Project is a high-impact FP7 project aiming to produce a blueprint of haemopoietic epigenomes. Our goal is to apply highly sophisticated functional genomics analysis to a clearly defined set of primarily human samples from healthy and diseased individuals, and to provide at least 100 reference epigenomes to the scientific community. This resource-generating activity will be complemented by research into blood-based diseases, including common leukaemias and autoimmune disease (Type 1 Diabetes), by discovery and validation of epigenetic markers for diagnostic use, and by epigenetic target identification. This may eventually lead to the development of novel and more individualised medical treatments.

Data Reuse

The Blueprint consortium expects this data to be valuable to other researchers. In keeping with the Fort Lauderdale principles, data users may use the data for many studies, but are expected to allow the data producers to make the first presentations and to publish the first paper with global analyses of the data. Our full data reuse statement can be found on our website.

Data access

Our data is available from the sequence archives, from our own FTP site, and through our data mining and browsing tools.

Requirement	Access
Raw data	Data archives (EGA & ENA)
Processed data	FTP site
Data mining	BioMart
Genome browser	Genomatix browser
	BLUEPRINT Track Hub on the UCSC browser
	BLUEPRINT Track Hub on the Ensembl browser

Raw Data

The majority of samples sequenced by Blueprint are consented for release via a managed access system. To facilitate this, we have archived the data in the EGA, where users can apply for download access. The application process is described on our DAC applications page. Data for samples that do not require managed access have been archived with the ENA. In each case, links to the raw data can be found through the experiment grid.

Processed Data

The alignments generated for our sequence data are also available from the EGA. All processed data types are available from our FTP site. The main types we make available are defined here:

Experiment Type	Data Type	File Format	Example
RNA-Seq	Quantification	GTF	C0010K Monocyte transcript quantification
RNA-Seq	Alignment Signal	BigWig	C0010K Monocyte plus strand signal
ChIP-Seq	Peak Calls	BigBed	C0010K Monocyte H3K4me1 peak calls
ChIP-Seq	Alignment Signal	BigWig	C0010K Monocyte H3K4me1 signal
DNase1-Seq	Hotspots	BigBed	C0010K Monocyte DNase hotspots
DNase1-Seq	Alignment Signal	BigWig	C0010K Monocyte DNase signal
WGS Bisulphite Seq	Hypo-methylated Regions	BigBed	C0010K Monocyte hypo methylation calls
WGS Bisulphite Seq	Hyper-methylated Regions	BigBed	C0010K Monocyte hyper methylation calls
WGS Bisulphite Seq	Alignment Signal	BigWig	C0010K Monocyte methylation call signal

Secondary analysis

Secondary analysis results are made available as part of the data release cycle. The methods and how to access the results are listed on the secondary analysis page.

FTP Site

The FTP site has three major sections, listed here and described in more detail below:

data : This directory contains all the processed data files described in the above table
release : This directory contains files specific to a particular release, such as metadata and indexes
reference : This directory contains reference data sets used for our analysis, e.g. a GENCODE gene set or reference assembly

Data

The data directory contains all processed data files as described in the above table. New files will be added with each release. Subdirectories are organised by species, tissue type, donor, cell type and data type. The filename format follows this general form:

sample_name.experiment.algorithm_or_pipeline_name.freeze_date.file_extension

e.g.

C002TWH1.H3K4me3.bwa_filtered.20130415.bw

The freeze date in the filename corresponds to the first data freeze in which the file was produced.
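
The filename convention above can be split into its components programmatically. The following is a minimal Python sketch, not part of the Blueprint tooling: the names BlueprintFile and parse_blueprint_filename are hypothetical, and the sketch assumes that the first four fields never contain a dot and that the freeze date is always written as YYYYMMDD, as in the example filename.

```python
from datetime import date, datetime
from typing import NamedTuple


class BlueprintFile(NamedTuple):
    """Components of a processed-data filename (hypothetical helper type)."""
    sample_name: str
    experiment: str
    pipeline: str
    freeze_date: date
    extension: str


def parse_blueprint_filename(filename: str) -> BlueprintFile:
    """Split a name of the form
    sample_name.experiment.algorithm_or_pipeline_name.freeze_date.file_extension

    Assumption: the first four fields contain no dots, so everything after
    the fourth dot is treated as the file extension.
    """
    parts = filename.split(".", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got: {filename!r}")
    sample_name, experiment, pipeline, freeze, extension = parts
    # Assumption: the freeze date is encoded as YYYYMMDD; strptime raises
    # ValueError if the field does not match that layout.
    freeze_date = datetime.strptime(freeze, "%Y%m%d").date()
    return BlueprintFile(sample_name, experiment, pipeline, freeze_date, extension)


if __name__ == "__main__":
    info = parse_blueprint_filename("C002TWH1.H3K4me3.bwa_filtered.20130415.bw")
    print(info.sample_name, info.experiment, info.pipeline,
          info.freeze_date.isoformat(), info.extension)
```

Parsing the freeze date into a date object, rather than keeping it as a string, makes it straightforward to compare files across freezes.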

Release

The most recent release can be found at current_release. Each release directory contains an index file (a list of all files for that release), descriptions of the analysis pipelines, a Track Hub directory and a readme file describing the current index file.

A description of the data index and of each analysis pipeline can be found here:

Reference

The reference directory contains the reference materials used for our analysis pipelines. The subdirectories are also dated to allow us to update our reference files. The analysis pipeline readmes should indicate which reference files were used for a particular analysis pipeline.

The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 282510 (BLUEPRINT).

Blueprint is part of the International Human Epigenome Consortium