Blueprint Epigenome - RA2: Epigenetic variation and its use for improved diagnostics

As a fundamental step in understanding the causes of epigenetic variation, we will generate data on key epigenetic modifications in two blood cell types collected from 100 healthy individuals. These measurements will be combined with whole-genome sequencing and transcriptome sequencing to dissect the interplay between common DNA sequence variation and the epigenome. This will allow us to estimate the extent to which variation at the cellular level, primarily driven by epigenetic modifications, is caused by the heritable genome (i.e. DNA sequence), and what fraction of cellular function and variation is driven by transient or stable non-genetic effects. This will provide a unique framework for understanding the interplay between the genetic code and the environment, and how the latter contributes to complex diseases. Ultimately, it will indicate the direction that prevention and medical intervention need to take to achieve better health.

BLUEPRINT also aims to quantify the variation in epigenomes using inbred strains of mice, an excellent animal model for interrogating the genotype-epigenotype-phenotype relationship. Inbred strains make it possible to control matings between mice of identical DNA sequence, providing genetic homogeneity in experimental cells and, where appropriate, experimental F1 hybrids of controlled genotype. BLUEPRINT will conduct the first study that combines RNA-seq, DNA methylation and hydroxymethylation profiling with ChIP-seq for seven further epigenetic features. We will interrogate the epigenomes of three mouse strains with completed genome sequences (C57BL/6J, C3H/HeJ and CAST/EiJ), using two easily purified cell types isolated directly from mouse peripheral blood. These data will be validated; the extent to which differences in epigenotype correlate with sequence variation will then be quantified and related to the transcriptome. Because the genetic backgrounds are defined, these data will inform the analysis of the human datasets by determining how much variation is genotype-dependent, how much may be caused by non-genetic components and how much is functional. Because the chosen cell types are comparable to those analysed in humans, the datasets will also allow inter-species comparisons. The mouse model programme will also provide an opportunity to identify any autosomal differences between males and females and to measure the extent of parent-of-origin effects, including any that lie outside known imprinted regions.

To better understand the possible roles that epigenomes play in disease aetiology, BLUEPRINT will carry out the first comprehensive epigenome-wide association study (EWAS) of any human disease. As the exemplar disease, we have chosen Type 1 Diabetes (T1D), because certain blood cells (e.g. CD14+ monocytes) are causally implicated in T1D aetiology and because BLUEPRINT partners have already conducted a successful pilot study demonstrating the involvement in T1D of methylation variable positions (MVPs), the epigenetic equivalent of single nucleotide polymorphisms (SNPs). The study will make direct use of the healthy reference epigenomes and will be conducted in two phases: a discovery phase involving monozygotic twins discordant for T1D, which will exclude confounding genetic effects, and a validation phase involving prospectively sampled singletons, which will further exclude the effects of twinning and of pharmacological (insulin) treatment. If successful, BLUEPRINT could establish EWAS in the same way that the Wellcome Trust Case Control Consortium established GWAS: as a powerful approach for studying the association of SNPs and MVPs with the complex clinical phenotype of T1D, a devastating chronic autoimmune disease whose prevalence is increasing year on year.

To foster the clinical relevance of epigenetic analysis, BLUEPRINT also includes a major biomarker component. Biomarker development will focus specifically on DNA methylation, in order to maximize compatibility with clinical diagnostics and to capitalize on the clinical sequencing infrastructure that is becoming available in European countries. We will build on the reference epigenomes generated in RA1 and on DNA methylomes from the International Cancer Genome Consortium (ICGC), validate the resulting candidates in large patient cohorts for acute lymphoblastic leukaemia (ALL) and acute myeloid leukaemia (AML), and develop cost-efficient, high-throughput assays for the most promising biomarker candidates. The consortium will also provide the expertise and infrastructure to enable biomarker development in patient cohorts collected as part of large European clinical trials. To retain the flexibility to follow up on the most clinically relevant discoveries from across the BLUEPRINT project, we will implement a mini-proposal review scheme for allocating biomarker development resources.

Research Area Leader: Stephan Beck

DNA methylation variation in T1DM	Leader: David Leslie
Biomarker development	Leader: Christoph Bock
The effect of common sequence variation on the epigenome landscape	Leader: Nicole Soranzo
Mouse models to quantify variation in reference epigenomes	Leader: Anne Ferguson-Smith