Methylation of cytosine bases in DNA is considered a major epigenetic
hallmark that has important implications for biological processes and human
diseases. Several high-throughput methods for genome-wide profiling of DNA
methylation have been developed. The 450K DNA methylation (Infinium
HumanMethylation450 BeadChip) array provides the methylation status of more than
480,000 cytosines distributed throughout the genome. The platform relies on
hybridization of bisulphite-treated DNA fragments to bead-bound probes that
provide an indication of the methylation status of each targeted CpG site on
the array. However, it has been observed that factors other than methylation
changes can alter hybridization. These include genetic variants such as single
nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs) and
repetitive regions of DNA. Prior to hybridization, the genome is
bisulphite-treated, converting all unmethylated cytosines to uracil (and
subsequently to thymine). This reduction in genome complexity means that many probes on the
array no longer map to unique locations. These factors have the potential to
give rise to misleading (false positive and negative) methylation calls when
using this array. Currently, there is no clear method or pipeline for
determining which probes on the 450K array should be retained for subsequent
analysis in light of these hybridization problems. Exclusion of these affected
probes in the data processing procedure can substantially improve the
methylation estimates. We propose a method that enables the identification of
affected probes on the 450K arrays. Based on 8 prostate cancer samples (4
benign and 4 tumour) we show that our method significantly reduces the risk of
false discoveries.
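
The loss of sequence complexity caused by bisulphite treatment can be illustrated with a minimal sketch. It is not the method proposed here; it assumes, for simplicity, that every cytosine is unmethylated and therefore read as thymine, and the function name and the two 12-base sequences are hypothetical examples chosen for illustration.

```python
def bisulphite_convert(seq: str) -> str:
    """In silico conversion, assuming every cytosine is unmethylated (C -> T)."""
    return seq.upper().replace("C", "T")


# Two hypothetical loci that differ at three positions before conversion.
locus_a = "ACGTCCGATCGA"
locus_b = "ATGTCTGATTGA"

converted_a = bisulphite_convert(locus_a)
converted_b = bisulphite_convert(locus_b)

# After conversion the two loci are indistinguishable, so a probe designed
# against one of them could also hybridize to the other.
print(locus_a != locus_b)          # distinct in the native genome
print(converted_a == converted_b)  # identical after conversion
```

In real data, methylated cytosines are protected from conversion, so the degree of ambiguity depends on the methylation state of each locus.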