Comparative profiling of complex protein mixtures with peptide arrays generated from LC-MS mass spectrometry
Advancements in mass spectrometry (MS) instrumentation, liquid chromatography (LC), and maturing protein databases are driving many advances in the field of proteomics. Among the potential uses of this technology is the identification of predictive protein biomarkers that can differentiate two or more groups of complex biological samples. Despite this proteome-wide potential, few clinically relevant discoveries have emerged from these technologies when applied to complex protein mixtures, such as serum or tissue, which are characterized by high complexity and a wide dynamic range. Current approaches to protein profiling are dominated by MALDI or LC-MS/MS (MS/MS), and both have difficulties in practice: MALDI can detect a large number of "peaks", but identification (sequencing) of low-abundance features can be difficult, while MS/MS lacks sensitivity and has poor reproducibility and low protein coverage because of its data-dependent sampling. It has been our hypothesis that protein/peptide profiling could be made more efficient by better use of high-resolution LC-MS instrumentation where, as in MALDI approaches, differential peptides are first identified from the list of potential precursor ions (LC-MS) and then only those differential peptides are sequenced in subsequent LC-MS measurements. To evaluate this hypothesis, our group has developed a suite of software algorithms that produce a peptide array from a sequence of LC-MS measurements; the peptide array can be evaluated in much the same way as a transcript array, with members identified by their accurate mass and time tags. Production of the peptide array requires substantial signal (image) processing, image alignment, and specialized normalization routines. We demonstrate that we can identify and compare hundreds to thousands of peptides and proteins across multiple replicates of biological samples.
The algorithms will be demonstrated using data from increasingly complex biological samples: bacteria, yeast, and human serum.
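As an illustration only, and not the software described above, the idea of a peptide array indexed by accurate mass and time tags can be sketched as follows. The sketch assumes that features have already been detected and that retention times are already aligned across runs; the names `Feature`, `Tag`, `build_peptide_array`, and the tolerances `ppm_tol` and `rt_tol` are hypothetical, and the greedy matching rule is a simple stand-in for the signal processing, alignment, and normalization steps mentioned in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    mass: float       # monoisotopic mass in Da
    rt: float         # retention time in minutes (assumed already aligned)
    intensity: float  # measured abundance in one LC-MS run


@dataclass
class Tag:
    mass: float                                      # reference mass of the tag
    rt: float                                        # reference retention time
    intensities: dict = field(default_factory=dict)  # run name -> intensity


def build_peptide_array(runs, ppm_tol=10.0, rt_tol=0.5):
    """Greedily group features from several runs into mass/time tags.

    Each tag is one row of the array; each run contributes at most one
    intensity (one column entry) per tag.
    """
    tags = []
    for run_name, features in runs.items():
        for f in features:
            match = None
            for t in tags:
                ppm = abs(f.mass - t.mass) / t.mass * 1e6
                if (ppm <= ppm_tol
                        and abs(f.rt - t.rt) <= rt_tol
                        and run_name not in t.intensities):
                    match = t
                    break
            if match is None:
                # No existing tag is close enough: start a new row.
                match = Tag(f.mass, f.rt)
                tags.append(match)
            match.intensities[run_name] = f.intensity
    return tags


# Toy data (invented for illustration): two runs sharing one peptide.
runs = {
    "run_a": [Feature(1000.000, 10.0, 5.0), Feature(1500.000, 20.0, 7.0)],
    "run_b": [Feature(1000.005, 10.2, 6.0), Feature(2000.000, 30.0, 1.0)],
}
array = build_peptide_array(runs)
for tag in array:
    print(f"{tag.mass:.3f} Da @ {tag.rt:.1f} min -> {tag.intensities}")
```

Rows with an entry for every run can be compared directly between sample groups, much as probes are compared on a transcript array; rows missing from some runs would need a separate treatment that this sketch does not attempt.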