Limitation of algorithms for quantitative label-free LC-MS-based proteomics

Eskafi Noughani, Arian (2012) BINP30 20112
Degree Projects in Bioinformatics
Abstract

Over the past decade, mass spectrometry has played an essential role in improving proteomics techniques. New instruments and analytical protocols have made mass spectrometry one of the most widely used analytical methods in quantitative proteomics. Today, one commonly used protocol is to digest proteins into peptides and analyze the peptides with liquid chromatography and mass spectrometry (LC-MS), with subsequent identification of the peptides by tandem mass spectrometry (MS/MS). Quantification can be done at the MS1 level by summing the intensities of detected peaks (precursor intensity) or by counting the MS/MS spectra (spectral counting). However, improved methods also produce more data, and analyzing the results of biological experiments involves several difficulties.
Differences in peptide quantities can be found by comparing many LC-MS samples. The comparison requires precise detection of peptide features in the LC-MS spectra to allow matching between files. Deviations in measured mass and fluctuations in retention time (RT) make the matching process difficult. This project aims to enable accurate peptide quantification based on LC-MS data by first comparing and characterizing the prerequisites for the software used in the analysis process.
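To make the matching problem concrete, the following is a minimal Python sketch of pairing features between two LC-MS runs. It is not the method used in the thesis: the `Feature` class, the `match_features` function, the greedy pairing strategy, and the tolerance defaults (10 ppm, 30 s) are all assumptions chosen for illustration, and the feature values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in seconds
    intensity: float  # summed precursor intensity


def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=30.0):
    """Greedily pair each feature in run_a with the closest unused feature in run_b.

    A pair is accepted only if the m/z difference is within ppm_tol
    (parts per million) and the RT difference is within rt_tol seconds.
    Both tolerances are illustrative assumptions, not values from the thesis.
    """
    matches = []
    used = set()
    for a in run_a:
        best_j, best_dist = None, None
        for j, b in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(a.mz - b.mz) / a.mz * 1e6
            drt = abs(a.rt - b.rt)
            if ppm <= ppm_tol and drt <= rt_tol:
                # Normalised distance so both tolerances weigh equally.
                dist = ppm / ppm_tol + drt / rt_tol
                if best_dist is None or dist < best_dist:
                    best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
            matches.append((a, run_b[best_j]))
    return matches


# Made-up features: the first pair lies within both tolerances,
# the second pair differs too much in m/z to be matched.
run_a = [Feature(500.2500, 1200.0, 1.0e6), Feature(650.3300, 1800.0, 4.0e5)]
run_b = [Feature(500.2520, 1210.0, 1.1e6), Feature(650.3900, 1805.0, 3.8e5)]
matched = match_features(run_a, run_b)
```

Widening either tolerance recovers more pairs but raises the risk of false matches, which is why mass deviations and RT fluctuations make the choice of parameter settings matter.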
Several algorithms are used for peptide (feature) extraction from LC-MS data files. The different algorithms yielded different results, and this is where one of the challenges starts: we need to know which software and which parameter settings are most reliable for covering many peptides with a low false discovery rate (FDR). In the current project, MaxQuant, one of the common algorithms in label-free quantitative proteomics, detected more peptide features than OpenMS and MsInspect.
Technical variation between experiments plays an important role in the results of an experiment and can be used to assess the performance of different quantification methods. In the second part of the project, technical variation was compared for analyses based on precursor intensity, spectral counting and total MS/MS intensity in two different datasets. The technical variation in the precursor-based analysis was higher than that obtained with spectral counting or total MS/MS intensity. The results of the current project may be used to select the most appropriate analysis method for protein quantification.
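One common way to express technical variation is the coefficient of variation (CV) across technical replicates. The abstract does not state which measure the thesis used, so the Python sketch below is an assumption for illustration only; the function names and the toy replicate values are hypothetical and are not results from the project.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return stdev(values) / m


def median_cv(table):
    """Median CV over all proteins in a {protein: [replicate values]} table."""
    return median(coefficient_of_variation(v) for v in table.values())


# Toy replicate values (hypothetical, three technical replicates per protein).
toy_quantities = {
    "protein_1": [90.0, 100.0, 110.0],
    "protein_2": [10.0, 20.0, 30.0],
}
summary = median_cv(toy_quantities)
```

Computing `median_cv` once per quantification method (precursor intensity, spectral counting, total MS/MS intensity) on the same replicates would give one number per method that can be compared directly.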

Popular science summary:

Limitation of algorithms for quantitative label-free proteomics

Although whole-genome sequencing has been completed for many organisms, many genes remain without assigned functions. To understand the function of each gene, new techniques for high-throughput data generation have been developed. Proteomics is the study of the proteome, which is the collection of different proteins and their levels in a given cell or tissue. By comparing proteomes between samples, one can get an idea of the role of different proteins in disease or in biological processes.

In this way, proteomics can help doctors and researchers to better diagnose and treat diseases. One aim is to find biological markers for disease to enable diagnosis and prognosis for personalized medicine. However, global protein analysis is difficult to perform and involves complex protocols. One commonly used protocol in proteomics is to digest proteins into peptides and analyze the peptides with liquid chromatography and mass spectrometry (LC-MS). The output from this protocol is a large amount of data in the form of precursor spectra from the analysis of intact peptides and product spectra from the fragmentation of selected peptides. Analysis of the output data from protein quantification experiments is a very important and difficult stage of any experiment.

Different labeling techniques are useful for comparing several samples in one analysis, as the signal from each sample can be distinguished by its label. However, the long procedure of introducing a label into proteins and the cost of labeling reagents make this approach difficult. By eliminating labels, dyes and specific reagents, label-free methods with high sensitivity have been developed. Label-free quantification is based either on relative precursor ion intensity (precursor intensity) or on the total number of identified product spectra from the protein (spectral counting).

Several algorithms are used for peptide (feature) extraction, and they yielded different results even though the same results would be expected for the same sample. We need to know which software and which parameter settings are most reliable for covering many peptides with a low false discovery rate (FDR). In the current project, the output from three common algorithms, MaxQuant, MsInspect and OpenMS, was compared. Perl scripts were used to process the data, and further evaluation was done by plotting different aspects of the data. MaxQuant detected more peptide features than OpenMS and MsInspect.

In the second subproject, different label-free techniques were compared. One would expect a reliable method to show little variation between technical replicates. The technical variation in the precursor-based analysis was higher than that obtained with spectral counting. This may be related to low-abundance proteins, which show higher variation. The results of the current project may be used to select the most appropriate analysis method for protein quantification.

Advisors: Fredrik Levander, Marianne Sandin
Master’s Degree Project in Bioinformatics, 30 credits
Department of Immunotechnology, Lund University
author
Eskafi Noughani, Arian
supervisor
organization
course
BINP30 20112
year
type
H2 - Master's Degree (Two Years)
subject
language
English
id
3732422
date added to LUP
2013-04-30 11:49:54
date last changed
2013-04-30 11:49:54
@misc{3732422,
  author       = {Eskafi Noughani, Arian},
  language     = {eng},
  note         = {Student Paper},
  title        = {Limitation of algorithms for quantitative label-free LC-MS-based proteomics},
  year         = {2012},
}