Repository: Freie Universität Berlin, Math Department

In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics

Audain, Enrique and Uszkoreit, Julian and Sachsenberg, Timo and Pfeuffer, Julianus and Liang, Xiao and Hermjakob, Henning and Sanchez, Aniel and Eisenacher, Martin and Reinert, Knut and Tabb, David L. and Kohlbacher, Oliver and Perez-Riverol, Yasset (2017) In-depth analysis of protein inference algorithms using multiple search engines and well-defined metrics. Journal of Proteomics, 150, pp. 170-182. ISSN 1874-3919

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1016/j.jprot.2016.08.002

Abstract

In mass spectrometry-based shotgun proteomics, protein identifications are usually the desired result. However, most of the analytical methods are based on the identification of reliable peptides and not the direct identification of intact proteins. Thus, assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is a critical step in proteomics research. Currently, different protein inference algorithms and tools are available for the proteomics community. Here, we evaluated five software tools for protein inference (PIA, ProteinProphet, Fido, ProteinLP, MSBayesPro) using three popular database search engines: Mascot, X!Tandem, and MS-GF+. All the algorithms were evaluated using a highly customizable KNIME workflow using four different public datasets with varying complexities (different sample preparation, species and analytical instruments). We defined a set of quality control metrics to evaluate the performance of each combination of search engines, protein inference algorithm, and parameters on each dataset. We show that the results for complex samples vary not only regarding the actual numbers of reported protein groups but also concerning the actual composition of groups. Furthermore, the robustness of reported proteins when using databases of differing complexities is strongly dependent on the applied inference algorithm. Finally, merging the identifications of multiple search engines does not necessarily increase the number of reported proteins, but does increase the number of peptides per protein and thus can generally be recommended.

Item Type: Article
Subjects: Mathematical and Computer Sciences > Computer Science
Divisions: Department of Mathematics and Computer Science > Institute of Computer Science > Algorithmic Bioinformatics Group
ID Code: 1939
Deposited By: Anja Kasseckert
Deposited On: 31 Aug 2016 09:06
Last Modified: 12 Jan 2017 10:08