A Software Program for the Rapid Sequence Analysis of Unknown Peptides Involving Modifications, Based on MS/MS Data

 

Jorge Fernandez-de-Cossio, Luis Javier Gonzalez, Toshifumi Takao, Yasutsugu Shimonishi, Gabriel Padron, & Vladimir Besada

Center for Genetic Engineering and Biotechnology, P.O. Box 6162, Havana, Cuba

Institute for Protein Research, Osaka University, Yamadaoka 3-2, Suita, Osaka 565, Japan

 

In this work, we present "SeqMS", a software program for deducing the sequence of a peptide from a product ion spectrum obtained by high-energy collision-induced dissociation. The program was developed for Windows 95 and features an optimized algorithm, improved scoring methods, automatic calculation of gaps between signals, and detection of modified amino acids.

The spectrum is transformed into a graph in which the vertices represent ions, characterized by their m/z values and by the probabilities for every ion series. Two vertices are joined by an arc when the difference between them matches the residue mass of an amino acid. A chain of consecutive arcs from one end of the graph to the other constitutes a peptide sequence. The score of each sequence is the sum of the probabilities assigned to its consecutive arcs. Previously, we devised a criterion that recognizes, at an early stage of the calculation, those sequences that are not expected to appear in the final result [1]. This avoids the fruitless calculation of a huge number of sequences, which would otherwise make the execution time grow exponentially. To illustrate the effectiveness of this criterion, we examined the spectra of 60 peptides ranging from m/z 300 to 1700. Fig. 1A shows, on a logarithmic scale, the number of possible sequences and the number of sequences analyzed under the criterion as a function of m/z. The reduction in the number of analyzed sequences becomes more drastic as m/z increases. For example, the number of possible sequences for the peptide with m/z = 1687 is 7.5 E+16, whereas the number of sequences analyzed using the criterion was 2406. Furthermore, in the present program, the arcs are sorted as a function of the pre-calculated forthcoming score.

Fig. 1

This novel procedure speeds up the processing by discarding the sequences derived from the remaining branches of the current vertex when the criterion is not satisfied. As shown in Fig. 1B, the number of analyzed sequences was further reduced after sorting the arcs. In the case of the above peptide, the number of analyzed sequences was reduced from 2406 to 80. The optimization of the calculation method achieved in SeqMS made it possible to introduce additional, more complicated tasks, such as finding gaps between ion series and analyzing known or unknown modifications.
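The graph search described above can be illustrated with a minimal Python sketch. It is not the SeqMS implementation: the pruning criterion of [1] is not spelled out here, so the sketch assumes a generic branch-and-bound rule in which a branch is dropped when its current score plus the best score still obtainable cannot beat the worst of the k best sequences found so far. The residue table is a small illustrative subset, the score of an arc is taken to be the probability of its target vertex, and all function names are hypothetical.

```python
import heapq

# Monoisotopic residue masses (Da); a small illustrative subset only.
RESIDUES = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
            "V": 99.06841, "L": 113.08406}
NEG_INF = float("-inf")

def build_arcs(vertices, tol=0.02):
    """vertices: (mass, probability) pairs sorted by mass.
    Returns arcs[i] = [(j, residue), ...] for mass gaps matching a residue."""
    arcs = [[] for _ in vertices]
    for i, (mass_i, _) in enumerate(vertices):
        for j in range(i + 1, len(vertices)):
            gap = vertices[j][0] - mass_i
            for aa, mass in RESIDUES.items():
                if abs(gap - mass) <= tol:
                    arcs[i].append((j, aa))
    return arcs

def best_remaining(vertices, arcs):
    """Pre-calculated best score obtainable from each vertex to the last one."""
    n = len(vertices)
    best = [NEG_INF] * n
    best[n - 1] = 0.0
    for i in range(n - 2, -1, -1):
        for j, _ in arcs[i]:
            if best[j] > NEG_INF:
                best[i] = max(best[i], vertices[j][1] + best[j])
    return best

def top_sequences(vertices, k=3, tol=0.02, sort_arcs=True):
    """Return (k best (score, sequence) pairs, number of complete sequences analyzed)."""
    arcs = build_arcs(vertices, tol)
    bound = best_remaining(vertices, arcs)
    if sort_arcs:
        # Most promising arc first, judged by the pre-calculated forthcoming score.
        for out in arcs:
            out.sort(key=lambda a: vertices[a[0]][1] + bound[a[0]], reverse=True)
    heap = []          # min-heap holding the k best sequences found so far
    analyzed = 0
    end = len(vertices) - 1

    def search(i, score, seq):
        nonlocal analyzed
        if i == end:
            analyzed += 1
            heapq.heappush(heap, (score, seq))
            if len(heap) > k:
                heapq.heappop(heap)
            return
        for j, aa in arcs[i]:
            if bound[j] == NEG_INF:
                continue                      # vertex j cannot reach the end
            potential = score + vertices[j][1] + bound[j]
            if len(heap) == k and potential <= heap[0][0]:
                if sort_arcs:
                    break                     # remaining arcs are no better
                continue
            search(j, score + vertices[j][1], seq + aa)

    search(0, 0.0, "")
    return sorted(heap, reverse=True), analyzed

# Toy spectrum for the peptide AGSVL plus one weak competing vertex (57.02146).
toy = [(0.0, 0.0), (57.02146, 0.2), (71.03711, 0.9), (128.05857, 0.8),
       (215.09060, 0.9), (314.15901, 0.7), (427.24307, 1.0)]
print(top_sequences(toy, k=1, sort_arcs=False))
print(top_sequences(toy, k=1, sort_arcs=True))
```

In this toy case the unsorted search completes both candidate sequences before settling on the better one, whereas the sorted search completes only the best candidate and then abandons the remaining branch of the first vertex. The effect is far smaller than the reductions reported in Fig. 1, but the mechanism is the same in spirit.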

We have introduced the following new features into the scoring procedure: 1) a set of probabilities for every ion series, including those arising from side-chain fragmentation; sets of probabilities different from the default can be created and optimized; 2) sequences constructed with repeated ion series in the spectrum are favored in the final score; 3) the scores of candidate sequences are normalized according to the residue masses of the amino acids; 4) the presence of signals characteristic of specific amino acids, such as immonium ions, adds a bonus to the final score.

SeqMS also permits the retrieval of raw MS data and the reconstruction of spectra in a convenient graphical user interface, with several handy tools for processing the spectra, setting multiple threshold levels, and detecting peaks automatically. SeqMS can generate candidate sequences based on the detected peaks and display the resulting assignments for each candidate in a spectrum or in tabular form. Histograms representing hits/position or score/position at each amino acid residue of the candidate sequences are also available and are useful for the statistical evaluation of each candidate.

In addition to the above features, SeqMS can deal with B/E-linked scan or tandem mass spectra obtained for a partially 18O-labeled peptide [2,3]. In the latter case, one product ion spectrum is obtained from the MH+ ion and another from (M+3)+, which corresponds to the 18O-incorporated molecular ion species. This makes the assignment of product ions more reliable by taking into account the incorporation of the 18O atom into the C-terminal ions.

SeqMS was tested on 90 spectra of proteolytic and synthetic peptides. The results of the calculation are shown in Fig. 2. The expected rates of finding the real sequence within the top three or top ten candidates were 60 and 76 %, respectively, using only the probability set of ion series (left in Fig. 2), and increased to 79 and 91 % using both the probability set and the above scoring method (right in Fig. 2). Furthermore, the rate of finding the real sequence within the top five, obtained for some of these peptides, reached 97 % when 18O-labeling was used. The present software will be a powerful tool, especially for the sequencing of unknown peptides by MS.
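How the pair of spectra from a partially 18O-labeled peptide can help with ion assignment may be sketched as follows. This is an illustration under stated assumptions, not the procedure used in SeqMS: it assumes singly charged product ions, a single 18O atom incorporated at the C-terminus (a shift of about 2.004 Da, the 18O/16O mass difference), and a hypothetical function name and tolerance. A peak that reappears shifted in the spectrum of the labeled precursor is taken as a C-terminal ion, and a peak that reappears unshifted as an N-terminal ion.

```python
O18_SHIFT = 2.00425  # mass difference between 18O and 16O, in Da

def classify_ions(unlabeled, labeled, tol=0.05):
    """Label each peak of the unlabeled-precursor spectrum by comparing it
    with the spectrum of the 18O-labeled precursor (hypothetical helper)."""
    def has_peak(peaks, mz):
        return any(abs(p - mz) <= tol for p in peaks)

    result = {}
    for mz in unlabeled:
        shifted = has_peak(labeled, mz + O18_SHIFT)   # carries the label
        unshifted = has_peak(labeled, mz)             # does not carry it
        if shifted and unshifted:
            result[mz] = "ambiguous"
        elif shifted:
            result[mz] = "C-terminal"
        elif unshifted:
            result[mz] = "N-terminal"
        else:
            result[mz] = "unassigned"
    return result

# Made-up peak lists, for illustration only.
print(classify_ions([175.119, 272.172, 401.200], [177.123, 272.172]))
```

Such labels could then be used to restrict which ion series a vertex is allowed to belong to before the graph search is run.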

 

  1. Fernandez-de-Cossio, J., Gonzalez, J., & Besada, V. Comp. Applied Biosci., 11, 427-434 (1995).
  2. Takao, T., Hori, H., Okamoto, K., Harada, A., Kamachi, M., & Shimonishi, Y. Rapid Commun. Mass Spectrom., 5, 312-315 (1991).
  3. Takao, T., Gonzalez, J., Yoshidome, K., Sato, K., Asada, T., Kammei, Y., & Shimonishi, Y. Anal. Chem., 65, 2394-2399 (1993).

 

