JP2008145221A

JP2008145221A - Method, apparatus, and program for analyzing amino acid sequence using mass spectrometry and recording medium recording this program

Info

Publication number: JP2008145221A
Application number: JP2006331621A
Authority: JP
Inventors: Shigeki Kajiwara; 茂樹梶原; Naoichi Yamaki; 直一八巻; Kazutoshi Ando; 和敏安藤; Kazuyuki Sekiya; 和之関谷
Original assignee: Shizuoka University NUC; Shimadzu Corp
Current assignee: Shizuoka University NUC; Shimadzu Corp
Priority date: 2006-12-08
Filing date: 2006-12-08
Publication date: 2008-06-26
Anticipated expiration: 2026-12-08
Also published as: JP4841414B2

Abstract

PROBLEM TO BE SOLVED: To improve the reliability of amino acid sequence estimation by de novo sequencing. SOLUTION: When amino acid sequence candidates are selected on the basis of mass spectrum data, the problem of finding the candidates that maximize a score indicating their reliability is formulated as a longest path problem in a two-dimensional acyclic graph, with one axis indicating the position on the amino acid sequence and the other axis indicating the mass in the mass spectrum. Paths are searched on the basis of a peak list that collects the masses and intensities of the peaks derived from the test peptide, and a score is obtained by adding up the peak intensities along each path. Paths with large scores are selected and traced backward to identify each amino acid and thereby determine the amino acid sequence. This method allows high-speed computation and yields a large number of candidates without missing the correct amino acid sequence. COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to an amino acid sequence analysis method, an amino acid sequence analysis apparatus, an amino acid sequence analysis program, and a recording medium on which the amino acid sequence analysis program is recorded, for performing mass spectrometry on a target sample containing a peptide mixture and estimating the amino acid sequence of each peptide using the mass spectrum data thus obtained.

In recent years, analysis of protein structure and function has advanced rapidly as part of post-genomic research. As one such approach to protein structure and function analysis (proteome analysis), protein expression analysis and primary structure analysis using mass spectrometers have come into wide use, and so-called MS^n analysis (n is an integer of 2 or more), in which a specific peak is trapped and fragmented by means such as a quadrupole ion trap and collision-induced dissociation (CID), has proven powerful. In general, in MS^2 (= MS/MS) analysis, an ion having a specific mass-to-charge ratio (mass m / charge z) is first selected from the analyte as a precursor ion, and the precursor ion is fragmented by CID. The ions generated by the fragmentation (product ions) are then mass-analyzed to obtain information on the mass and chemical structure of the target ion.

When the amino acid sequence of a protein is identified by MS^n analysis as described above, the protein is first digested with an appropriate enzyme into a mixture of peptide fragments, and the peptide mixture is then subjected to mass spectrometry. Because the elements constituting each peptide have stable isotopes of different masses, even peptides with the same amino acid sequence give rise to multiple peaks with different mass-to-charge ratios depending on their isotopic composition. These peaks consist of the peak of the ion composed only of the isotopes with the highest natural abundance (the main ion) and the peaks of ions containing other isotopes (isotope ions). When the ion is singly charged, they form an isotope peak group of several peaks spaced 1 Da apart.

Next, from the mass spectrum data of the peptide mixture, one isotope peak group derived from a single peptide is selected as the precursor ion, and the ions obtained by fragmenting the precursor ion (product ions) are mass-analyzed (MS^2 analysis). If a single fragmentation step does not break the precursor into sufficiently small fragments, the fragmentation may be repeated several times.

Based on the mass spectrum pattern of the product ions obtained in this way and the mass spectrum pattern of the precursor ion, the amino acid sequence of the test peptide can be determined by running a database search for amino acid sequence identification using a search engine such as MASCOT, provided by Matrix Science. This approach cannot be used, however, for a novel protein that is not registered in the database, so a technique called de novo sequencing is used to estimate the amino acid sequence from the mass spectrum. Put simply, de novo sequencing estimates the amino acid sequence of the test peptide by searching for amino acids whose masses match the mass differences between peaks appearing in the mass spectrum. Search algorithms for this purpose have been studied widely, and methods based on graph theory, methods based on dynamic programming, and others have been developed and proposed. One such method is the dynamic-programming-based algorithm described in Non-Patent Document 1.
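The core operation of de novo sequencing described above, matching a mass difference between two peaks to an amino acid, can be sketched as follows. This is only an illustration: the function name residues_for_gap, the tolerance value, and the restriction to a subset of residues are assumptions made here, while the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da (subset shown for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "Q": 128.05858,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_gap(mass_low, mass_high, tol=0.02):
    """Return residues whose mass matches the gap between two peaks.

    tol is an assumed mass tolerance in Da, not a value from the text.
    """
    gap = mass_high - mass_low
    return sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)


# Two peaks 128.059 Da apart are consistent with a glutamine residue.
print(residues_for_gap(175.119, 303.178))
```

A gap that matches no residue mass returns an empty list, which corresponds to the case in which the search has to try a peak further along the mass axis.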

The key point of the algorithm described in Non-Patent Document 1 is a sandwich algorithm built on a specific pair of an N-terminal amino acid sequence A and a C-terminal amino acid sequence A', called a chummy pair. The amino acid sequence of the unknown peptide P to be identified is expressed in sandwich form as A-a-A' using the chummy pair. Let x be the mass of the N-terminal amino acid sequence A, ‖a‖ the mass of amino acid a, y the mass of the C-terminal amino acid sequence A', and δ the error bound. Estimating the amino acid sequence then reduces to finding a peptide that satisfies |x + y + ‖a‖ - M| ≤ δ, where the mass M is the sum of the correct amino acid masses + the N-terminal mass (Nterm = H = 1.00782 Da) + the C-terminal mass (Cterm = OH + H + H = 19.0184 Da). Since multiple amino acid sequence candidates are obtained in this way, they are ranked by a predetermined scoring method.
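As a small illustration of the sandwich condition |x + y + ‖a‖ - M| ≤ δ, the sketch below lists the middle amino acids a that close the gap between a given N-terminal mass x and C-terminal mass y. The function name middle_candidates and the residue subset are assumptions for illustration; x and y are assumed here to already include whatever terminal masses M includes, a point the text leaves open.

```python
# Standard monoisotopic residue masses in Da (small subset for illustration).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406,
}


def middle_candidates(x, y, M, delta):
    """Return amino acids a with |x + y + mass(a) - M| <= delta."""
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items()
        if abs(x + y + mass - M) <= delta
    )


# Hypothetical masses: the total M is built so that valine is the middle residue.
x, y = 213.16029, 400.0
M = x + y + RESIDUE_MASS["V"]
print(middle_candidates(x, y, M, delta=0.01))
```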

As the scoring method, the method described in Non-Patent Document 2 can be used, for example. This scoring method uses the scoring function of equation (1):
\[ f(h_1/h) \times f(h_2/h) \times f(h_3/h) \times \exp\{-[(m'-m)/\Delta]^2\} \times \log h \tag{1} \]
Here, h is the intensity of the b/y ion, h_1 and h_2 are the intensities of the supporting ions (neutral losses and secondary series), m' is the measured mass, m is the theoretical mass, and Δ is the tolerance on the measured mass m'. In other words, when a b/y ion is present, the method awards bonus points to its intensity according to the supporting ions. The bonus function f is given empirically.
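To make the structure of equation (1) concrete, the following sketch evaluates it for one b/y ion. The text only says that f is given empirically, so the function bonus below (f(r) = 1 + r) is a purely hypothetical stand-in, and the use of the natural logarithm is likewise an assumption.

```python
import math


def bonus(ratio):
    """Hypothetical stand-in for the empirical bonus function f."""
    return 1.0 + ratio


def peak_score(h, supports, m_measured, m_theory, tol):
    """Evaluate the form of equation (1) for one b/y ion.

    h          -- intensity of the b/y ion (must be > 0)
    supports   -- intensities h_1, h_2, ... of the supporting ions
    m_measured -- measured mass m'
    m_theory   -- theoretical mass m
    tol        -- tolerance (Delta) on the measured mass
    """
    product = 1.0
    for hk in supports:
        product *= bonus(hk / h)
    mass_term = math.exp(-((m_measured - m_theory) / tol) ** 2)
    return product * mass_term * math.log(h)  # natural log assumed


base = peak_score(100.0, [0.0, 0.0, 0.0], 500.0, 500.0, 0.5)
with_support = peak_score(100.0, [50.0, 0.0, 0.0], 500.0, 500.0, 0.5)
off_mass = peak_score(100.0, [0.0, 0.0, 0.0], 500.3, 500.0, 0.5)
```

With this stand-in, a supporting ion raises the score and a mass error lowers it, which is the qualitative behavior the text attributes to equation (1).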

Non-Patent Document 1: Bin Ma et al., "An Effective Algorithm for the Peptide De Novo Sequencing from MS/MS Spectrum", Symposium on Combinatorial Pattern Matching (Symp. Comb. Pattern Matching), 2003, pp. 266-277
Non-Patent Document 2: Bin Ma et al., "PEAKS: A Powerful Software for Peptide De Novo Sequencing by MS/MS", Rapid Communications in Mass Spectrometry, 17, 20 (2003), pp. 2337-2342

However, the inventors' study has shown that even the conventional amino acid sequence estimation methods based on Non-Patent Documents 1 and 2 do not necessarily have a high probability of estimating the correct amino acid sequence. One reason is that, although the dynamic programming algorithm described above yields multiple candidates, the correct amino acid sequence is sometimes not among them. Another reason is that, even when the correct amino acid sequence is among the candidates obtained by dynamic programming, the scoring method described above cannot always rank it first.

In particular, if the correct amino acid sequence is dropped at the initial candidate selection stage, no improvement in the accuracy of the subsequent scoring can help. For this reason, the computation that selects candidates from the mass spectrum data so that the correct amino acid sequence is included is extremely important. The present invention has been made in view of these points, and its object is to provide an amino acid sequence analysis method, an amino acid sequence analysis apparatus, an amino acid sequence analysis program, and a recording medium on which the amino acid sequence analysis program is recorded, that can estimate correctly even for data on which conventional dynamic programming fails and can perform de novo sequencing stably and with high reliability.

An amino acid sequence analysis method according to a first invention, made to solve the above problem, is an amino acid sequence analysis method for estimating the amino acid sequence of a target sample based on mass spectrum data obtained by mass spectrometry, the method comprising:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and peak intensities of the peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing analysis using a dynamic programming algorithm on the data contained in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates, using the mass spectrum data, accuracy information indicating the likelihood that the amino acid sequence matches the amino acid sequence of the target sample; and
d) a presentation step of presenting the plurality of amino acid sequence candidates after screening or ranking them according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding up the intensities of peaks selected one after another from the peaks in the peak list is formulated as a problem of finding the longest path and longer directed paths in an acyclic graph whose axes, in mutually different directions, are the bond position on the amino acid sequence and the mass in the mass spectrum, and a directed path is searched for at each amino acid bond position on the acyclic graph using the peak list, starting from the sequence terminus on the low-mass side.

An amino acid sequence analysis apparatus according to a second invention is an apparatus for implementing the amino acid sequence analysis method according to the first invention on a computer, the apparatus comprising:
a) peak list creation means for creating, based on the mass spectrum data, a peak list that collects the masses and peak intensities of the peaks derived from the target sample;
b) amino acid sequence candidate determination means for selecting a plurality of amino acid sequence candidates by performing de novo sequencing analysis using a dynamic programming algorithm on the data contained in the peak list;
c) accuracy calculation means for calculating, for each of the plurality of amino acid sequence candidates, using the mass spectrum data, accuracy information indicating the likelihood that the amino acid sequence matches the amino acid sequence of the target sample; and
d) information presentation means for presenting the plurality of amino acid sequence candidates after screening or ranking them according to the accuracy information,
wherein the amino acid sequence candidate determination means formulates the selection of amino acid sequence candidates that maximize or increase a score calculated by adding up the intensities of peaks selected one after another from the peaks in the peak list as a problem of finding the longest path and longer directed paths in an acyclic graph whose axes, in mutually different directions, are the bond position on the amino acid sequence and the mass in the mass spectrum, and executes processing that searches for a directed path at each amino acid bond position on the acyclic graph using the peak list, starting from the sequence terminus on the low-mass side.

An amino acid sequence analysis program according to a third invention is a program for implementing the amino acid sequence analysis method according to the first invention on a computer, the program causing the computer to execute:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and peak intensities of the peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing analysis using a dynamic programming algorithm on the data contained in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates, using the mass spectrum data, accuracy information indicating the likelihood that the amino acid sequence matches the amino acid sequence of the target sample; and
d) a presentation step of presenting the plurality of amino acid sequence candidates after screening or ranking them according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding up the intensities of peaks selected one after another from the peaks in the peak list is formulated as a problem of finding the longest path and longer directed paths in an acyclic graph whose axes, in mutually different directions, are the bond position on the amino acid sequence and the mass in the mass spectrum, and a directed path is searched for at each amino acid bond position on the acyclic graph using the peak list, starting from the sequence terminus on the low-mass side.

Furthermore, a recording medium according to a fourth invention is a computer-readable recording medium on which the amino acid sequence analysis program according to the third invention is recorded, the program causing a computer to execute:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and peak intensities of the peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing analysis using a dynamic programming algorithm on the data contained in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates, using the mass spectrum data, accuracy information indicating the likelihood that the amino acid sequence matches the amino acid sequence of the target sample; and
d) a presentation step of presenting the plurality of amino acid sequence candidates after screening or ranking them according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding up the intensities of peaks selected one after another from the peaks in the peak list is formulated as a problem of finding the longest path and longer directed paths in an acyclic graph whose axes, in mutually different directions, are the bond position on the amino acid sequence and the mass in the mass spectrum, and a directed path is searched for at each amino acid bond position on the acyclic graph using the peak list, starting from the sequence terminus on the low-mass side.

Here, the mass spectrum data is normally mass spectrum data obtained by MS^n analysis, in which the target test peptide is taken as the precursor ion and the product ions generated by fragmenting it in one or more stages are detected.

In the amino acid sequence analysis method according to the first invention, the amino acid sequence candidate determination step judges the likelihood of an estimated amino acid sequence by a score, and the problem of finding the amino acid sequence candidate that maximizes the score is formulated as a longest path problem on an acyclic graph. The acyclic graph considered is a two-dimensional graph in which one axis is the bond position on the amino acid sequence and the other axis is the mass in the mass spectrum. In general, the difficulty in solving a longest path problem by dynamic programming is that the number of directed paths is so large that the computation takes too long to be practical. In contrast, when directed paths are searched on the two-dimensional acyclic graph described above, the number of amino acids linked in an amino acid sequence is at most about 30, so the number of directed paths making up the search route for one amino acid sequence stays at about the same number. The search route for one amino acid sequence can therefore be found in a short time.

If the score calculated during the search were sufficiently accurate, finding only the longest path would suffice, but in practice the candidate with the highest score is not necessarily the correct amino acid sequence. Therefore, not only the longest path but also the second, third, ..., K-th longest paths are found, and the corresponding amino acid sequences are listed as candidates. The value of K may be chosen by balancing computation time against estimation accuracy; 2000 is one example. To find the paths next to the longest path efficiently, that is, in a time short enough for practical use, the method disclosed in David Eppstein, "Finding the k Shortest Paths", SIAM J. Computing, Vol. 28, No. 2, pp. 652-673 (1998), can be used.
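The idea of collecting the K longest paths rather than only the longest one can be sketched as below. Note that this is not Eppstein's algorithm, which is considerably more involved; it is a simpler dynamic program that keeps the K best partial paths at every node of a directed acyclic graph. The graph representation (a topologically ordered node list, a successor dictionary, and per-node scores) and the name k_best_paths are assumptions made for illustration.

```python
import heapq


def k_best_paths(nodes, edges, score, source, sink, k):
    """Return up to k highest-scoring source-to-sink paths in a DAG.

    nodes -- all nodes in topological order
    edges -- dict mapping a node to a list of its successors
    score -- dict mapping a node to the score gained on visiting it
    """
    best = {n: [] for n in nodes}
    best[source] = [(score[source], (source,))]
    for u in nodes:
        # All predecessors of u are done, so its list is complete: keep top k.
        best[u] = heapq.nlargest(k, best[u])
        for total, path in best[u]:
            for v in edges.get(u, ()):
                best[v].append((total + score[v], path + (v,)))
    return best[sink]


# Toy graph: node scores play the role of peak intensities.
nodes = ["s", "a", "b", "t"]
edges = {"s": ["a", "b"], "a": ["b", "t"], "b": ["t"]}
score = {"s": 0, "a": 10, "b": 20, "t": 5}
top2 = k_best_paths(nodes, edges, score, "s", "t", k=2)
```

Keeping K entries per node multiplies the work of the single-path search by roughly K, which is one reason a dedicated method such as the one cited in the text may be preferable when K is in the thousands.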

According to the amino acid sequence analysis method, amino acid sequence analysis apparatus, amino acid sequence analysis program, and recording medium recording the program according to the first to fourth inventions, amino acid sequence candidates that give larger scores can be selected quickly by the dynamic programming described above. A large number of candidates can thus be obtained within a practical computation time, and the correct amino acid sequence can be kept from being omitted from the candidates. As a result, the reliability of de novo sequencing is improved and the accuracy of amino acid sequence identification is higher than before.

The amino acid sequence analysis method, amino acid sequence analysis apparatus, amino acid sequence analysis program, and recording medium recording the program according to the present invention are described below with reference to the drawings. FIG. 1 is a schematic flowchart of the amino acid sequence analysis method of this embodiment. The method is carried out by having a computer read a recording medium on which the amino acid sequence analysis program is recorded and executing the program. The recording medium may be any of various media, including removable media such as a CD-ROM (CD-R, CD-RW), MO, DVD-RAM, memory card, or FD, and media that are generally not removable, such as an HDD.

First, a sample containing the target test peptide is mass-analyzed (MS^n analysis) with an MS/MS mass spectrometer such as a MALDI ion trap TOFMS, and mass spectrum data indicating the relationship between the mass (strictly, the mass-to-charge ratio) and the intensity of the detected ions is collected (step S1). The collected mass spectrum data is input to a data analysis apparatus that is essentially a computer. The mass spectrum obtained from the mass spectrum data normally contains many peaks, including noise, as shown in FIG. 2(a).

Next, the peaks derived from the target test peptide are selected from the mass spectrum, and a peak list collecting the masses and intensities of the peaks to be analyzed is created (step S2). For the peak selection, the method disclosed in Robin Gras et al., "Improving protein identification from peptide mass fingerprinting through a parameterized multi-level scoring algorithm and an optimized peak detection", Electrophoresis, 20, pp. 3535-3550 (1999), can be used, for example. That is, by comparing the theoretical and measured intensity ratios of an isotope peak cluster (a group of peaks derived from ions of the same elemental composition that show different mass-to-charge ratios because of differences in isotopic composition), unwanted noise peaks are excluded and the peaks to be analyzed are selected.

Next, using the peak list created as described above, amino acid sequence candidates are obtained according to a predetermined dynamic programming algorithm (step S3). The principle of this algorithm is as follows. A score is assigned to each amino acid sequence candidate as an index of its likelihood, and the problem of finding the candidate that maximizes the score is formulated as a longest path problem in an acyclic graph (a graph containing no directed cycle), which makes a dynamic programming solution possible. Specifically, as shown in FIG. 3, consider a two-dimensional acyclic graph with two mutually orthogonal axes: the mass m along one direction (the vertical direction, top to bottom, in FIG. 3) and the amino acid bond positions (a1, a2, ..., a10), counted one amino acid at a time from the sequence terminus, along the other (the horizontal direction, left to right, in FIG. 3).

In this acyclic graph, amino acid sequence candidates are found by the following procedure. In FIG. 3, the vertical axis at the left edge is the sequence terminus, and the search starts on this axis. As shown in FIG. 2(a), the peaks on the mass spectrum (in practice, in the peak list) are examined in order from the low-mass side to see whether an amino acid corresponding to the mass of the peak exists, and if one exists, its mass is plotted on the sequence-terminus axis of the graph in FIG. 3. For example, if an amino acid corresponding to the mass m1 of peak P1 in FIG. 2(b) exists, the starting point of the search route is plotted at mass m1 on the sequence-terminus axis.

Next, it is checked whether an amino acid exists that corresponds to the mass difference between peak P1 and the next adjacent peak on the mass spectrum. If one exists, the mass of this second amino acid is added to the mass of the first, and the sum is plotted as the next point on the vertical line between a1 and a2 in the graph of FIG. 3. If no amino acid corresponds to the mass difference between peak P1 and the next adjacent peak, the mass difference to the peak adjacent on the higher-mass side is checked in the same way. For example, in FIG. 2(b), if an amino acid corresponding to the mass difference m4 between peak P1 and peak P4 exists, the next point is plotted at mass m1 + m4 in the graph of FIG. 3. Since the intensities of the selected peaks are added up to give the score, the score is i1 + i4 once peak P4 has been selected.

In this way, on the mass spectrum shown in FIG. 2(b), peaks matching amino acid masses are examined in order of increasing mass, and on the graph shown in FIG. 3 the directed search path is extended from the sequence end according to the cumulative sum of amino acid masses. At the same time, the score is increased by adding the peak intensities. When the number of selected amino acids exceeds (precursor ion mass) / (minimum amino acid mass), no further amino acid can be selected, so the path search stops. For each amino acid sequence, the intensities of a plurality of peaks including supporting ions are added to the score, as in equation (1) above.
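A minimal sketch of this layered search, written as a dynamic program over (sequence position, peak) nodes, is given below. It is not the implementation of the embodiment: the residue table is a small subset, the tolerance is an assumed value, supporting ions are omitted, peak masses are treated as plain sums of residue masses, and all function names are hypothetical.

```python
import math

# Monoisotopic residue masses in Da (small subset, for illustration only).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "V": 99.06841, "L": 113.08406}


def residue_for(delta, tol=0.02):
    """Return a residue whose mass matches `delta` within tol, or None."""
    for aa, m in RESIDUE_MASS.items():
        if abs(m - delta) <= tol:
            return aa
    return None


def layered_best_scores(peaks, precursor_mass, tol=0.02):
    """best[k][j] is the highest intensity sum of any directed path that has
    selected k + 1 residues and ends at peak j (peaks sorted by mass)."""
    peaks = sorted(peaks)
    n = len(peaks)
    # Upper bound on the number of residues: precursor mass / smallest residue mass.
    max_len = int(precursor_mass // min(RESIDUE_MASS.values()))

    # Layer 0: peaks whose mass is itself a single residue mass.
    best = [[-math.inf] * n]
    for j, (mass, intensity) in enumerate(peaks):
        if residue_for(mass, tol):
            best[0][j] = intensity

    for k in range(1, max_len):
        layer = [-math.inf] * n
        for j in range(n):
            for i in range(j):  # only lower-mass peaks can precede peak j
                reachable = best[k - 1][i] > -math.inf
                if reachable and residue_for(peaks[j][0] - peaks[i][0], tol):
                    layer[j] = max(layer[j], best[k - 1][i] + peaks[j][1])
        if all(v == -math.inf for v in layer):
            break  # no path can be extended any further
        best.append(layer)
    return peaks, best


peaks = [(71.04, 30), (128.06, 20), (170.11, 40), (241.15, 10)]
sorted_peaks, best = layered_best_scores(peaks, precursor_mass=259.16)
print(max(best[-1]), len(best))  # 80 3
```

In this toy peak list the highest-scoring path selects three residues (A, V, A) through the peaks at 71.04, 170.11, and 241.15, giving 30 + 40 + 10 = 80.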

In the example of FIG. 3, three search paths R1, R2, and R3 have been found: path R1 selects nine amino acids for a total score of 390, path R3 likewise selects nine amino acids for a total score of 360, and path R2 selects ten amino acids for a total score of 460. In general, in this kind of path search, when several paths pass through the same point, the older or lower-scoring paths leading to that point are discarded; here, however, all of the paths are retained. Even so, an increase in search time can be avoided by using the method described in the David Eppstein reference cited above.
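The difference between discarding and retaining paths that meet at the same point can be illustrated with the sketch below. Note that this is a simple "keep the k best partial paths per node" dynamic program chosen for brevity; it is not the algorithm of the cited reference, and the graph, weights, and function name are hypothetical.

```python
import heapq


def k_best_paths(order, preds, weight, k):
    """Keep up to k highest-scoring paths ending at every node of a DAG.

    order : nodes in topological order
    preds : dict mapping a node to the list of its predecessors
    weight: dict mapping a node to the peak intensity added on reaching it
    Nodes without predecessors are treated as starting points.
    Returns a dict mapping each node to a list of (score, path) pairs.
    """
    table = {}
    for v in order:
        candidates = []
        if not preds.get(v):
            candidates.append((weight[v], (v,)))
        for u in preds.get(v, []):
            for score, path in table[u]:
                candidates.append((score + weight[v], path + (v,)))
        table[v] = heapq.nlargest(k, candidates)
    return table


order = ["s", "a", "b", "t"]
preds = {"a": ["s"], "b": ["s"], "t": ["a", "b"]}
weight = {"s": 10, "a": 30, "b": 20, "t": 5}

pruned = k_best_paths(order, preds, weight, k=1)["t"]
kept = k_best_paths(order, preds, weight, k=2)["t"]
print(pruned)  # [(45, ('s', 'a', 't'))]
print(kept)    # [(45, ('s', 'a', 't')), (35, ('s', 'b', 't'))]
```

With k = 1 the lower-scoring path through node b is lost at the meeting point t, which corresponds to the usual pruning; with a larger k it survives as an additional candidate.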

Although FIG. 3 shows only three search paths R1, R2, and R3, in practice a considerably larger number of search paths is found in many cases. For several of the highest-scoring paths, the amino acid corresponding to each sequence position is identified by tracing the path backward from its end point, as illustrated in FIG. 4. In FIG. 4, the amino acid sequence LVVYPWTQR is identified. Amino acid sequence candidates are determined by identifying the amino acid sequences for a plurality of search paths in this way.
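The backward trace can be sketched as follows, assuming that the forward search stored, for every node it reached, a back-pointer to the preceding node together with the residue assigned to that step. The node labels (position, peak index) and the `back` table are hypothetical and only cover the first three residues of the example sequence.

```python
def trace_back(back, end):
    """Follow back-pointers from the end node to the sequence end and return
    the residues in forward order.

    back: dict mapping node -> (previous node or None, residue letter)
    end : node at which the chosen path terminates
    """
    residues = []
    node = end
    while node is not None:
        prev, aa = back[node]
        residues.append(aa)
        node = prev
    return "".join(reversed(residues))


# Hypothetical back-pointer table; nodes are (sequence position, peak index).
back = {
    (1, 0): (None, "L"),
    (2, 2): ((1, 0), "V"),
    (3, 5): ((2, 2), "V"),
}
sequence = trace_back(back, (3, 5))
print(sequence)  # LVV
```

Whether the recovered string has to be reversed before it is reported depends on which terminus the search started from, which this sketch does not model.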

Next, a precise score is calculated for each of the amino acid sequence candidates obtained as described above (step S4). A score is also calculated in step S3, but that score is only an approximate value used as a guide for narrowing down the candidates, whereas the score in step S4 is a more precise value used for ranking them. Specifically, the score is still basically obtained by adding peak intensities, but the fragment pattern expected for the assumed amino acid sequence is taken into account, and the intensity of a peak that corresponds to several types of fragment ions, such as supporting ions, is added to the score only once. In addition, even for the same precursor ion, the fragment pattern differs depending on factors such as the type of mass spectrometer. It is therefore advisable, for example, to investigate experimentally how the fragment pattern appears and, based on the result, to select the combination of fragment ion types added to the score or to change the weighting used in the addition. As described in Non-Patent Document 2, the mass error may also be taken into account in the score. Furthermore, a penalty (or bonus) depending on the length of the amino acid sequence may be added. This is because a peptide with a long amino acid sequence has more sites at which bonds can break and therefore tends to score higher than a peptide with a short sequence.
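One possible shape of such a precise score is sketched below. All specifics are assumptions made for illustration: the ion-type weights, the linear length penalty, the rule that a peak explained by several ion types takes the largest applicable weight, and the function name are not taken from the embodiment.

```python
def precise_score(matches, intensities, ion_weight, length, length_penalty=0.0):
    """Sum weighted peak intensities, counting every peak at most once.

    matches       : list of (peak index, ion type) pairs predicted for a candidate
    intensities   : list of peak intensities, indexed by peak index
    ion_weight    : dict mapping ion type -> weight (e.g. tuned per instrument)
    length        : number of residues in the candidate sequence
    length_penalty: amount subtracted per residue (a negative value gives a bonus)
    """
    weight_per_peak = {}
    for peak, ion in matches:
        w = ion_weight.get(ion, 0.0)
        # A peak matched by several ion types is counted once (largest weight).
        weight_per_peak[peak] = max(weight_per_peak.get(peak, 0.0), w)
    total = sum(intensities[p] * w for p, w in weight_per_peak.items())
    return total - length_penalty * length


intensities = [100.0, 50.0, 20.0]
matches = [(0, "y"), (0, "b"), (1, "b"), (2, "a")]
ion_weight = {"y": 1.0, "b": 1.0, "a": 0.5}

score = precise_score(matches, intensities, ion_weight, length=3, length_penalty=2.0)
print(score)  # 154.0
```

Peak 0 is explained by both a y-type and a b-type ion but contributes its intensity only once, so the sum is 100 + 50 + 10 = 160 before the length penalty of 6 is subtracted.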

Because the precise score calculation in step S4 needs to be performed only for the amino acid sequence candidates selected in step S3, the precise score of each candidate can be calculated in a time short enough for practical use even if the calculation is complex.

Finally, according to the precise scores calculated as described above, all of the amino acid sequence candidates, or those regarded as highly reliable, are ranked and displayed (step S5). It is advisable to display the correspondence between the mass spectrum and the estimated amino acid sequences at the same time.

FIG. 5 shows an example of the amino acid sequence analysis results obtained by the above procedure. The measured mass spectrum is shown at the top; below it, the leftmost column lists the amino acid sequences ranked 1 to 20, and the rightmost column gives their scores. The kind of symbol (○, +, or ×) in FIG. 5 indicates the type of fragment ion, and the size of the symbol indicates the intensity of the peak corresponding to that ion mass. In this example, the amino acid sequence estimated at rank 1 (EFTPVLQADFQK) was confirmed to match the amino acid sequence of the hemoglobin that was actually measured.

The above embodiment is merely one example of the present invention, and modifications, corrections, additions, and the like made as appropriate within the spirit of the present invention are naturally covered by the claims of the present application. That is, because the feature of the present invention lies in the algorithm and arithmetic processing of step S3 in FIG. 1, the arithmetic processing in the other steps is not limited to that described above, and various conventionally known methods can be used.

FIG. 1 is a schematic flowchart of an amino acid sequence analysis method according to one embodiment of the present invention.
FIG. 2 is an explanatory diagram of the amino acid sequence estimation process by dynamic programming in the amino acid sequence analysis method of this embodiment.
FIG. 3 is an explanatory diagram of the amino acid sequence estimation process by dynamic programming in the amino acid sequence analysis method of this embodiment.
FIG. 4 is an explanatory diagram of the amino acid sequence estimation process by dynamic programming in the amino acid sequence analysis method of this embodiment.
FIG. 5 is a diagram showing an example of the amino acid sequence analysis results obtained with the amino acid sequence analysis method of this embodiment.

Claims

An amino acid sequence analysis method for estimating the amino acid sequence of a target sample based on mass spectrum data obtained by mass spectrometry, the method comprising:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and intensities of peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing with an algorithm based on dynamic programming, using the data included in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates and using the mass spectrum data, accuracy information indicating the likelihood that the candidate matches the amino acid sequence of the target sample; and
d) a presenting step of selecting and presenting a plurality of amino acid sequence candidates according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding the intensities of peaks sequentially selected from the peaks listed in the peak list is formulated as a problem of finding the longest and longer directed paths in an acyclic graph having the bond position on the amino acid sequence and the mass of the mass spectrum as axes in different directions, and, using the peak list, directed paths are searched on the acyclic graph for each amino acid bond position, starting from the end of the amino acid sequence and from the low-mass side.

An amino acid sequence analyzer for estimating the amino acid sequence of a target sample based on mass spectrum data obtained by mass spectrometry, the analyzer comprising:
a) peak list creation means for creating, based on the mass spectrum data, a peak list that collects the masses and intensities of peaks derived from the target sample;
b) amino acid sequence candidate determination means for selecting a plurality of amino acid sequence candidates by performing de novo sequencing with an algorithm based on dynamic programming, using the data included in the peak list;
c) accuracy calculation means for calculating, for each of the plurality of amino acid sequence candidates and using the mass spectrum data, accuracy information indicating the likelihood that the candidate matches the amino acid sequence of the target sample; and
d) information presenting means for selecting a plurality of amino acid sequence candidates, or determining their order, according to the accuracy information and presenting them,
wherein the amino acid sequence candidate determination means formulates the selection of amino acid sequence candidates that maximize or increase a score calculated by adding the intensities of peaks sequentially selected from the peaks listed in the peak list as a problem of finding the longest and longer directed paths in an acyclic graph having the bond position on the amino acid sequence and the mass of the mass spectrum as axes in different directions, and performs processing that, using the peak list, searches for directed paths on the acyclic graph for each amino acid bond position, starting from the end of the amino acid sequence and from the low-mass side.

A program for estimating, using a computer, the amino acid sequence of a target sample based on mass spectrum data obtained by mass spectrometry, the program causing the computer to execute:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and intensities of peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing with an algorithm based on dynamic programming, using the data included in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates and using the mass spectrum data, accuracy information indicating the likelihood that the candidate matches the amino acid sequence of the target sample; and
d) a presenting step of selecting and presenting a plurality of amino acid sequence candidates according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding the intensities of peaks sequentially selected from the peaks listed in the peak list is formulated as a problem of finding the longest and longer directed paths in an acyclic graph having the bond position on the amino acid sequence and the mass of the mass spectrum as axes in different directions, and, using the peak list, directed paths are searched on the acyclic graph for each amino acid bond position, starting from the end of the amino acid sequence and from the low-mass side, the program being an amino acid sequence analysis program using mass spectrometry.

A computer-readable recording medium on which is recorded a program for estimating, using a computer, the amino acid sequence of a target sample based on mass spectrum data obtained by mass spectrometry, the program causing the computer to execute:
a) a peak list creation step of creating, based on the mass spectrum data, a peak list that collects the masses and intensities of peaks derived from the target sample;
b) an amino acid sequence candidate determination step of selecting a plurality of amino acid sequence candidates by performing de novo sequencing with an algorithm based on dynamic programming, using the data included in the peak list;
c) an accuracy calculation step of calculating, for each of the plurality of amino acid sequence candidates and using the mass spectrum data, accuracy information indicating the likelihood that the candidate matches the amino acid sequence of the target sample; and
d) a presenting step of selecting and presenting a plurality of amino acid sequence candidates according to the accuracy information,
wherein, in the amino acid sequence candidate determination step, the selection of amino acid sequence candidates that maximize or increase a score calculated by adding the intensities of peaks sequentially selected from the peaks listed in the peak list is formulated as a problem of finding the longest and longer directed paths in an acyclic graph having the bond position on the amino acid sequence and the mass of the mass spectrum as axes in different directions, and, using the peak list, directed paths are searched on the acyclic graph for each amino acid bond position, starting from the end of the amino acid sequence and from the low-mass side, the recorded program being an amino acid sequence analysis program using mass spectrometry.