CN100390537C - A Method for Predicting the Molecular Formula of Ions Using Isotope Peaks of Fragment Ions in Tandem Mass Spectrometry - Google Patents
- Publication number
- CN100390537C
- Authority
- CN
- China
- Prior art keywords
- molecular formula
- isotope
- ion
- fragment ion
- fragment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a method for predicting the molecular formula of an ion from the isotope peaks of fragment ions in a tandem mass spectrum. The monoisotopic mass of a fragment ion and the abundance of each isotope peak relative to the monoisotopic peak are obtained twice: once from the tandem mass spectrum, and once from a general molecular formula in which the atom count of each element is left undetermined. The two sets of masses and relative abundances are then matched to obtain a non-negative integer solution for the undetermined atom counts, which gives the molecular formula of the fragment ion. The method thus uses the isotope peak information of fragment ions in the tandem mass spectrum, computing the molecular formula of a fragment ion from the pattern of its isotope peaks. It can provide accurate molecular formula information for fragment ions, can discriminate among the candidate sequences returned by database search methods for peptide identification, and provides a basis for de novo peptide sequencing methods to generate highly reliable candidate sequences.
Description
Technical Field
The invention relates to a proteome analysis method, and in particular to a method for predicting the molecular formulas of the fragment ions produced when a peptide sequence is fragmented.
Background Art
In current research that identifies peptide sequences and proteins by peptide mass fingerprinting and tandem mass spectrometry, combined with database search and de novo sequencing methods, the preprocessing of mass spectrometry data and the post-processing of identification results are very important.
The peptide to be identified is fragmented into fragment ions in the mass spectrometer, and the masses and abundances of these fragment ions are measured by the instrument to form a tandem mass spectrum. Each fragment ion, and each of its isotopic ions, produces a corresponding peak in the tandem mass spectrum. The isotope peaks of fragment ions can confuse the identification of peptides or proteins. For example, the mass differences between certain amino acid residues are approximately 0.34, 1 and 1.5 Da, while the mass-to-charge (m/z) spacings between the isotope peaks of the singly, doubly and triply charged forms of one fragment ion are 1, 0.5 and 0.333, respectively. Because these residue mass differences overlap with the isotope peak spacings, the identification process has to decide whether a given peak in the tandem mass spectrum is a fragment ion peak or an isotope peak of another fragment ion. The overlap becomes even more frequent when the summed masses of several amino acids are considered. For this reason, one traditional data preprocessing task is to recognize the isotope peaks of a fragment ion and remove them.
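The ambiguity described above can be illustrated with a minimal sketch. It assumes the nominal spacing of 1/z quoted in the text; the function names and the tolerance value are hypothetical choices made for this example, not part of the patent.

```python
def isotope_spacing(charge):
    """Nominal m/z spacing between adjacent isotope peaks of an ion with the given charge."""
    return 1.0 / charge


def could_be_isotope_peak(mz_a, mz_b, charge, tol=0.02):
    """Return True if the gap between two peaks matches the isotope spacing for this charge.

    tol is an illustrative tolerance; in practice it would depend on instrument accuracy.
    """
    gap = abs(mz_b - mz_a)
    return abs(gap - isotope_spacing(charge)) <= tol


# A 0.5 gap is consistent with a doubly charged isotope pair, not a singly charged one.
print(could_be_isotope_peak(500.0, 500.5, charge=2))
print(could_be_isotope_peak(500.0, 500.5, charge=1))
```

A gap that passes this check is only a candidate isotope pair; as the text notes, the same gap may also arise from a residue mass difference.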
In fact, however, the distribution pattern of a fragment ion's isotope peaks in the mass spectrum is closely related to the atomic composition (that is, the molecular formula) of that fragment ion. A method is therefore needed that uses the isotope peaks of a fragment ion to predict its molecular formula. The predicted molecular formula can, on the one hand, supply more and more accurate information to the database search and de novo methods of peptide identification and, on the other hand, supply additional evidence for the post-processing of identification results.
Summary of the Invention
The object of the invention is to provide a method that uses the isotope peaks of a fragment ion in a tandem mass spectrum to predict the molecular formula of that fragment ion.
To achieve this object, the invention provides a method for predicting the molecular formula of an ion from the isotope peaks of fragment ions in a tandem mass spectrum, comprising:
Step 1): obtain from the tandem mass spectrum the peak of the monoisotopic form of a fragment ion and the peak of at least one of its isotopes, and compute the monoisotopic mass of the fragment ion and the relative abundance between the monoisotopic peak and the at least one isotope peak;
Step 2): provide a general molecular formula for the fragment ion, in which the atom count of each element is undetermined;
Step 3): use the general molecular formula to obtain the theoretical monoisotopic mass of the fragment ion and the theoretical relative abundance between its monoisotopic form and the at least one isotope, both being functions of the undetermined atom counts in the general molecular formula;
Step 4): match the mass and relative abundance obtained in step 3) against the mass and relative abundance obtained from the tandem mass spectrum in step 1), to obtain a non-negative integer solution for the undetermined atom count of each element in the general molecular formula, and thereby the molecular formula of the fragment ion.
In the above technical solution, the at least one isotope of the fragment ion referred to in steps 1) and 3) comprises the first isotope and the second isotope of the fragment ion.
In the above technical solution, the monoisotopic mass of the fragment ion and the relative abundance between its monoisotopic peak and the at least one isotope peak, as obtained in step 1), form an experimental isotope distribution vector; the theoretical monoisotopic mass and the theoretical relative abundance obtained in step 3) form a theoretical isotope distribution vector; and the matching in step 4) uses the Euclidean distance between the experimental and the theoretical isotope distribution vectors as the matching score.
The above technical solution further comprises constraining the matching with chemical-rule constraints that ensure the resulting molecular formula is chemically meaningful.
In the above technical solution, obtaining the non-negative integer solution for the undetermined atom counts by the matching comprises: obtaining, by the matching, a real-valued solution for the undetermined atom count of each element in the general molecular formula; and searching in the neighborhood of that real-valued solution for a non-negative integer solution for the undetermined atom counts.
The above technical solution further comprises a step of filtering the non-negative integer solutions obtained in step 4). The filtering may use an average isotope distribution pattern method, which filters the non-negative integer solutions using the statistical relationship between the theoretical monoisotopic mass of the fragment ion and the theoretical relative abundances of its isotopes. The filtering may use chemical-rule constraints that ensure the resulting molecular formula is chemically meaningful. The filtering may also cross-validate the non-negative integer solutions of two fragment ions against each other, so as to filter the solutions of both.
The advantages of the invention are:
1) the method makes full use of the isotope peak information of fragment ions in the tandem mass spectrum;
2) the method computes the molecular formula of a fragment ion quickly and accurately from the pattern of its isotope peaks in the tandem mass spectrum (the accuracy depends on the precision of the mass spectrum: the higher the precision, the more reliable the computed formula);
3) the method can provide accurate molecular formula information for fragment ions, which can be used to discriminate among the candidate sequences returned by database search methods for peptide identification;
4) the ion formulas computed by the method can guide de novo peptide sequencing methods toward highly reliable candidate sequences.
Detailed Description of the Embodiments
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Denote the monoisotopic form of a fragment ion by $P$, its first isotope by $P_1$, its second isotope by $P_2$, and so on, with the $N$-th isotope denoted $P_N$. Here the monoisotopic fragment ion $P$ is the ion in which every constituent element is present as its lightest isotope ($^{12}$C, $^{1}$H, $^{14}$N, $^{16}$O and $^{32}$S for the elements considered here). An isotope of the fragment ion is an ion with the same molecular formula as the monoisotopic fragment ion but carrying additional neutrons: the first isotope $P_1$ carries one more neutron than $P$, the second isotope $P_2$ carries two more, and so on. In the invention the extra neutrons are counted over the ion as a whole, regardless of which atoms carry them.
A peptide entering the mass spectrometer is ionized, and peptide ions with a particular mass-to-charge ratio (m/z), which usually also share the same amino acid sequence, are cleaved into multiple fragment ions by collision-induced dissociation (CID). The m/z values of these fragment ions are measured to form a tandem mass spectrum, in which the horizontal axis is the mass-to-charge ratio (m/z) of the fragment ions and the vertical axis is their detected abundance.
In the tandem mass spectrum, the peak of the monoisotopic form $P$ of a fragment ion is selected together with the peak of at least one of its isotopes $P_1$ to $P_N$. The aim of the invention is to predict, from the distribution of these isotope peaks, the molecular formula corresponding to $P$. In one embodiment of the invention, only the monoisotopic form $P$, the first isotope $P_1$ and the second isotope $P_2$ are selected from the tandem mass spectrum. As those skilled in the art will readily understand from the description below, other embodiments may select the peak of only one isotope, for example $P_1$, or the peaks of more isotopes. The method can be carried out with any of these choices, but the number of isotope peaks selected affects the complexity and the accuracy of the computation.
The ion mass $M_e$ of the monoisotopic fragment ion $P$ can also be obtained from the tandem mass spectrum, as is well known to those skilled in the art.
To simplify the calculations below, first define an experimental isotope distribution vector eIPV $= (M_e, I_1, I_2)$, where $M_e$ is the ion mass of the monoisotopic form $P$ obtained from the tandem mass spectrum, and $I_1$ and $I_2$ are the abundances of the peaks of the first isotope $P_1$ and the second isotope $P_2$ relative to the peak of $P$. All of these values can be read from the tandem mass spectrum.
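The construction of eIPV can be sketched as follows. This is a minimal illustration under stated assumptions: the `Peak` type and the function name are hypothetical, the three peaks are assumed to have been picked already, and the ion is assumed to be singly charged so that the measured m/z is taken directly as $M_e$.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # measured mass-to-charge ratio
    intensity: float  # measured abundance


def experimental_ipv(mono, iso1, iso2):
    """Build eIPV = (Me, I1, I2) from the monoisotopic peak and its first two isotope peaks.

    Assumes a singly charged ion, so Me is taken to be the m/z of the monoisotopic peak.
    """
    if mono.intensity <= 0:
        raise ValueError("monoisotopic peak must have positive intensity")
    i1 = iso1.intensity / mono.intensity  # relative abundance of P1 with respect to P
    i2 = iso2.intensity / mono.intensity  # relative abundance of P2 with respect to P
    return (mono.mz, i1, i2)


eipv = experimental_ipv(Peak(500.0, 200.0), Peak(501.0, 50.0), Peak(502.0, 10.0))
print(eipv)
```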
Next, define a theoretical isotope distribution vector tIPV $= (M, T_1, T_2)$, which is derived from the general molecular formula of the fragment ion. Let the general molecular formula be $C_{n_1}H_{n_2}N_{n_3}O_{n_4}S_{n_5}$, where the atom counts $n_1$ to $n_5$ are the parameters to be determined. In tIPV, $M$ is the mass of the fragment ion computed from the general molecular formula, and $T_1$ and $T_2$ are the abundances of the first and second isotopic fragment ions relative to the monoisotopic fragment ion, likewise computed from the general molecular formula. Specifically, tIPV is given by the following formulas:
$$M = V X \tag{1}$$

$$T_1 = n_1 q_C + n_2 q_H + n_3 q_N + n_4 q_{O1} + n_5 q_{S1} \tag{2}$$

[Formula (3), which gives $T_2$, is not reproduced in this text.]
Here $V = [12, 1, 14, 16, 32]$ holds the atomic masses of the elements C, H, N, O and S, and $X = [n_1, n_2, n_3, n_4, n_5]^T$. $q_C$, $q_H$ and $q_N$ are the natural abundances of $^{13}$C relative to $^{12}$C, of D relative to H, and of $^{15}$N relative to $^{14}$N, respectively; $q_{O1}$ and $q_{O2}$ are the natural abundances of $^{17}$O and $^{18}$O relative to $^{16}$O; and $q_{S1}$ and $q_{S2}$ are the natural abundances of $^{33}$S and $^{34}$S relative to $^{32}$S. All of these relative abundances are known values.
Thus every component of the theoretical isotope distribution vector tIPV $= (M, T_1, T_2)$ is a function of $X = [n_1, n_2, n_3, n_4, n_5]^T$.
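Formulas (1) and (2) can be evaluated with a short sketch. The abundance ratios below are approximate, illustrative values assumed for this example rather than values taken from the patent, and the function names are hypothetical. $T_2$ is left out because formula (3) is not available in this text.

```python
# Nominal masses from the text: V = [12, 1, 14, 16, 32] for C, H, N, O, S.
V = (12, 1, 14, 16, 32)

# Approximate natural abundance ratios of the +1 isotopes (assumed, illustrative values):
# 13C/12C, D/H, 15N/14N, 17O/16O, 33S/32S.
Q1 = (0.0108, 0.000115, 0.00365, 0.00038, 0.0079)


def theoretical_mass(x):
    """Formula (1): M = V X for atom counts x = (n1, n2, n3, n4, n5)."""
    return sum(v * n for v, n in zip(V, x))


def theoretical_t1(x):
    """Formula (2): relative abundance of the first isotope as a linear function of x."""
    return sum(q * n for q, n in zip(Q1, x))


# Example composition C2 H3 N1 O1 S0.
x = (2, 3, 1, 1, 0)
print(theoretical_mass(x), theoretical_t1(x))
```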
In the invention, the theoretical isotope distribution vector tIPV $= (M, T_1, T_2)$ is matched against the experimental isotope distribution vector eIPV $= (M_e, I_1, I_2)$ to find the molecular formula that best matches the experimental vector, that is, a non-negative integer solution for the atomic composition vector $X = [n_1, n_2, n_3, n_4, n_5]^T$ of the general molecular formula.
In one embodiment of the invention, the Euclidean distance $E$ between the theoretical isotope distribution vector tIPV and the experimental isotope distribution vector eIPV is used as the matching score:

$$E = \sqrt{\delta_m^2 + \delta_1^2 + \delta_2^2}, \qquad \delta_m = M - M_e,\quad \delta_1 = T_1 - I_1,\quad \delta_2 = T_2 - I_2 \tag{4}$$
Substituting formulas (1) to (3) into (4) gives

$$\delta_m = 12 n_1 + n_2 + 14 n_3 + 16 n_4 + 32 n_5 - M_e \tag{5}$$

$$\delta_1 = n_1 q_C + n_2 q_H + n_3 q_N + n_4 q_{O1} + n_5 q_{S1} - I_1 \tag{6}$$

[Formula (7), which gives $\delta_2$, is not reproduced in this text.]
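Neglecting the higher-order term in formula (7) makes the residuals linear in $X$, that is, $[\delta_m\ \delta_1\ \delta_2]^T = AX + B$, so that the Euclidean distance becomes

$$E = \sqrt{(AX + B)^T (AX + B)} \tag{9}$$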
In formula (9), $X = [n_1, n_2, n_3, n_4, n_5]^T$ is the atomic composition vector of the fragment ion to be determined, and $A$ and $B$ are constant matrices built from known quantities: $M_e$, $I_1$ and $I_2$ obtained from the tandem mass spectrum, $V = [12, 1, 14, 16, 32]$, and the isotope relative abundances that appear in formulas (2) and (3).
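The matching score can be sketched in two equivalent ways: directly as the distance between the two vectors, or through the linear residual $AX + B$. The function names are hypothetical, and $A$ and $B$ are supplied by the caller because their full contents are not reproduced in this text.

```python
from math import dist, sqrt


def match_score(tipv, eipv):
    """Euclidean distance between a theoretical and an experimental distribution vector."""
    return dist(tipv, eipv)


def linear_residual(a, b, x):
    """Residual vector A X + B, with A given as a list of rows and B as a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(a, b)]


def linear_score(a, b, x):
    """Euclidean norm of the linear residual A X + B."""
    return sqrt(sum(r * r for r in linear_residual(a, b, x)))


# Mass row only: A = [V], B = [-Me]; a composition whose mass equals Me scores zero.
print(linear_score([[12, 1, 14, 16, 32]], [-57], (2, 3, 1, 1, 0)))
```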
Minimizing the Euclidean distance $E$ described by formula (9) yields a solution for $X$. To make the resulting molecular formula chemically meaningful, it is preferable to impose chemical-rule constraints on formula (9) as well, for example:
●The mass of the fragment ion corresponding to the molecular formula given by $X$ must lie in the range $[M_e - \delta, M_e + \delta]$, where $\delta$ is the maximum m/z error and is determined by the measurement accuracy of the mass spectrometer. That is, $|VX - M_e| \le \delta$ must hold.
●For each element in the molecular formula of the fragment ion, dividing the m/z of the ion by the mass number of the lightest isotope of that element and taking the integer part of the result gives an upper bound on the count of that element. For example, the atomic mass of O is 16, so if the mass-to-charge ratio of the ion is m/z, the number of O atoms in the fragment ion is at most $\lfloor (m/z)/16 \rfloor$, that is, $n_4 \le \lfloor (m/z)/16 \rfloor$ in $X$. Similar constraints are obtained for the other elements of the fragment ion.
●In a fragment ion, the number of C atoms is always less than the number of H atoms (that is, $n_1 < n_2$ in $X$), and the numbers of O and N atoms are always less than the number of C atoms (that is, $n_4 < n_1$ and $n_3 < n_1$), and so on. These constraints are implicit in the molecular composition of amino acid residues and of the main ion types, and those skilled in the art can readily derive them from those characteristics.
●In a singly charged ion, the sum of the numbers of H and N atoms is odd. The reason is that an ion carrying one charge has one unsaturated chemical bond, and H and N have odd valences while C, O and S have even valences.
It should be understood that those skilled in the art may construct other constraints with the same purpose of making the molecular formula of the fragment ion chemically meaningful.
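The four example rules above can be collected into a single check. This is a minimal sketch: the function name and argument layout are hypothetical, the nominal masses $V = [12, 1, 14, 16, 32]$ from the text are used, and $M_e$ is used as the m/z value in the upper-bound rule, which assumes a singly charged ion.

```python
V = (12, 1, 14, 16, 32)  # nominal masses of C, H, N, O, S as given in the text


def satisfies_rules(x, m_e, delta, singly_charged=True):
    """Check a candidate composition x = (nC, nH, nN, nO, nS) against the example rules."""
    n_c, n_h, n_n, n_o, n_s = x
    if any(n < 0 for n in x):
        return False
    # Rule 1: the formula mass must lie within [Me - delta, Me + delta].
    mass = sum(v * n for v, n in zip(V, x))
    if abs(mass - m_e) > delta:
        return False
    # Rule 2: the count of each element is at most floor((m/z) / lightest isotope mass).
    if any(n > int(m_e // v) for v, n in zip(V, x)):
        return False
    # Rule 3: C < H, O < C and N < C.
    if not (n_c < n_h and n_o < n_c and n_n < n_c):
        return False
    # Rule 4: for a singly charged ion, the number of H plus N atoms is odd.
    if singly_charged and (n_h + n_n) % 2 == 0:
        return False
    return True


print(satisfies_rules((5, 9, 2, 2, 0), m_e=129.07, delta=0.5))
```

Used as a filter, the same function can be applied either before scoring or to the integer candidates produced by the local search.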
Some or all of the above constraints, and of any other constraints, can be written as a linear inequality $DX \le G$. Combined with formula (9), the minimization of the Euclidean distance $E$ can then be solved by a standard quadratic programming method, as shown in formula (10):

$$\min_X\ (AX + B)^T (AX + B) \quad \text{subject to} \quad DX \le G \tag{10}$$
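The link to a standard quadratic program can be made explicit with a short derivation. It assumes that the objective is the squared distance $\|AX + B\|^2$, which has the same minimizer as $E$ itself; the symbols $H$ and $f$ are introduced here for illustration only.

$$\|AX + B\|^2 = X^T A^T A X + 2 B^T A X + B^T B$$

Dropping the constant $B^T B$ gives the usual form

$$\min_X\ \tfrac{1}{2} X^T H X + f^T X \quad \text{subject to} \quad DX \le G, \qquad H = 2 A^T A,\quad f = 2 A^T B$$

Since $A$ has three rows and five columns, $H$ is positive semidefinite with rank at most 3, so the problem is convex but the unconstrained minimizer is not unique; the constraints $DX \le G$ help narrow the solution.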
The optimal $X$ obtained from formula (10) by quadratic programming is a real-valued solution $X_R$. To find the actual molecular formula, $X_R$ is taken as a starting point and its neighborhood is searched locally for non-negative integer candidate solutions for $X$. More precisely, every non-negative integer candidate formula within a distance $d$ of $X_R$ is scored, that is, its degree of match is evaluated with formula (9). The value of $d$ is chosen to suit the mass range of the ion. This avoids enumerating all possible molecular formulas, so ion formulas can be predicted over a large mass range with high reliability and efficiency.
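The local search can be sketched as follows. The text does not state which distance defines the neighborhood, so this example assumes a per-coordinate bound $|n_i - X_{R,i}| \le d$; the function names are hypothetical and the scoring function is passed in by the caller.

```python
from itertools import product
from math import ceil, floor


def integer_candidates(x_real, d):
    """Enumerate non-negative integer vectors whose every coordinate is within d of x_real."""
    ranges = []
    for x in x_real:
        lo = max(0, ceil(x - d))  # atom counts cannot be negative
        hi = floor(x + d)
        ranges.append(range(lo, hi + 1))
    return list(product(*ranges))


def rank_candidates(x_real, d, score):
    """Return the candidates sorted by ascending score (best match first)."""
    return sorted(integer_candidates(x_real, d), key=score)


# Toy two-element example with a score that prefers the vector (2, 4).
ranked = rank_candidates([2.4, 3.6], 1, lambda c: (c[0] - 2) ** 2 + (c[1] - 4) ** 2)
print(ranked[0])
```

The number of candidates grows as roughly $(2d + 1)^5$ for five elements, which is why $d$ is kept small relative to a full enumeration.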
The local search still produces a number of candidate molecular formulas, including some that are not legitimate and some that do not match the experimental tandem mass spectrum (referred to as invalid and impossible formulas, respectively). To improve prediction accuracy, as many of these as possible should be excluded. In the invention, the candidate formulas can be filtered with one or more of the following methods: the average isotope distribution pattern, chemical-rule constraints, and cross-validation. These methods are described below.
A. Average isotope distribution pattern
The average isotope distribution pattern is the statistical relationship among the components $M$, $T_1$ and $T_2$ of the theoretical isotope distribution vector tIPV $= (M, T_1, T_2)$. To find the theoretical average isotope distribution pattern of fragment ions, the inventors computed the mean and standard deviation of the isotope distributions of the theoretical fragment ions of the peptides obtained by trypsin digestion of all proteins in an existing protein database, which revealed the relationship among $M$, $T_1$ and $T_2$. Specifically, the inventors first performed a theoretical enzymatic digestion of the proteins in SWISS-PROT to obtain peptides, and then selected the peptides with masses between 60 u and 3000 u, a range that corresponds to the standard range of Q-TOF MS/MS experimental spectra. It is also worth noting that the +2 isotope of sulfur, $^{34}$S, is very abundant in nature (its probability of occurrence is 0.04210, about 20 times that of $^{18}$O), while peptides containing more than five S atoms are rare. The molecular formulas can therefore be divided into six classes, $S_0$, $S_1$, $S_2$, $S_3$, $S_4$ and $S_{5+}$, corresponding to peptides containing 0, 1, 2, 3, 4, and 5 or more S atoms. The inventors compiled statistics for each of these six classes. The results show that $T_1$ is linear in the mass $M$, that $T_2$ is quadratic in $M$, and that $T_2$ increases with $T_1$ as a quadratic function of $T_1$.
The distribution relationships among $T_1$, $T_2$ and $M$ described above can therefore be used to filter the candidate molecular formulas and exclude those that are invalid and/or impossible.
B. Chemical-rule constraints
The chemical-rule constraints used here are similar to the constraint $DX \le G$ in formula (10). The difference is that in formula (10) the constraint $DX \le G$ restricts the minimization of the Euclidean distance $E$, so as to obtain a real-valued solution $X_R$ under that constraint, whereas here the constraints are applied to the non-negative integer candidate formulas found by searching the neighborhood of $X_R$, in order to filter those candidates.
C. Cross-validation
In particular, the b-series fragment ions of a peptide are all homologous, including the b-, a-, b*-, a*-, b°- and a°-type ions: they share the same underlying amino acid sequence, so their isotope distribution patterns can be expected to be very similar. The same holds for the y-series ions. If the $M_e$ values of two fragment ions in a mass spectrum differ by 28, 17 or 18, and their $I_1$ and $I_2$ values are very close, the eIPVs of the two fragment ions can be regarded as homologous. Homologous eIPVs can then be used to cross-validate the prediction results. For example, for two homologous fragment ions, suppose the candidate list of one ion contains $C_{a_1}H_{a_2}N_{a_3}O_{a_4}S_{a_5}$. If $C_{a_1-1}H_{a_2}N_{a_3}O_{a_4-1}S_{a_5}$ does not appear in the candidate list of the other ion, the candidate $C_{a_1}H_{a_2}N_{a_3}O_{a_4}S_{a_5}$ can be regarded as a random match and excluded.
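The cross-validation step can be sketched as a set comparison between two candidate lists. The loss of 28 as one C and one O follows the example in the text; interpreting 17 and 18 as the loss of NH3 and H2O is an assumption made for this illustration, as are the function and variable names.

```python
# Atom-count differences (C, H, N, O, S) for each mass gap between homologous ions.
# 28 -> CO as in the text's example; 17 -> NH3 and 18 -> H2O are assumed here.
NEUTRAL_LOSSES = {
    28: (1, 0, 0, 1, 0),
    17: (0, 3, 1, 0, 0),
    18: (0, 2, 0, 1, 0),
}


def cross_validate(heavy_candidates, light_candidates, mass_gap):
    """Keep only candidate pairs that differ by exactly the neutral loss for mass_gap.

    Returns (kept_heavy, kept_light), each a list of 5-tuples of atom counts.
    """
    loss = NEUTRAL_LOSSES[mass_gap]
    light_set = set(light_candidates)
    kept_heavy, kept_light = [], []
    for formula in heavy_candidates:
        partner = tuple(n - l for n, l in zip(formula, loss))
        if partner in light_set:
            kept_heavy.append(formula)
            if partner not in kept_light:
                kept_light.append(partner)
    return kept_heavy, kept_light


heavy = [(5, 9, 2, 2, 0), (4, 9, 4, 1, 0)]
light = [(4, 9, 2, 1, 0)]
print(cross_validate(heavy, light, 28))
```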
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100908060A CN100390537C (en) | 2004-11-12 | 2004-11-12 | A Method for Predicting the Molecular Formula of Ions Using Isotope Peaks of Fragment Ions in Tandem Mass Spectrometry |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1773276A (en) | 2006-05-17 |
CN100390537C (en) | 2008-05-28 |
Family
ID=36760343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100908060A Expired - Fee Related CN100390537C (en) | 2004-11-12 | 2004-11-12 | A Method for Predicting the Molecular Formula of Ions Using Isotope Peaks of Fragment Ions in Tandem Mass Spectrometry |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100390537C (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012035412A2 (en) * | 2010-09-15 | 2012-03-22 | Dh Technologies Development Pte. Ltd. | Data independent acquisition of production spectra and reference spectra library matching |
CN102445544B (en) * | 2010-10-15 | 2013-10-30 | 中国科学院计算技术研究所 | Method and system for increasing judgment accuracy of monoisotopic peaks |
US9638629B2 (en) * | 2011-08-03 | 2017-05-02 | Shimadzu Corporation | Mass analysis data analyzing method and apparatus |
CN103389335A (en) * | 2012-05-11 | 2013-11-13 | 中国科学院大连化学物理研究所 | Analysis device and method for identifying biomacromolecules |
CN103792275A (en) * | 2013-09-24 | 2014-05-14 | 中国科学院成都生物研究所 | High-resolution mass spectrum accurate molecular formula forecasting method |
EP3293754A1 (en) * | 2016-09-09 | 2018-03-14 | Thermo Fisher Scientific (Bremen) GmbH | Method for identification of the monoisotopic mass of species of molecules |
US10615015B2 (en) * | 2017-02-23 | 2020-04-07 | Thermo Fisher Scientific (Bremen) Gmbh | Method for identification of the elemental composition of species of molecules |
CN111089928A (en) * | 2020-01-16 | 2020-05-01 | 贵州理工学院 | Method, system, device and medium for analyzing mass spectrum ion peak of organic matter |
CN111524549B (en) * | 2020-03-31 | 2023-04-25 | 中国科学院计算技术研究所 | An Ion-Index-Based Method for Global Protein Identification |
CN113514531B (en) * | 2021-04-27 | 2022-10-25 | 清华大学 | Fragment ion prediction method and application of a compound |
CN115439752B (en) * | 2022-09-22 | 2023-04-18 | 上海市环境科学研究院 | Method for identifying atmospheric organic species, computer device and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1251615A (en) * | 1997-02-14 | 2000-04-26 | 乔治华盛顿大学 | Assay for measurement of DNA synthesis rates |
US6391649B1 (en) * | 1999-05-04 | 2002-05-21 | The Rockefeller University | Method for the comparative quantitative analysis of proteins and other biological material by isotopic labeling and mass spectroscopy |
US6537432B1 (en) * | 1998-02-24 | 2003-03-25 | Target Discovery, Inc. | Protein separation via multidimensional electrophoresis |
Non-Patent Citations (2)
Title |
---|
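A method for computer-aided derivation of molecular formulas in organic mass spectrometry. Su Yuezeng (苏跃增) et al. Journal of Northwest Normal University (Natural Science Edition), Vol. 33, No. 3. 1997 * |
Study of a method for computer derivation of molecular formulas from low-resolution mass spectral data. Su Yuezeng (苏跃增) et al. Journal of Xinjiang Normal University (Natural Science Edition), Vol. 16, No. 1. 1997 * |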
Also Published As
Publication number | Publication date |
---|---|
CN1773276A (en) | 2006-05-17 |
Similar Documents
Publication | Title |
---|---|
EP1766394B1 (en) | System and method for grouping precursor and fragment ions using selected ion chromatograms |
US8193485B2 (en) | Method and apparatus for identifying proteins in mixtures |
CN100390537C (en) | A Method for Predicting the Molecular Formula of Ions Using Isotope Peaks of Fragment Ions in Tandem Mass Spectrometry |
US10153145B2 (en) | Method of mass spectrometry and a mass spectrometer |
US20140222348A1 (en) | Mass Spectrometry |
US8694264B2 (en) | Mass spectrometry system |
Bertsch et al. | De novo peptide sequencing by tandem MS using complementary CID and electron transfer dissociation |
US7555393B2 (en) | Evaluating the probability that MS/MS spectral data matches candidate sequence data |
CN101055558B (en) | Mass spectrum effective peak selection method based on data isotope mode |
Datta et al. | Spectrum fusion: using multiple mass spectra for de novo peptide sequencing |
He et al. | De novo sequencing with limited number of post-translational modifications per peptide |
Zou et al. | Charge state determination of peptide tandem mass spectra using support vector machine (SVM) |
US11600359B2 (en) | Methods and systems for analysis of mass spectrometry data |
CN115436347A (en) | Physicochemical property scoring for structure identification in ion spectroscopy |
Yuan et al. | Features‐based deisotoping method for tandem mass spectra |
Jarman et al. | A model of random sequences for de novo peptide sequencing |
JP4855780B2 (en) | Peptide identification method and identification apparatus using mass spectrometry |
CN102043011B (en) | Electron transport fragmentation mass spectrometry pretreatment and identification method |
Lee et al. | An algorithmic approach to automated high-throughput identification of disulfide connectivity in proteins using tandem mass spectrometry |
Yan et al. | Separation of ion types in tandem mass spectrometry data interpretation-a graph-theoretic approach |
Oh et al. | Peptide identification by tandem mass spectra: an efficient parallel searching |
Kouvaraki Chatzilampou | A new software platform for peptide and protein mass spectra interpretation |
JP2007187501A5 (en) | |
Oh et al. | A Two-way Parallel Searching for Peptide Identification via Tandem Mass Spectrometry. |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2008-05-28; Termination date: 2020-11-12 |