WO2021159833A1 - Nucleic acid mass spectrum numerical processing method - Google Patents

Nucleic acid mass spectrum numerical processing method

Info

Publication number
WO2021159833A1
Authority
WO
WIPO (PCT)
Prior art keywords
peak
mass spectrum
mass
spectrum
fitting
Application number
PCT/CN2020/134810
Other languages
English (en)
French (fr)
Inventor
树建伟
相双红
汪松炯
Original Assignee
浙江迪谱诊断技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-02-10
Filing date
2020-12-09
Publication date
2021-08-19
Application filed by 浙江迪谱诊断技术有限公司
Priority to US17/771,216 (US20220383979A1)
Priority to EP20918621.2A (EP4016379B1)
Priority to JP2022535644A (JP7456665B2)
Publication of WO2021159833A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/0027 Methods for using particle spectrometers
    • H01J49/0036 Step by step routines describing the handling of the data generated during a measurement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G06F2218/06 Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks

Landscapes

  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

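A nucleic acid mass spectrum numerical processing method, comprising the following steps. Step S1: recalibration of single mass spectra: for each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks with expected mass-to-charge ratios, i.e. anchor peaks. Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra corresponding to different positions of the detection point are combined into a single mass spectrum for that detection point. Step S3: wavelet filtering: on the basis of step S2, high-frequency noise and the baseline are removed by a wavelet-based digital filter. Step S4: peak feature value extraction: on the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum. The method improves the reliability of nucleic acid mass spectrometry data acquisition and the accuracy of nucleotide detection.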

Description

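Nucleic acid mass spectrum numerical processing method
Technical Field
The invention belongs to the technical field of nucleic acid mass spectrometry and specifically relates to a nucleic acid mass spectrum numerical processing method.
Background
Mass spectrometry is fast, accurate and highly sensitive, and in recent years it has been widely used in biological analysis. As a basic substance of life, nucleic acid plays a vital role in major life phenomena such as the growth, development, reproduction, heredity and variation of organisms. Modern biotechnology has found that most physiological or disease traits are expressed through the regulation of a series of genes located on nucleic acid sequences, so accurate nucleotide detection is particularly important; mass spectrum numerical processing, as an indispensable step before nucleotide detection, is therefore self-evidently important. At present, similar methods suffer from problems such as a low conversion rate in data acquisition and uneven data, which seriously affect nucleotide detection results and call for further research.
Summary of the Invention
The purpose of the invention is to address the problems of low conversion rate and uneven data in mass spectrometry data acquisition by providing a nucleic acid mass spectrum numerical processing method that extracts reliable feature values before gene analysis, with the goals of overcoming the limitations of the prior art and improving the accuracy of nucleotide detection.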
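The invention is realized through the following technical solution:
A nucleic acid mass spectrum numerical processing method, comprising the following steps:
Step S1: recalibration of single mass spectra. For each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks with expected mass-to-charge ratios, i.e. anchor peaks;
Step S2: mass spectrum synthesis. On the basis of step S1, the several mass spectra corresponding to different positions of the detection point are combined into a single mass spectrum for that detection point;
Step S3: wavelet filtering. On the basis of step S2, high-frequency noise and the baseline are removed by a wavelet-based digital filter;
Step S4: peak feature value extraction. On the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.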
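Preferably, in step S1, recalibrating a single mass spectrum comprises:
Step S11: candidate reference peak selection. A set of reference peaks is selected from all possible expected peaks according to the following criteria: first, the peak must lie within a specific mass range; second, there is no adjacent reference peak within a specific mass range;
Step S12: peak location. A convolution filter with a weight matrix of width 9, preferably (-4, 0, 1, 2, 2, 2, 1, 0, -4), is applied to the mass spectrum; for a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, as expressed by the following formula: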
$$\tilde{y}_i = \sum_{k=-4}^{4} w_k \, y_{i+k}, \qquad (w_{-4}, \ldots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$$
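Based on the filtered intensities, the whole mass spectrum is divided into intervals of a specific number of points. For each interval the local noise is identified, and points whose intensity is greater than or equal to four times the local noise and greater than or equal to the global minimum are identified as candidate peaks; the minimum is preferably 0.01 × the largest local maximum;
Step S13: mass spectrum peak fitting;
Step S14: final anchor peak selection. For the list of detected peaks, the cut-off SNR, i.e. the minimum SNR, is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specific range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;
Step S15: recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting. Here, the mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is assumed to be the Bruker function, of the form
[Formula image PCTCN2020134810-appb-000002: Bruker function]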
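Further, the specific steps of peak fitting in step S13 comprise:
Step S131: determine the expected line width.
Step S132: the region of the expected signal is masked within an interval of NN expected line widths, NN preferably being 4.
Step S133: the implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ within an interval of $MM\,\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80; within the masked regions of this interval, the values of $y_i$ are supplied by linear interpolation.
Step S134: the noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
Step S135: point masking. Points within peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than a given value and whose noise is greater than a given value are additionally masked.
Step S136: a specific number of estimated line widths around each peak is taken as the fitting region; when there are no overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm to find the specified parameters that minimize the tuning function.
Further, in step S131, the expected line width is determined as follows:
$\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).
Further, in step S136, the tuning function is:
[Formula image PCTCN2020134810-appb-000003: tuning function]
where the sum runs over all $\{y_i, m_i\}$ in the specified interval, $H_f$ is the fitted height above the baseline at the point $M_f$, the parameters $M_f$ and $\lambda_f$ are the fitted mass and the fitted line width, and $\sigma_i$ is set to a given value depending on conditions.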
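As a further preference, the feature value extraction in step S4 specifically comprises:
Step S41: peak fitting, the same as step S13;
Step S42: record the following features:
1. Height of the fitted peak center above the baseline, $H_f$;
2. Fitted line width, $\lambda_f$;
3. Peak offset (distance between the fitted peak center and the expected peak center), $\delta_f = M_f - M_e$;
4. A, the area between the fitted peak and the baseline within a range of $4\lambda_f$;
5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio;
6. $V = A/\mathrm{SNR}$, the area variance;
7. Δ, the fitting area difference: the square root of the sum of squared differences between the fitted and measured intensities.
Advantages of the invention: 1. the nucleic acid mass spectrum numerical processing method of the invention extracts reliable feature values before gene analysis, with the goals of overcoming the limitations of the prior art and improving the accuracy of nucleotide detection; 2. it improves the reliability of nucleic acid mass spectrometry data acquisition.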
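Brief Description of the Drawings
Figure 1: mass spectrum before filtering;
Figure 2: mass spectrum after filtering;
Figure 3: comparison before and after peak fitting.
Detailed Description of the Embodiments
The invention is further described below with reference to the drawings and embodiments.
The invention is a nucleic acid mass spectrum numerical processing method comprising the following steps:
Step S1: recalibration of single mass spectra. For a single sample, multiple mass spectra (typically n = 5) corresponding to different positions of the sample's detection point are acquired. Each mass spectrum is in fact the sum of the spectra from multiple laser shots (typically n = 20). The initial coefficients of each spectrum are generated on the assumption that the mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is quadratic, of the form $m = At^2 + Bt + C$; before the spectra are summed, they must be recalibrated. Recalibration is achieved by matching a set of specially identified peaks, called anchor peaks, to their expected masses, and follows these steps:
Step S11: selection of candidate reference peaks. A set of clean reference peaks is selected from all possible expected peaks according to the following criteria:
1. The peak must lie within the mass range of 4000 Da to 9000 Da.
2. The peak has no adjacent reference peak within the mass range defined by mass ± resolution.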
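Step S12: peak location. A convolution filter with a weight matrix of width 9, preferably (-4, 0, 1, 2, 2, 2, 1, 0, -4), is applied to the mass spectrum; for a given point of the spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, as expressed by the following formula: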
$$\tilde{y}_i = \sum_{k=-4}^{4} w_k \, y_{i+k}, \qquad (w_{-4}, \ldots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$$
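The weighted sum of step S12 can be illustrated with the short Python sketch below. It is not the applicant's implementation: the name `weighted_filter` is hypothetical, and because the text does not say how the first and last four points are treated, the sketch assumes their filtered value is 0.

```python
# Hypothetical sketch of the width-9 weighted filter of step S12.
# Edge handling is not specified in the text; points closer than 4 samples
# to either end are assumed to get a filtered value of 0.

WEIGHTS = (-4, 0, 1, 2, 2, 2, 1, 0, -4)  # preferred weight matrix from step S12


def weighted_filter(y, weights=WEIGHTS):
    """Return, for each point, the weighted sum of the 9 values centred on it."""
    half = len(weights) // 2
    out = [0.0] * len(y)
    for i in range(half, len(y) - half):
        # k runs over 0..8, i.e. neighbours i-4 .. i+4
        out[i] = sum(w * y[i + k - half] for k, w in enumerate(weights))
    return out
```

Because the weights sum to zero, a constant offset contributes nothing to the output, while a narrow peak gives a positive response at its center and negative responses on its flanks; this is consistent with using the filtered trace for peak location.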
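Based on the filtered intensities, local maxima are identified with a small sliding window (n = ±3). The whole spectrum is then divided into intervals of 500 points; for each interval, the local noise is taken as 33% of the local maxima within the surrounding 1500-point window (± one interval). Points whose intensity is greater than or equal to four times the local noise and greater than or equal to the global minimum are identified as candidate peaks, the minimum preferably being 0.01 × the largest local maximum. From the resulting peak list, peaks that have an adjacent candidate peak within a certain range, peaks with SNR (the ratio of the filtered intensity to the local noise) ≤ 2, and peaks whose mass lies outside the pre-specified candidate reference peak ranges are removed. Finally, the peak indices are adjusted based on the original intensities. See Figures 1 and 2 for the mass spectrum before and after applying the filter.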
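The candidate-peak rule above can be sketched as follows, operating on already filtered intensities. This is an illustration under assumptions, not the patented code: all function names are hypothetical, the phrase "33% of the local maxima" is read here as the 33rd percentile of the local-maximum intensities in the 1500-point window, and only positive filtered values are treated as local maxima. The later clean-up (adjacent peaks, SNR ≤ 2, out-of-range masses) and the index adjustment on the original intensities are not shown.

```python
import math


def local_maxima(f, half_window=3):
    """Indices holding the largest value of their +/-half_window neighbourhood."""
    idx = []
    for i, v in enumerate(f):
        lo, hi = max(0, i - half_window), min(len(f), i + half_window + 1)
        # v > 0 skips flat or negative stretches of the filtered trace (assumption)
        if v > 0 and v == max(f[lo:hi]):
            idx.append(i)
    return idx


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    s = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(s)))
    return s[rank - 1]


def candidate_peaks(f, interval=500, noise_pct=33, noise_factor=4.0, floor_ratio=0.01):
    """Return indices of candidate peaks in the filtered intensities f."""
    maxima = local_maxima(f)
    if not maxima:
        return []
    floor = floor_ratio * max(f[i] for i in maxima)  # 0.01 * largest local maximum
    peaks = []
    for start in range(0, len(f), interval):
        # local noise from the maxima in this interval +/- one interval (1500 points)
        window = [f[i] for i in maxima if start - interval <= i < start + 2 * interval]
        if not window:
            continue
        noise = percentile(window, noise_pct)
        for i in maxima:
            if start <= i < start + interval and f[i] >= noise_factor * noise and f[i] >= floor:
                peaks.append(i)
    return peaks
```

For instance, a 1500-point trace with small bumps of height 1.0 every 10 points and a single value of 50.0 at index 700 yields only index 700 as a candidate, since the local noise evaluates to 1.0 and the bumps stay below four times that level.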
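Step S13: mass spectrum peak fitting (see Figure 3), implemented as follows:
Step S131: determine the expected line width using the following formula:
$\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters (default values 2.5 and 0.0005, respectively) and $M$ is the given peak mass (Da).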
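As a worked example of step S131 with the stated defaults, the expected line width grows linearly across the 4000 Da to 9000 Da range used for reference peaks. The function name `expected_linewidth` is hypothetical.

```python
def expected_linewidth(mass_da, l_a=2.5, l_b=0.0005):
    """Expected line width lambda_e = L_A + L_B * M in Da (step S131 defaults)."""
    if mass_da <= 0:
        raise ValueError("mass must be positive")
    return l_a + l_b * mass_da


for m in (4000.0, 6000.0, 9000.0):
    print(f"M = {m:.0f} Da -> lambda_e = {expected_linewidth(m):.2f} Da")
```

With the defaults this gives 4.50, 5.50 and 7.00 Da at 4000, 6000 and 9000 Da respectively, so at the preferred NN = 4 the masked interval around a 6000 Da peak spans about 22 Da.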
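Step S132: the region of the expected signal is masked within an interval of NN expected line widths, NN preferably being 4.
Step S133: the implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ within an interval of $MM\,\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80; within the masked regions of this interval, the values of $y_i$ are supplied by linear interpolation.
Step S134: the noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
Step S135: points within peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than 5 and whose noise is greater than 1 are additionally masked.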
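Steps S133 and S134 can be sketched as a masked running mean followed by a running RMS. The sketch rests on assumptions not fixed by the text: the window is given in points rather than as $MM\,\lambda_m$ in Da, masked points are filled by linear interpolation between the nearest unmasked neighbours, and the same window serves for both baseline and noise. All names are hypothetical.

```python
import math


def fill_masked(y, mask):
    """Replace masked samples by linear interpolation between unmasked neighbours."""
    known = [i for i, m in enumerate(mask) if not m]
    if not known:
        raise ValueError("every point is masked")
    out = list(y)
    for i, m in enumerate(mask):
        if not m:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = y[right]
        elif right is None:
            out[i] = y[left]
        else:
            t = (i - left) / (right - left)
            out[i] = y[left] + t * (y[right] - y[left])
    return out


def running(values, half_window, fn):
    """Apply fn to the centred window around every index (truncated at the ends)."""
    n = len(values)
    return [fn(values[max(0, i - half_window): min(n, i + half_window + 1)]) for i in range(n)]


def baseline_and_noise(y, mask, half_window):
    """Baseline = running mean of the filled signal; noise = running RMS of signal - baseline."""
    filled = fill_masked(y, mask)
    baseline = running(filled, half_window, lambda w: sum(w) / len(w))
    resid = [a - b for a, b in zip(filled, baseline)]
    noise = running(resid, half_window, lambda w: math.sqrt(sum(r * r for r in w) / len(w)))
    return baseline, noise
```

Masking the expected signal first (step S132) keeps the peak itself from pulling the baseline upward: a flat signal of 3.0 with a masked peak of 40.0 still yields a baseline of 3.0 and zero noise.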
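Step S136: the region within 4 estimated line widths of each peak is taken as the fitting region; when there are no overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm to find the parameters $M_f$ and $\lambda_f$ (fitted mass and fitted line width) that minimize the tuning function, whose form is:
[Formula image PCTCN2020134810-appb-000005: tuning function]
The sum runs over all $\{y_i, m_i\}$ in the specified interval, and $H_f$ is the fitted height above the baseline at the point $M_f$. For points within $0.5\lambda_e$ of the peak center, $\sigma_i$ is set to 1; for points farther than $0.5\lambda_e$ from the peak center, $\sigma_i$ is set to 0.2 or 0.4.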
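A weighted Levenberg-Marquardt fit of a single Gaussian, in the spirit of step S136, might look like the pure-Python sketch below. Because the tuning function itself is available only as a formula image, several choices here are assumptions: $\lambda$ is treated as the full width at half maximum, the fit runs on baseline-subtracted intensities, $H_f$ is fitted together with $M_f$ and $\lambda_f$, and $\sigma_i$ acts as a multiplicative weight on each squared residual (it may instead enter as $1/\sigma_i^2$). All names are hypothetical.

```python
import math

K = 4 * math.log(2)  # lambda is treated as the FWHM of the Gaussian (assumption)


def gaussian(m, h, mc, lam):
    return h * math.exp(-K * (m - mc) ** 2 / lam ** 2)


def sigma_profile(ms, center, lam_e, inner=1.0, outer=0.4):
    """sigma_i = 1 within 0.5*lambda_e of the peak centre, 0.2 or 0.4 outside."""
    return [inner if abs(m - center) <= 0.5 * lam_e else outer for m in ms]


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def cost(p, ms, ys, ws):
    """Weighted sum of squared residuals; each w_i multiplies its squared residual."""
    return sum(w * (y - gaussian(m, *p)) ** 2 for m, y, w in zip(ms, ys, ws))


def fit_gaussian_lm(ms, ys, ws, p0, iters=200, damping=1e-3):
    """Levenberg-Marquardt fit of p = (H_f, M_f, lambda_f) to baseline-subtracted data."""
    p = list(p0)
    c = cost(p, ms, ys, ws)
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        h, mc, lam = p
        for m, y, w in zip(ms, ys, ws):
            e = math.exp(-K * (m - mc) ** 2 / lam ** 2)
            g = h * e
            # partial derivatives of the model with respect to H, M and lambda
            jac = (e, g * 2 * K * (m - mc) / lam ** 2, g * 2 * K * (m - mc) ** 2 / lam ** 3)
            for a in range(3):
                jtr[a] += w * jac[a] * (y - g)
                for b in range(3):
                    jtj[a][b] += w * jac[a] * jac[b]
        # Marquardt damping: scale the diagonal of J^T W J by (1 + damping)
        lhs = [[jtj[i][j] * (1 + damping) if i == j else jtj[i][j] for j in range(3)] for i in range(3)]
        trial = [pi + si for pi, si in zip(p, solve3(lhs, jtr))]
        trial_cost = cost(trial, ms, ys, ws) if trial[2] > 0 else math.inf
        if trial_cost < c:
            p, c, damping = trial, trial_cost, damping / 10
        else:
            damping *= 10
            if damping > 1e10:
                break
    return p
```

For example, fitting noise-free samples of a Gaussian with H = 100, M = 6000.3 Da and λ = 5.5 Da from the start (max(y), 6000.0, 5.0) recovers those values. In production code a library solver, for instance SciPy's `least_squares` with `method="lm"`, would normally replace the hand-written loop.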
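Step S14: final anchor peak selection. For the list of detected peaks, the cut-off SNR (i.e. the minimum SNR) is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within ±25 Da of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected.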
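The matching rule of step S14 can be sketched as below. Since the text does not describe how the cut-off SNR is obtained, it is passed in as a parameter; keeping the closest peak when several qualify for one reference is likewise an assumption. The function name is hypothetical.

```python
def select_anchor_peaks(detected, candidates, cutoff_snr, tolerance_da=25.0):
    """Return (expected_mass, observed_mass) pairs for usable anchor peaks.

    detected   -- list of (mass, snr) tuples from peak fitting
    candidates -- expected masses of the candidate reference peaks
    """
    anchors = []
    for expected in candidates:
        matches = [(mass, snr) for mass, snr in detected
                   if abs(mass - expected) <= tolerance_da and snr > cutoff_snr]
        if matches:
            # several peaks may qualify; keep the closest one (assumption)
            mass, _ = min(matches, key=lambda p: abs(p[0] - expected))
            anchors.append((expected, mass))
    return anchors
```

A peak at 5030 Da is rejected for a 5000 Da reference because it falls outside ±25 Da, and a peak at 6000.5 Da with SNR 3 is rejected against a cut-off of 5 even though its mass matches.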
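Step S15: recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting. Here, the mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is assumed to be the Bruker function, of the form
[Formula image PCTCN2020134810-appb-000006: Bruker function]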
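The form of the Bruker function appears only in a formula image, so the sketch below does not reproduce the nonlinear fit of step S15. Instead it substitutes the quadratic model $m = At^2 + Bt + C$ that step S1 uses for the initial coefficients; that model is linear in A, B and C, so ordinary least squares suffices. The helper names and the rescaling of t to improve conditioning are illustrative choices.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_quadratic_calibration(times, masses):
    """Least-squares fit of m = A*t**2 + B*t + C to anchor (time, expected mass) pairs."""
    if len(times) < 3:
        raise ValueError("need at least three anchor peaks")
    scale = max(abs(t) for t in times)  # rescale t so the normal equations stay well conditioned
    rows = [((t / scale) ** 2, t / scale, 1.0) for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * m for r, m in zip(rows, masses)) for i in range(3)]
    a, b, c = solve3(ata, atb)
    return a / scale ** 2, b / scale, c  # undo the rescaling


def to_mass(t, coeffs):
    """Apply calibration coefficients (A, B, C) to a flight time."""
    a, b, c = coeffs
    return a * t * t + b * t + c
```

With the Bruker function available, the same anchor pairs would feed a nonlinear least-squares routine instead, and the returned A, B, C are the coefficients later checked for compatibility in step S2.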
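Step S2: mass spectrum synthesis. On the basis of step S1, the several mass spectra from different positions of the corresponding detection point are combined into a single mass spectrum for that detection point. The method for combining multiple spectra is a "self-weighted average", which can be described by the following equation:
[Formula image PCTCN2020134810-appb-000007: self-weighted average]
where n is the number of mass spectra,
$\bar{I}_i$: the average intensity at mass i; $I_{ij}$: the intensity at mass i in the j-th mass spectrum. When the mass spectra have different calibration coefficients, the best spectrum, i.e. the one with the most anchor peaks, is selected and used to initialize the summed spectrum. The absolute or squared intensities of a spectrum may be summed with those of another spectrum only when its calibration coefficients and those of the best spectrum satisfy the following conditions: A varies within 1%, B varies within 10%, and C varies within 20 Da.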
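A possible reading of step S2 is sketched below. The exact self-weighted average is only in a formula image; this sketch assumes $\bar{I}_i = \sum_j I_{ij}^2 / \sum_j |I_{ij}|$, i.e. each intensity weighted by its own magnitude, which fits the text's mention of summing squared and absolute intensities. It also assumes the spectra already share a common mass axis. The coefficient tolerances are the ones stated above; function names are hypothetical.

```python
def compatible(coeffs, best):
    """A within 1%, B within 10% and C within 20 Da of the best spectrum's coefficients."""
    a, b, c = coeffs
    a0, b0, c0 = best
    return (abs(a - a0) <= 0.01 * abs(a0)
            and abs(b - b0) <= 0.10 * abs(b0)
            and abs(c - c0) <= 20.0)


def self_weighted_average(spectra, coeffs, anchor_counts):
    """Combine spectra compatible with the spectrum that has the most anchor peaks."""
    best = max(range(len(spectra)), key=lambda j: anchor_counts[j])
    # initialise the running sums with the best spectrum
    sum_sq = [v * v for v in spectra[best]]
    sum_abs = [abs(v) for v in spectra[best]]
    for j, spec in enumerate(spectra):
        if j == best or not compatible(coeffs[j], coeffs[best]):
            continue
        for k, v in enumerate(spec):
            sum_sq[k] += v * v
            sum_abs[k] += abs(v)
    return [sq / ab if ab else 0.0 for sq, ab in zip(sum_sq, sum_abs)]
```

Under this assumption a strong signal dominates the average at its own mass, while a spectrum whose A coefficient differs by 20% from the best spectrum is left out entirely.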
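Step S3: wavelet filtering. Wavelet-based filtering is performed on the synthesized mass spectrum to remove high-frequency noise and the baseline. Another round of recalibration is then performed on the filtered spectrum; after this round, the new A, B, C coefficients are assigned to the synthesized spectrum and the m/z values are adjusted accordingly.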
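The text names neither the wavelet family nor the decomposition depth used in step S3. As an illustration only, the sketch below uses an orthonormal Haar wavelet: it zeroes the finest detail level(s) to suppress high-frequency noise and zeroes the coarsest approximation to remove the slowly varying baseline. The depth, the number of dropped levels and the function names are assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def wavelet_filter(y, levels=4, drop_fine=1):
    """Remove the finest detail levels (noise) and the coarsest approximation (baseline)."""
    if len(y) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(y), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)  # details[0] is the finest scale
    for k in range(drop_fine):
        details[k] = [0.0] * len(details[k])  # suppress high-frequency noise
    approx = [0.0] * len(approx)  # remove the baseline
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

A constant input is removed completely, and with one level and no dropped details the filter simply subtracts the local mean. A smoother wavelet, for example one available through a library such as PyWavelets, would usually be preferred for real spectra.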
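Step S4: peak feature value extraction (see Figure 3). The fitting process follows these steps:
Step S41: peak fitting, the same as step S13;
Step S42: after a successful fit, record the following features:
1. Height of the fitted peak center above the baseline, $H_f$;
2. Fitted line width, $\lambda_f$;
3. Peak offset (distance between the fitted peak center and the expected peak center), $\delta_f = M_f - M_e$;
4. A, the area between the fitted peak and the baseline within a range of $4\lambda_f$;
5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio;
6. $V = A/\mathrm{SNR}$, the area variance;
7. Δ, the fitting area difference: the square root of the sum of squared differences between the fitted and measured intensities.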
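Given a fitted peak, the seven features of step S42 could be computed as in the sketch below. Assumptions: the peak is a Gaussian whose $\lambda_f$ is the full width at half maximum, "within a range of $4\lambda_f$" is read as $M_f \pm 2\lambda_f$ (so the area has a closed form via `math.erf`), the data are baseline-subtracted, and `noise_at_peak` is the noise level $N(M_f)$ from step S134. The function name and dictionary keys are hypothetical.

```python
import math


def peak_features(h_f, m_f, lam_f, m_e, noise_at_peak, ms, ys):
    """Step S42 features for a Gaussian fit (H_f, M_f, lambda_f) against measured (ms, ys)."""
    sigma = lam_f / (2 * math.sqrt(2 * math.log(2)))  # FWHM -> standard deviation
    half_range = 2 * lam_f                             # "4 * lambda_f" read as +/- 2 * lambda_f
    area = h_f * sigma * math.sqrt(2 * math.pi) * math.erf(half_range / (sigma * math.sqrt(2)))
    snr = h_f / noise_at_peak
    fitted = [h_f * math.exp(-0.5 * ((m - m_f) / sigma) ** 2) for m in ms]
    delta = math.sqrt(sum((f - y) ** 2 for f, y in zip(fitted, ys)))
    return {
        "H_f": h_f,
        "lambda_f": lam_f,
        "delta_f": m_f - m_e,  # peak offset
        "A": area,
        "SNR": snr,
        "V": area / snr,       # area variance
        "Delta": delta,        # fit residual
    }
```

Because $\pm 2\lambda_f$ spans about $\pm 4.7$ standard deviations, A here is practically the full Gaussian area $H_f \lambda_f \sqrt{\pi / (4 \ln 2)}$; a narrower reading of the range would give a noticeably smaller value.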
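The nucleic acid mass spectrum numerical processing method of the invention extracts reliable feature values before gene analysis; it aims to overcome the limitations of the prior art and improve the accuracy of nucleotide detection, and it improves the reliability of nucleic acid mass spectrometry data acquisition.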

Claims (9)

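  1. A nucleic acid mass spectrum numerical processing method, characterized by comprising the following steps:
    Step S1: recalibration of single mass spectra: for each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks with expected mass-to-charge ratios, i.e. anchor peaks;
    Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra corresponding to different positions of the detection point are combined into a single mass spectrum for that detection point;
    Step S3: wavelet filtering: on the basis of step S2, high-frequency noise and the baseline are removed by a wavelet-based digital filter;
    Step S4: peak feature value extraction: on the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.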
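  2. The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that, in step S1, recalibrating a single mass spectrum comprises:
    Step S11: candidate reference peak selection: a set of reference peaks is selected from all possible expected peaks according to the following criteria: first, the peak must lie within a specific mass range; second, there is no adjacent reference peak within a specific mass range;
    Step S12: peak location: a convolution filter with a weight matrix of width 9, preferably (-4, 0, 1, 2, 2, 2, 1, 0, -4), is applied to the mass spectrum; for a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, as expressed by the following formula: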
    $$\tilde{y}_i = \sum_{k=-4}^{4} w_k \, y_{i+k}, \qquad (w_{-4}, \ldots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$$
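    based on the filtered intensities, the whole mass spectrum is divided into intervals of a specific number of points; for each interval the local noise is identified, and points whose intensity is greater than or equal to four times the local noise and greater than or equal to the global minimum are identified as candidate peaks, the minimum preferably being 0.01 × the largest local maximum;
    Step S13: mass spectrum peak fitting;
    Step S14: final anchor peak selection: for the list of detected peaks, the cut-off SNR, i.e. the minimum SNR, is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specific range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;
    Step S15: recalibration: using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting; here, the mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is assumed to be the Bruker function, of the form
    [Formula image PCTCN2020134810-appb-100002: Bruker function]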
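  3. The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S2 is as follows: multiple mass spectra are combined using the "self-weighted average" method; when the mass spectra have different calibration coefficients, the best spectrum, i.e. the one with the most anchor peaks, is selected; the summed spectrum is initialized with the best spectrum; and the absolute or squared intensities of a spectrum may be summed with those of another spectrum only when its calibration coefficients and those of the best spectrum satisfy the conditions.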
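  4. The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S3 is as follows:
    wavelet-based filtering is performed on the synthesized mass spectrum to remove high-frequency noise and the baseline; another round of recalibration (as in step S1) is then performed on the filtered spectrum, and the m/z values are adjusted accordingly.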
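  5. The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S4 comprises:
    Step S41: peak fitting, the same as step S13;
    Step S42: record the following features:
    1. Height of the fitted peak center above the baseline, $H_f$;
    2. Fitted line width, $\lambda_f$;
    3. Peak offset (distance between the fitted peak center and the expected peak center), $\delta_f = M_f - M_e$;
    4. A, the area between the fitted peak and the baseline within a range of $4\lambda_f$;
    5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio;
    6. $V = A/\mathrm{SNR}$, the area variance;
    7. Δ, the fitting area difference: the square root of the sum of squared differences between the fitted and measured intensities.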
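  6. The nucleic acid mass spectrum numerical processing method according to claim 2, characterized in that, in step S13, the specific steps of peak fitting comprise:
    Step S131: determine the expected line width;
    Step S132: the region of the expected signal is masked within an interval of NN expected line widths, NN preferably being 4;
    Step S133: the implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ within an interval of $MM\,\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80; within the masked regions of this interval, the values of $y_i$ are supplied by linear interpolation;
    Step S134: the noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline);
    Step S135: point masking: points within peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than a given value and whose noise is greater than a given value are additionally masked;
    Step S136: a specific number of estimated line widths around each peak is taken as the fitting region; when there are no overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm to find the specified parameters that minimize the tuning function.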
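  7. The nucleic acid mass spectrum numerical processing method according to claim 3, characterized in that the "self-weighted average" method for combining multiple mass spectra can be described by the following equation:
    [Formula image PCTCN2020134810-appb-100003: self-weighted average]
    where n is the number of mass spectra,
    $\bar{I}_i$: the average intensity at mass i; $I_{ij}$: the intensity at mass i in the j-th mass spectrum.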
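  8. The nucleic acid mass spectrum numerical processing method according to claim 6, characterized in that the expected line width in step S131 is described by the following equation:
    $\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).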
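  9. The nucleic acid mass spectrum numerical processing method according to claim 6, characterized in that the tuning function in step S136 can be described by the following equation:
    [Formula image PCTCN2020134810-appb-100005: tuning function]
    where the sum runs over all $\{y_i, m_i\}$ in the specified interval, $H_f$ is the fitted height above the baseline at the point $M_f$, the parameters $M_f$ and $\lambda_f$ are the fitted mass and the fitted line width, and $\sigma_i$ is set to a given value depending on conditions.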
PCT/CN2020/134810 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method WO2021159833A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/771,216 US20220383979A1 (en) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method
EP20918621.2A EP4016379B1 (en) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method
JP2022535644A JP7456665B2 (ja) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010084107.4A CN111325121B (zh) 2020-02-10 2020-02-10 Nucleic acid mass spectrum numerical processing method
CN202010084107.4 2020-02-10

Publications (1)

Publication Number Publication Date
WO2021159833A1 true WO2021159833A1 (zh) 2021-08-19

Family

ID=71172661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134810 WO2021159833A1 (zh) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method

Country Status (5)

Country Link
US (1) US20220383979A1 (zh)
EP (1) EP4016379B1 (zh)
JP (1) JP7456665B2 (zh)
CN (1) CN111325121B (zh)
WO (1) WO2021159833A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
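CN111325121B (zh) * 2020-02-10 2024-02-20 浙江迪谱诊断技术有限公司 Nucleic acid mass spectrum numerical processing method
CN112444556B (zh) * 2020-09-27 2021-12-03 浙江迪谱诊断技术有限公司 Method for determining time-of-flight nucleic acid mass spectrometry parameters
CN112378986B (zh) * 2021-01-18 2021-08-03 宁波华仪宁创智能科技有限公司 Mass spectrometry analysis method
CN112877412B (zh) * 2021-02-05 2022-09-27 浙江迪谱诊断技术有限公司 Method for detecting the performance of nucleic acid mass spectrometry equipment
CN113659961B (zh) * 2021-07-19 2024-01-30 广东迈能欣科技有限公司 Filtering algorithm applied to carbon dioxide sensors
CN114487073B (zh) * 2021-12-27 2024-04-12 浙江迪谱诊断技术有限公司 Time-of-flight nucleic acid mass spectrometry data calibration method
CN114023379B (zh) * 2021-12-31 2022-05-13 浙江迪谱诊断技术有限公司 Method and device for determining genotype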

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190172694A1 (en) * 2016-06-28 2019-06-06 Shimadzu Corporation Signal processing method and system based on time-of-flight mass spectrometry and electronic apparatus
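CN110196274A (zh) * 2019-04-25 2019-09-03 上海裕达实业有限公司 Mass spectrometry device and method capable of reducing noise
CN110441253A (zh) * 2019-07-22 2019-11-12 杭州华聚复合材料有限公司 Method for rapidly detecting the grafting rate of PP-g-MAH
CN111325121A (zh) * 2020-02-10 2020-06-23 浙江迪谱诊断技术有限公司 Nucleic acid mass spectrum numerical processing method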

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030190644A1 (en) * 1999-10-13 2003-10-09 Andreas Braun Methods for generating databases and databases for identifying polymorphic genetic markers
EP1469314B1 (en) * 2001-12-08 2008-08-06 Micromass UK Limited Method of mass spectometry
CA2684217C (en) 2007-04-13 2016-12-13 Sequenom, Inc. Comparative sequence analysis processes and systems
EP2160570A4 (en) * 2007-06-02 2012-12-05 Cerno Bioscience Llc SELF-CALIBRATION APPROACH FOR MASS SPECTROMETRY
GB201103854D0 (en) * 2011-03-07 2011-04-20 Micromass Ltd Dynamic resolution correction of quadrupole mass analyser
DE102013006132B9 (de) * 2013-04-10 2015-11-19 Bruker Daltonik Gmbh High-throughput characterization of samples by mass spectrometry
CN105334279B (zh) * 2014-08-14 2017-08-04 大连达硕信息技术有限公司 Method for processing high-resolution mass spectrometry data
FR3035410B1 (fr) * 2015-04-24 2021-10-01 Biomerieux Sa Method for identifying, by mass spectrometry, an unknown microorganism subgroup among a set of reference subgroups
WO2017180652A1 (en) * 2016-04-11 2017-10-19 Applied Proteomics, Inc. Mass spectrometric data analysis workflow
JP7063342B2 (ja) * 2018-02-05 2022-05-09 株式会社島津製作所 Mass spectrometer and mass calibration method in a mass spectrometer
EP3805748A4 (en) * 2018-05-30 2021-06-23 Shimadzu Corporation SPECTRAL DATA PROCESSING DEVICE AND ANALYSIS DEVICE
CN109145873B (zh) * 2018-09-27 2022-03-22 广东工业大学 Spectral Gaussian peak feature extraction algorithm based on a genetic algorithm
CN109726667B (zh) * 2018-12-25 2021-03-02 广州市锐博生物科技有限公司 Mass spectrometry data processing method and apparatus, computer device, and computer storage medium
CN109632860A (zh) * 2019-01-15 2019-04-16 中国科学院昆明植物研究所 Method for resolving the structures of individual compounds in a mixture
CN114487072B (zh) * 2021-12-27 2024-04-12 浙江迪谱诊断技术有限公司 Time-of-flight mass spectrum peak fitting method

Also Published As

Publication number Publication date
EP4016379C0 (en) 2024-01-31
CN111325121A (zh) 2020-06-23
CN111325121B (zh) 2024-02-20
US20220383979A1 (en) 2022-12-01
JP2023515296A (ja) 2023-04-13
JP7456665B2 (ja) 2024-03-27
EP4016379B1 (en) 2024-01-31
EP4016379A1 (en) 2022-06-22
EP4016379A4 (en) 2022-12-14

Similar Documents

Publication Publication Date Title
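WO2021159833A1 (zh) Nucleic acid mass spectrum numerical processing method
JP5068541B2 (ja) Apparatus and method for identifying peaks in liquid chromatography/mass spectrometry data and for forming spectra and chromatograms
WO2003095978A2 (en) Methods for time-alignment of liquid chromatography-mass spectrometry data
CN110791565B (zh) Prognostic marker genes and random survival forest model for predicting recurrence of stage II colorectal cancer
US20110282588A1 (en) Method to automatically identify peaks and monoisotopic peaks in mass spectral data for biomolecular applications
CN114487073B (zh) Time-of-flight nucleic acid mass spectrometry data calibration method
CN110763913B (zh) Derivative spectrum smoothing method based on signal segment classification
CN113588847B (zh) Biological metabolomics data processing method, analysis method, device and application
CN114609319B (zh) Spectral peak identification method and system based on noise estimation
CN111521708A (zh) Specific molecular markers of Aspergillus flavus infection and mildew in corn, peanuts and walnuts, and method for early mildew detection using them
CN112557332A (zh) Spectrum segmentation and spectrum comparison method based on spectral peak-splitting fitting
CN110714078A (zh) Marker genes for predicting recurrence of stage II colorectal cancer and application thereof
CN105718723B (zh) Method for detecting spectral peak positions in mass spectrometry data processing
Li et al. SELDI-TOF mass spectrometry protein data
CN110806456A (zh) Method for automatically parsing non-targeted metabolic profile data in UPLC-HRMS Profile mode
Antoniadis et al. Peaks detection and alignment for mass spectrometry data
CN112711980A (zh) Method for selecting wavelet bases in mineral spectral feature extraction
CN112730373A (zh) Raman spectral dataset analysis method for deep learning training
CN114487072B (zh) Time-of-flight mass spectrum peak fitting method
CN115359847A (zh) Peak-finding algorithm for proteomics tandem mass spectra
CN109324017B (zh) Method for improving the quality of modeling spectra in near-infrared spectral analysis
CN111256819B (zh) Noise reduction method for spectroscopic instruments
CN112161966B (zh) Method and device for separating the Raman spectrum of a sample containing a fluorescence spectrum
CN114428139A (zh) Metabolic markers, their use in preparing a hyperuricemia risk prediction kit, and the kit
CN115561193A (zh) Data processing and analysis system for a Fourier transform infrared spectrometer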

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918621

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020918621

Country of ref document: EP

Effective date: 20220317

ENP Entry into the national phase

Ref document number: 2022535644

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE