WO2021159833A1 - Nucleic acid mass spectrum numerical processing method - Google Patents
Nucleic acid mass spectrum numerical processing method
- Publication number
- WO2021159833A1 (application PCT/CN2020/134810)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- peak
- mass spectrum
- mass
- spectrum
- fitting
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N27/00—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
- G01N27/62—Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
-
- H—ELECTRICITY
- H01—ELECTRIC ELEMENTS
- H01J—ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
- H01J49/00—Particle spectrometers or separator tubes
- H01J49/0027—Methods for using particle spectrometers
- H01J49/0036—Step by step routines describing the handling of the data generated during a measurement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/02—Preprocessing
- G06F2218/04—Denoising
- G06F2218/06—Denoising by applying a scale-space analysis, e.g. using wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
- G06F2218/10—Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
Definitions
- The invention belongs to the technical field of nucleic acid mass spectrometry and specifically relates to a numerical processing method for nucleic acid mass spectrometry.
- Mass spectrometry offers speed, accuracy and high sensitivity, and has been widely used in biological analysis in recent years.
- As the basic substance of life, nucleic acid plays a vital role in major life phenomena of organisms such as growth, development, reproduction, heredity and variation.
- Modern biotechnology has found that most physiological or disease traits are expressed through a series of gene regulations residing in nucleic acid sequences. Accurate nucleotide detection is therefore particularly important.
- As an indispensable step before nucleotide detection, mass spectrometry numerical processing is self-evidently important. At present, similar methods suffer from problems such as a low data-acquisition conversion rate and uneven data; these problems seriously affect nucleotide detection results and call for further research.
- The purpose of the present invention is to address the low conversion rate and uneven data in mass spectrometry data acquisition by providing a nucleic acid mass spectrometry numerical processing method that extracts reliable feature values before gene analysis, aimed at overcoming limitations of the prior art and improving the accuracy of nucleotide detection.
- A numerical processing method for nucleic acid mass spectrometry includes the following steps:
- Step S1: Single mass spectrum recalibration. For each detection point of the sample, several mass spectra are acquired at different positions of the detection point; each mass spectrum is recalibrated using a special set of peaks with expected mass-to-charge ratios, namely anchor peaks;
- Step S2: Mass spectrum synthesis. On the basis of step S1, the several mass spectra at different positions of the detection point are combined into a single mass spectrum of the detection point;
- Step S3: Wavelet filtering. On the basis of step S2, high-frequency noise and baseline are removed by a wavelet-based digital filter;
- Step S4: Peak feature value extraction. On the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.
- In step S1, recalibrating a single mass spectrum includes:
- Step S11: Candidate reference peak selection. A set of reference peaks is selected from all possible expected peaks according to two criteria: first, the peak must lie within the mass range of a specified interval; second, there is no adjacent reference peak within the mass range of the specified interval;
- Step S12: Peak location. A weight-matrix convolution filter of width 9 is applied to the mass spectrum, the matrix preferably being (-4, 0, 1, 2, 2, 2, 1, 0, -4). For a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values: $y'_i = \sum_{k=-4}^{4} w_k\, y_{i+k}$, where $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$.
- Based on the filtered intensities, the whole mass spectrum is divided into intervals of a specified number of points. For each interval the local noise is identified, and points whose intensity is at least four times the local noise and at least the global minimum are identified as candidate peaks.
- The global minimum is preferably 0.01 × the largest local maximum;
- Step S13: Mass spectrum peak fitting;
- Step S14: Final anchor peak selection. For the list of detected peaks, the cut-off SNR (the minimum SNR) is determined first.
- The detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specified range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;
- Step S15: Recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting.
- The mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is the Bruker function.
- Its functional form is:
- The specific steps of peak fitting in step S13 include:
- Step S131: Determine the expected line width.
- Step S132: Regions of expected signal are masked within an interval of NN expected line widths; NN is preferably 4.
- Step S133: The implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ over an interval of $MM\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80. Within the masked regions of this interval, the values of $y_i$ are supplied by linear interpolation.
- Step S134: The noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
- Step S135: Point masking. Points in peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than a given value and whose noise is greater than a given value are further masked.
- Step S136: A specified number of estimated line widths around each peak is taken as the fitting region.
- In the absence of overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm, finding the specified parameters that minimize the tuning function.
- The expected line width is determined as follows:
- $\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).
- In the tuning function:
- the sum runs over all points $\{y_i, m_i\}$ in the specified interval;
- $H_f$ is the fitted height above the baseline at the point $M_f$;
- the parameters $M_f$ and $\lambda_f$ are the fitted mass and the fitted line width;
- $\sigma_i$ is set to a given value according to specified conditions.
- The specific steps of feature value extraction in step S4 include:
- Step S41: Peak fitting, same as step S13;
- Step S42: Record the following features:
- $V = A/\mathrm{SNR}$, the area variance
- Advantages of the invention: 1. the nucleic acid mass spectrometry numerical processing method of the present invention extracts reliable feature values before gene analysis, overcoming limitations of the prior art and improving the accuracy of nucleotide detection; 2. it improves the credibility of nucleic acid mass spectrometry data acquisition.
- Figure 3: Comparison before and after peak fitting.
- The present invention is a numerical processing method for nucleic acid mass spectrometry, which includes the following steps:
- Step S1: Recalibration of a single mass spectrum.
- Before the mass spectra are summed, each mass spectrum needs to be recalibrated.
- Recalibration is achieved by matching a set of specially identified peaks, called anchor peaks, to their expected masses, in the following steps:
- Step S11: Selection of candidate reference peaks.
- A set of clean reference peaks is selected from all possible expected peaks according to the following criteria:
- The peak must lie within the mass range 4000 Da to 9000 Da.
- The peak has no adjacent reference peak within the mass range defined by its mass ± resolution.
- Step S12: Peak location. A weight-matrix convolution filter of width 9 is applied to the mass spectrum, the matrix preferably being (-4, 0, 1, 2, 2, 2, 1, 0, -4). For a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values: $y'_i = \sum_{k=-4}^{4} w_k\, y_{i+k}$, where $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$.
- In the list of identified peaks, peaks that have an adjacent candidate peak within a certain range, peaks with SNR (the ratio of filtered intensity to local noise) ≤ 2, and peaks whose mass lies outside the pre-specified candidate reference peak ranges are removed. Finally, the peak indices are adjusted based on the original intensities. See Figures 1 and 2 for the mass spectra before and after applying the filter.
- Step S13: Mass spectrum peak fitting (see Figure 3), implemented as follows:
- Step S131: Determine the expected line width using the following formula:
- $\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters (default values 2.5 and 0.0005, respectively) and $M$ is the given peak mass (Da).
- Step S132: Regions of expected signal are masked within an interval of NN expected line widths; NN is preferably 4.
- Step S133: The implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ over an interval of $MM\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80. Within the masked regions of this interval, the values of $y_i$ are supplied by linear interpolation.
- Step S134: The noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
- Step S135: Points in peak regions whose SNR (the ratio of peak height to noise) is greater than 5 and whose noise is greater than 1 are further masked.
- Step S136: A window of 4 estimated line widths around each peak is taken as the fitting region.
- The sum runs over all points $\{y_i, m_i\}$ in the specified interval, and $H_f$ is the fitted height above the baseline corresponding to the point $M_f$.
- For points within $0.5\lambda_e$ of the peak center, $\sigma_i$ is set equal to 1;
- for points farther than $0.5\lambda_e$ from the peak center, $\sigma_i$ is set to 0.2 or 0.4.
- Step S14: Final anchor peak selection.
- For the list of detected peaks, the cut-off SNR (i.e. the minimum SNR) is determined first; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within ±25 Da of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected.
- Step S15: Recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting.
- The mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is the Bruker function.
- Its functional form is:
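The functional form of the Bruker function is not reproduced in this text, so the sketch below substitutes the quadratic mapping $m = At^2 + Bt + C$ that the description uses for the initial coefficients. With that stand-in, fitting the calibration coefficients to the anchor peaks' (flight time, expected mass) pairs is an ordinary least-squares problem. The function names and the sample flight times are hypothetical; this is not the patent's nonlinear Bruker-function fit.

```python
import numpy as np

def fit_quadratic_calibration(times, expected_masses):
    """Least-squares fit of m = A*t**2 + B*t + C to anchor peaks.

    Stand-in (assumption) for the Bruker-function fit of step S15:
    `times` are the measured flight times of the anchor peaks and
    `expected_masses` their expected masses in Da.
    """
    t = np.asarray(times, dtype=float)
    m = np.asarray(expected_masses, dtype=float)
    if t.size < 3:
        raise ValueError("need at least 3 anchor peaks for 3 coefficients")
    a, b, c = np.polyfit(t, m, 2)  # coefficients, highest power first
    return float(a), float(b), float(c)

def apply_calibration(times, a, b, c):
    """Map flight times to masses with the (re)fitted coefficients."""
    t = np.asarray(times, dtype=float)
    return a * t**2 + b * t + c

# Hypothetical anchor flight times with synthetic expected masses.
anchor_t = [20.0, 24.0, 28.0, 32.0]
anchor_m = apply_calibration(anchor_t, 8.0, 5.0, -3.0)
print(fit_quadratic_calibration(anchor_t, anchor_m))
```

A quadratic in $t$ is linear in $A$, $B$, $C$, which is why `np.polyfit` suffices here; a Bruker-style function that is nonlinear in its coefficients would need an iterative solver instead.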
- Step S2: Mass spectrum synthesis. On the basis of step S1, the several mass spectra at different positions of the corresponding point are combined into a single mass spectrum of the detection point.
- The multiple mass spectra are combined by a "self-weighted average", which can be described by the following equation:
- where n is the number of mass spectra and
- $I_{ij}$ is the intensity at mass i from the j-th mass spectrum.
- When the mass spectra have different calibration coefficients, the best mass spectrum, the one with the most anchor peaks, is selected and used to initialize the summed spectrum. A mass spectrum's absolute or squared intensities are summed with those of another mass spectrum only when its calibration coefficients and those of the best spectrum satisfy the conditions (A varies within 1%, B within 10%, C within 20 Da).
- Step S3: Wavelet filtering. Wavelet-based filtering is performed on the synthesized mass spectrum to remove high-frequency noise and baseline. Another round of recalibration is then performed on the filtered mass spectrum; after this round, the new A, B and C coefficients are assigned to the synthesized mass spectrum and the m/z values are adjusted accordingly.
- Step S4: Peak feature value extraction (see Figure 3).
- The fitting process follows these steps:
- Step S41: Peak fitting, same as step S13;
- Step S42: After a successful fit, record the following features:
- $V = A/\mathrm{SNR}$, the area variance
- The nucleic acid mass spectrometry numerical processing method of the present invention extracts reliable feature values before gene analysis, aims to overcome limitations of the prior art and increase the accuracy of nucleotide detection, and improves the reliability of nucleic acid mass spectrometry data acquisition.
Landscapes
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Analytical Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biophysics (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Electrochemistry (AREA)
- Biochemistry (AREA)
- Immunology (AREA)
- Pathology (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
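A nucleic acid mass spectrum numerical processing method, comprising the following steps: Step S1: single mass spectrum recalibration: for each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks having expected mass-to-charge ratios, namely anchor peaks; Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra at different positions of the detection point are combined into a single mass spectrum of the detection point; Step S3: wavelet filtering: on the basis of step S2, high-frequency noise and baseline are removed by a wavelet-based digital filter; Step S4: peak feature value extraction: on the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum. The method improves the credibility of nucleic acid mass spectrometry data acquisition and the accuracy of nucleotide detection.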
Description
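The present invention belongs to the technical field of nucleic acid mass spectrometry and specifically relates to a nucleic acid mass spectrum numerical processing method.
Mass spectrometry offers speed, accuracy and high sensitivity, and has been widely applied in biological analysis in recent years. As the basic substance of life, nucleic acid plays a vital role in major life phenomena of organisms such as growth, development, reproduction, heredity and variation. Modern biotechnology has found that most physiological or disease traits are expressed through a series of gene regulations residing in nucleic acid sequences; accurate nucleotide detection is therefore particularly important, and mass spectrum numerical processing, as an indispensable step before nucleotide detection, is self-evidently important. At present, similar methods suffer from problems such as a low data-acquisition conversion rate and uneven data, which seriously affect nucleotide detection results and call for further research.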
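Summary of the Invention
The purpose of the present invention is to address the low conversion rate and uneven data in mass spectrometry data acquisition by providing a nucleic acid mass spectrum numerical processing method that extracts reliable feature values before gene analysis; it is a nucleic acid mass spectrum numerical processing method aimed at overcoming limitations of the prior art and improving the accuracy of nucleotide detection.
The present invention is achieved through the following technical solution:
A nucleic acid mass spectrum numerical processing method, comprising the following steps:
Step S1: single mass spectrum recalibration: for each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks having expected mass-to-charge ratios, namely anchor peaks;
Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra at different positions of the detection point are combined into a single mass spectrum of the detection point;
Step S3: wavelet filtering: on the basis of step S2, high-frequency noise and baseline are removed by a wavelet-based digital filter;
Step S4: peak feature value extraction: on the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.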
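Preferably, in step S1, recalibrating a single mass spectrum comprises:
Step S11: candidate reference peak selection: a set of reference peaks is selected from all possible expected peaks according to the following criteria: first, the peak must lie within the mass range of a specified interval; second, there is no adjacent reference peak within the mass range of the specified interval;
Step S12: peak location: a weight-matrix convolution filter of width 9 is applied to the mass spectrum, the matrix preferably being (-4, 0, 1, 2, 2, 2, 1, 0, -4); for a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, expressed by the following formula:
$y'_i = \sum_{k=-4}^{4} w_k\, y_{i+k}$, where $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$;
Based on the filtered intensities, the whole mass spectrum is divided into intervals of a specified number of points; for each interval the local noise is identified, and points whose intensity is at least four times the local noise and at least the global minimum are identified as candidate peaks, the minimum preferably being 0.01 × the largest local maximum;
Step S13: mass spectrum peak fitting;
Step S14: final anchor peak selection: for the list of detected peaks, the cut-off SNR, i.e. the minimum SNR, is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specified range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;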
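Further, the specific steps of peak fitting in step S13 comprise:
Step S131: determine the expected line width.
Step S132: regions of expected signal are masked within an interval of NN expected line widths, NN preferably being 4.
Step S133: the implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ over an interval of $MM\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80; within the masked regions of this interval, the values of $y_i$ are provided by linear interpolation.
Step S134: the noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
Step S135: point masking: points in peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than a given value and whose noise is greater than a given value are further masked.
Step S136: a specified number of estimated line widths around each peak is determined as the fitting region; in the absence of overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm, finding the specified parameters that minimize the tuning function.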
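Furthermore, in step S131, the expected line width is determined as:
$\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).
Furthermore, in step S136, the tuning function is as follows:
As a further preference, the specific steps of feature value extraction in step S4 comprise:
Step S41: peak fitting, same as step S13;
Step S42: record the following features: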
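1. $H_f$, the height of the fitted peak center above the baseline;
2. $\lambda_f$, the fitted line width;
3. $\delta_f = M_f - M_e$, the peak offset (distance between the fitted peak center and the expected peak center);
4. $A$, the area between the fitted peak and the baseline within a range of $4\lambda_f$;
5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio;
6. $V = A/\mathrm{SNR}$, the area variance;
7. $\Delta$, the fitted area difference: the square root of the sum of squared differences between fitted and measured intensities.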
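Advantages of the invention: 1. the nucleic acid mass spectrum numerical processing method of the invention extracts reliable feature values before gene analysis and is aimed at overcoming limitations of the prior art and improving the accuracy of nucleotide detection; 2. it improves the credibility of nucleic acid mass spectrometry data acquisition.
Figure 1: mass spectrum before filtering;
Figure 2: mass spectrum after filtering;
Figure 3: comparison before and after peak fitting.
The invention is further described below with reference to the drawings and embodiments.
The present invention is a nucleic acid mass spectrum numerical processing method, comprising the following steps: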
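Step S1: recalibration of a single mass spectrum. For a single sample, multiple mass spectra (typically n = 5) are acquired at different positions of the sample's detection point. Each mass spectrum is in fact the sum of the spectra from multiple laser shots (typically n = 20). The initial coefficients of a mass spectrum are generated on the assumption that the mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is quadratic, of the form $m = At^2 + Bt + C$; before the mass spectra are summed, each must be recalibrated. Recalibration is achieved by matching a set of specially identified peaks, called anchor peaks, to their expected masses, in the following steps: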
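Step S11: candidate reference peak selection: a set of clean reference peaks is selected from all possible expected peaks according to the following criteria:
1. The peak must lie within the mass range 4000 Da to 9000 Da.
2. The peak has no adjacent reference peak within the mass range defined by its mass ± resolution.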
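The two selection criteria can be sketched as a filter over the list of expected masses. The text does not say whether "resolution" is an absolute width or a resolving power, so this sketch assumes it is an absolute half-window in Da; it also assumes neighbours are checked against every expected peak, including those outside 4000 Da to 9000 Da. The function name is hypothetical.

```python
def select_candidate_reference_peaks(expected_masses, resolution,
                                     mass_min=4000.0, mass_max=9000.0):
    """Step S11 sketch: keep expected peaks that are in range and isolated.

    `resolution` is treated as an absolute half-window in Da (assumption);
    a peak is rejected if any other expected peak lies within +/- resolution.
    """
    masses = sorted(float(m) for m in expected_masses)
    selected = []
    for i, m in enumerate(masses):
        if not (mass_min <= m <= mass_max):
            continue  # criterion 1: outside 4000-9000 Da
        crowded = any(abs(other - m) <= resolution
                      for j, other in enumerate(masses) if j != i)
        if not crowded:  # criterion 2: no neighbour within +/- resolution
            selected.append(m)
    return selected

print(select_candidate_reference_peaks([3000, 5000, 5005, 6000, 9500], 10))
# [6000.0]
```

In the usage line, 3000 Da and 9500 Da fail the mass-range test, while 5000 Da and 5005 Da reject each other as neighbours, leaving only 6000 Da as a clean reference peak.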
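Step S12: peak location: a weight-matrix convolution filter of width 9 is applied to the mass spectrum, the matrix preferably being (-4, 0, 1, 2, 2, 2, 1, 0, -4); for a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, expressed by the following formula:
$y'_i = \sum_{k=-4}^{4} w_k\, y_{i+k}$, where $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$.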
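The width-9 weighted sum can be written as a single NumPy convolution. Because the preferred weights are symmetric, `np.convolve` (which flips its kernel) gives exactly the weighted sum of the 9 values centred on each point. The weights also sum to zero, so a flat baseline is mapped to zero away from the edges. How the method treats the first and last 4 points is not stated; this sketch simply zero-pads them, and the names `S12_WEIGHTS` and `s12_filter` are hypothetical.

```python
import numpy as np

# Preferred weight matrix of step S12 (width 9); the weights sum to 0.
S12_WEIGHTS = np.array([-4, 0, 1, 2, 2, 2, 1, 0, -4], dtype=float)

def s12_filter(intensities):
    """Replace each point by the weighted sum of the 9 values centred on it.

    The kernel is symmetric, so convolution equals the plain weighted sum.
    mode="same" keeps the input length (assumes at least 9 points); the
    4 points at each edge see zero padding (an assumption).
    """
    y = np.asarray(intensities, dtype=float)
    return np.convolve(y, S12_WEIGHTS, mode="same")

# A unit spike reproduces the kernel around its position.
spike = np.zeros(20)
spike[10] = 1.0
print(s12_filter(spike)[6:15])
```

The negative outer weights subtract the local surroundings while the positive centre weights sum the peak, so narrow peaks are enhanced relative to slowly varying background.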
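Based on the filtered intensities, local maxima are identified with a small sliding window (n = ±3). The whole mass spectrum is then divided into intervals of 500 points each; for each interval, the local noise is taken as 33% of the local maxima within the surrounding 1500-point window (± one interval). Points whose intensity is at least four times the local noise and at least the global minimum are identified as candidate peaks, the minimum preferably being 0.01 × the largest local maximum. From the resulting peak list, peaks that have an adjacent candidate peak within a certain range, peaks with SNR (the ratio of filtered intensity to local noise) ≤ 2, and peaks whose mass lies outside the pre-specified candidate reference peak ranges are removed. Finally, the peak indices are adjusted based on the original intensities. See Figures 1 and 2 for the mass spectrum before and after applying the filter.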
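The candidate-peak search above can be sketched as follows. The phrase "33% of the local maxima" is ambiguous; this sketch reads it as the 33rd percentile of the local-maximum intensities in the ±1-interval window, which is an assumption. It covers the local-maximum search, interval noise, the four-times-noise and global-minimum thresholds, and the optional re-centring on raw intensities; the neighbour and reference-range clean-up is left out. All names are hypothetical.

```python
import numpy as np

def find_candidate_peaks(filtered, raw=None, half_window=3, interval=500,
                         noise_percentile=33.0, noise_factor=4.0,
                         min_fraction=0.01):
    """Sketch of the step S12 candidate-peak search on filtered intensities."""
    f = np.asarray(filtered, dtype=float)
    n = f.size
    # 1. local maxima within a sliding window of +/- half_window points
    maxima = np.array([
        i for i in range(n)
        if f[i] > 0 and f[i] == f[max(0, i - half_window):i + half_window + 1].max()
    ], dtype=int)
    if maxima.size == 0:
        return []
    global_min = min_fraction * f[maxima].max()  # 0.01 * largest local maximum
    # 2. local noise per interval from maxima in the +/- one-interval window
    n_intervals = (n + interval - 1) // interval
    noise = np.full(n_intervals, np.inf)
    for k in range(n_intervals):
        lo, hi = (k - 1) * interval, (k + 2) * interval
        in_window = maxima[(maxima >= lo) & (maxima < hi)]
        if in_window.size:
            noise[k] = np.percentile(f[in_window], noise_percentile)
    # 3. keep maxima >= 4 * local noise and >= the global minimum
    peaks = [int(i) for i in maxima
             if f[i] >= noise_factor * noise[i // interval] and f[i] >= global_min]
    # 4. optionally re-centre each index on the raw (unfiltered) intensity
    if raw is not None:
        r = np.asarray(raw, dtype=float)
        adjusted = []
        for i in peaks:
            lo = max(0, i - half_window)
            adjusted.append(lo + int(np.argmax(r[lo:i + half_window + 1])))
        peaks = adjusted
    return peaks

# Synthetic filtered trace: a ripple with local maxima of 0.5 every 20 points,
# plus two real peaks.
idx = np.arange(2000)
demo = 0.5 * np.sin(2 * np.pi * idx / 20)
demo[500], demo[1500] = 100.0, 50.0
print(find_candidate_peaks(demo))  # [500, 1500]
```

With the ripple maxima setting the local noise near 0.5, only the two real peaks clear the four-times-noise threshold.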
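Step S13: mass spectrum peak fitting (see Figure 3), implemented as follows:
Step S131: determine the expected line width using the following formula:
$\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters (default values 2.5 and 0.0005, respectively) and $M$ is the given peak mass (Da).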
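As a worked example with the default parameters, a peak at $M = 5000$ Da has $\lambda_e = 2.5 + 0.0005 \times 5000 = 5.0$ Da. The same arithmetic, as a small hypothetical helper evaluated across the 4000 Da to 9000 Da reference range:

```python
def expected_linewidth(mass_da, l_a=2.5, l_b=0.0005):
    """Expected line width lambda_e = L_A + L_B * M in Da (step S131 defaults)."""
    return l_a + l_b * mass_da

for m in (4000.0, 5000.0, 9000.0):
    # prints 4.5, 5.0 and 7.0 Da for the three masses
    print(m, round(expected_linewidth(m), 6))
```

The line width therefore grows from 4.5 Da to 7.0 Da across the reference range, which is why the later masking and fitting windows are expressed in multiples of $\lambda_e$ rather than in fixed Da.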
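Step S132: regions of expected signal are masked within an interval of NN expected line widths, NN preferably being 4.
Step S133: the implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ over an interval of $MM\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80; within the masked regions of this interval, the values of $y_i$ are provided by linear interpolation.
Step S134: the noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
Step S135: points in peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than 5 and whose noise is greater than 1 are further masked.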
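Steps S132 to S135 can be sketched for a uniformly sampled m/z axis. Several details are not fixed by the text and are assumptions here. The mask spans NN line widths in total (±NN/2·$\lambda_e$). $\lambda_m$ is taken as the line width at the lowest m/z on the axis. The running RMS uses the same window as the baseline. The S135 test is applied point by point. All names are hypothetical.

```python
import numpy as np

def expected_linewidth(mass_da, l_a=2.5, l_b=0.0005):
    return l_a + l_b * mass_da

def running_mean(x, width):
    """Centred moving average with edge padding; the window is forced odd."""
    w = max(1, int(width)) | 1
    padded = np.pad(np.asarray(x, dtype=float), w // 2, mode="edge")
    csum = np.concatenate(([0.0], np.cumsum(padded)))
    return (csum[w:] - csum[:-w]) / w

def baseline_and_noise(mz, y, expected_masses, nn=4, mm=80,
                       snr_limit=5.0, noise_limit=1.0):
    """Sketch of steps S132-S135; returns (baseline, noise, mask)."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(y, dtype=float)
    # S132: mask regions of expected signal
    mask = np.zeros(mz.size, dtype=bool)
    for m0 in expected_masses:
        mask |= np.abs(mz - m0) <= 0.5 * nn * expected_linewidth(m0)
    # S133: linear interpolation inside the mask, then a running mean
    filled = y.copy()
    if mask.any() and (~mask).any():
        filled[mask] = np.interp(mz[mask], mz[~mask], y[~mask])
    step = np.median(np.diff(mz))
    window = mm * expected_linewidth(mz.min()) / step  # MM * lambda_m, in points
    baseline = running_mean(filled, window)
    # S134: running RMS of (signal - baseline)
    noise = np.sqrt(running_mean((y - baseline) ** 2, window))
    # S135: additionally mask points with SNR > 5 and noise > 1
    snr = np.divide(y - baseline, noise, out=np.zeros_like(y), where=noise > 0)
    mask |= (snr > snr_limit) & (noise > noise_limit)
    return baseline, noise, mask

# Flat baseline of 10 with one Gaussian peak (FWHM 5 Da) at 5000 Da.
mz_axis = np.linspace(4000.0, 6000.0, 20001)
sig = 5.0 / (2 * np.sqrt(2 * np.log(2)))
spectrum = 10.0 + 100.0 * np.exp(-0.5 * ((mz_axis - 5000.0) / sig) ** 2)
base, noise_level, masked = baseline_and_noise(mz_axis, spectrum, [5000.0])
```

Masking before averaging keeps the peak itself from pulling the baseline upward, so the recovered baseline stays close to the flat level of 10 even across the peak.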
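Step S136: a window of 4 estimated line widths around each peak is determined as the fitting region; in the absence of overlapping peaks, a single Gaussian peak is fitted with the Levenberg-Marquardt algorithm, finding the parameters $M_f$ and $\lambda_f$ (fitted mass and fitted line width) that minimize the tuning function (function prototype shown below).
The sum runs over all points $\{y_i, m_i\}$ in the specified interval, and $H_f$ is the fitted height above the baseline corresponding to the point $M_f$. For points within $0.5\lambda_e$ of the peak center, $\sigma_i$ is set equal to 1; for points beyond $0.5\lambda_e$ from the peak center, $\sigma_i$ is set to 0.2 or 0.4.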
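The tuning function itself is not reproduced in this text, so the sketch below assumes a weighted least-squares objective. It minimises $\sum_i [w_i (y_i - b_i - G(m_i))]^2$, where $G$ is a Gaussian with height $H_f$, centre $M_f$ and full width at half maximum $\lambda_f$. Reading $\lambda_f$ as the FWHM is an assumption, as is fitting $H_f$ alongside $M_f$ and $\lambda_f$. It is also unclear whether $\sigma_i$ multiplies or divides the residual. This sketch uses it as a multiplicative weight (1 near the centre, 0.2 outside) so that the tails count less. The Levenberg-Marquardt loop is hand-written with NumPy and a forward-difference Jacobian; all names are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def gaussian(mz, height, center, fwhm):
    """Gaussian of given height above baseline, centre and FWHM (assumption)."""
    return height * np.exp(-0.5 * ((mz - center) / (fwhm * FWHM_TO_SIGMA)) ** 2)

def fit_single_gaussian(mz, y, baseline, lambda_e, p0, outer_weight=0.2,
                        max_iter=200, tol=1e-10):
    """Levenberg-Marquardt fit of (H_f, M_f, lambda_f) to one isolated peak."""
    mz = np.asarray(mz, dtype=float)
    target = np.asarray(y, dtype=float) - baseline
    p = np.asarray(p0, dtype=float)
    # weights: 1 within 0.5*lambda_e of the initial centre, outer_weight elsewhere
    w = np.where(np.abs(mz - p[1]) <= 0.5 * lambda_e, 1.0, outer_weight)

    def residuals(params):
        return w * (target - gaussian(mz, *params))

    r = residuals(p)
    cost = float(r @ r)
    damping = 1e-3
    for _ in range(max_iter):
        # forward-difference Jacobian of the residual vector
        jac = np.empty((mz.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = 1e-7 * max(1.0, abs(p[k]))
            jac[:, k] = (residuals(p + dp) - r) / dp[k]
        jtj = jac.T @ jac
        # Marquardt step: (J^T J + damping * diag(J^T J)) delta = -J^T r
        delta = np.linalg.solve(jtj + damping * np.diag(np.diag(jtj)), -(jac.T @ r))
        r_trial = residuals(p + delta)
        cost_trial = float(r_trial @ r_trial)
        if cost_trial < cost:        # accept: behave more like Gauss-Newton
            p, r, cost = p + delta, r_trial, cost_trial
            damping = max(damping * 0.3, 1e-12)
            if np.max(np.abs(delta)) < tol:
                break
        else:                        # reject: behave more like gradient descent
            damping *= 10.0
            if damping > 1e12:
                break
    height, center, fwhm = p
    return float(height), float(center), float(abs(fwhm)), cost

# Noise-free synthetic peak: H=100, centre 5001 Da, FWHM 5 Da, baseline 2.
mz_fit = np.linspace(4980.0, 5020.0, 401)
y_fit = 2.0 + gaussian(mz_fit, 100.0, 5001.0, 5.0)
h_f, m_f, lam_f, final_cost = fit_single_gaussian(
    mz_fit, y_fit, 2.0, lambda_e=5.0, p0=(80.0, 5000.0, 4.0))
```

In practice a library solver such as SciPy's least-squares routines could replace the hand-written loop; it is spelled out here only to keep the example dependency-free and to make the damping logic visible.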
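Step S14: final anchor peak selection: for the list of detected peaks, the cut-off SNR (i.e. the minimum SNR) is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within ±25 Da of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected.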
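The matching rule of step S14 can be sketched as follows. The text does not say how the cut-off SNR is derived, so the caller supplies it. When several detected peaks qualify for one reference, this sketch keeps the closest, which is also an assumption. The resulting (expected mass, observed mass) pairs are what a recalibration fit would consume. The function name is hypothetical.

```python
def select_anchor_peaks(detected, candidate_masses, cutoff_snr, tolerance_da=25.0):
    """Step S14 sketch: keep detected peaks near a candidate reference peak.

    `detected` is a list of (mass, snr) tuples; `cutoff_snr` is supplied by
    the caller (assumption). Returns (candidate_mass, detected_mass) pairs.
    """
    anchors = []
    for ref in candidate_masses:
        matches = [(mass, snr) for mass, snr in detected
                   if abs(mass - ref) <= tolerance_da and snr > cutoff_snr]
        if matches:
            # several qualifying peaks: keep the closest one (assumption)
            best = min(matches, key=lambda peak: abs(peak[0] - ref))
            anchors.append((ref, best[0]))
    return anchors

peaks = [(4501.2, 12.0), (4530.0, 40.0), (6003.5, 3.0), (7010.0, 25.0)]
print(select_anchor_peaks(peaks, [4500.0, 6000.0, 7000.0], cutoff_snr=5.0))
# [(4500.0, 4501.2), (7000.0, 7010.0)]
```

In the usage line, the peak at 4530 Da is rejected for lying more than 25 Da from 4500 Da, and the peak at 6003.5 Da is rejected for falling below the cut-off SNR.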
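Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra at different positions of the corresponding point are combined into the single mass spectrum of that detection point. The multiple mass spectra are combined by a "self-weighted average", which can be described by the following equation:
where n is the number of mass spectra, the result is the average intensity at mass i, and $I_{ij}$ is the intensity at mass i from the j-th mass spectrum. When the mass spectra have different calibration coefficients, the best mass spectrum, the one with the most anchor peaks, is selected, and the summed spectrum is initialized with it. A mass spectrum's absolute or squared intensities may be summed with those of another mass spectrum only if its calibration coefficients and those of the best spectrum satisfy the conditions (A varies within 1%, B within 10%, C within 20 Da).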
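The self-weighted-average equation is not reproduced in this text. One reading consistent with "summing absolute or squared intensities" weights each intensity by its own magnitude, $\bar I_i = \sum_j I_{ij}^2 / \sum_j |I_{ij}|$. The sketch below uses that form as an assumption, together with the stated coefficient tolerances. It also assumes all spectra are already sampled on a common m/z grid. The function names are hypothetical.

```python
import numpy as np

def coefficients_compatible(coef, best, a_rel=0.01, b_rel=0.10, c_abs=20.0):
    """Step S2 tolerances: A within 1%, B within 10%, C within 20 Da."""
    a, b, c = coef
    a0, b0, c0 = best
    return (abs(a - a0) <= a_rel * abs(a0)
            and abs(b - b0) <= b_rel * abs(b0)
            and abs(c - c0) <= c_abs)

def self_weighted_average(spectra, coefficients, anchor_counts):
    """Combine spectra on a common m/z grid (assumption).

    Assumed form: I_i = sum_j I_ij**2 / sum_j |I_ij|, accumulated only over
    spectra whose coefficients are compatible with the best spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    best = int(np.argmax(anchor_counts))     # spectrum with most anchor peaks
    abs_sum = np.abs(spectra[best]).copy()   # initialise with the best spectrum
    sq_sum = spectra[best] ** 2
    used = [best]
    for j, spec in enumerate(spectra):
        if j != best and coefficients_compatible(coefficients[j], coefficients[best]):
            abs_sum += np.abs(spec)
            sq_sum += spec ** 2
            used.append(j)
    combined = np.divide(sq_sum, abs_sum, out=np.zeros_like(sq_sum),
                         where=abs_sum > 0)
    return combined, used

spectra = [[0.0, 2.0, 4.0], [0.0, 2.0, 2.0], [9.0, 9.0, 9.0]]
coefs = [(1.0, 100.0, 5.0), (1.005, 105.0, 10.0), (1.2, 100.0, 5.0)]
combined, used = self_weighted_average(spectra, coefs, anchor_counts=[6, 4, 5])
print(used, combined)
```

Here the third spectrum is excluded because its A coefficient differs from the best spectrum's by 20%, far outside the 1% tolerance.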
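Step S3: wavelet filtering: wavelet-based filtering is performed on the synthesized mass spectrum to remove high-frequency noise and baseline. Another round of recalibration is then performed on the filtered mass spectrum. After this round of recalibration, the new A, B and C coefficients are assigned to the synthesized mass spectrum and the m/z values are adjusted accordingly.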
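The wavelet family, decomposition depth and thresholding rule are not given. The sketch below uses a hand-written orthonormal Haar transform, chosen only because it fits in a few lines; a smoother wavelet may be preferable in practice. It soft-thresholds the detail coefficients with the universal threshold, taking the noise level from the median absolute deviation of the finest level, to suppress high-frequency noise. It zeroes the coarsest approximation to remove variations slower than about $2^{\text{levels}}$ points, treating them as the baseline. All names and parameters are assumptions.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2.0), (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_inverse_step(approx, detail):
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def wavelet_filter(y, levels=8):
    """Soft-threshold the details and drop the coarsest approximation.

    Peaks should be much narrower than 2**levels points, otherwise removing
    the approximation (the "baseline") also removes part of the peaks.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    block = 2 ** levels
    x = np.pad(y, (0, (-n) % block), mode="edge")  # length multiple of 2**levels
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)
    # universal threshold; noise estimated from the finest detail level
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(max(n, 2)))
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    x = np.zeros_like(x)                         # remove the baseline
    for d in reversed(details):
        x = haar_inverse_step(x, d)
    return x[:n]

flat = wavelet_filter(np.full(300, 7.0))  # a pure baseline filters to zero
```

A constant spectrum has no detail content, so everything it contains sits in the approximation and is removed; this is the behaviour the baseline-removal part of step S3 relies on.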
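Step S4: peak feature value extraction (see Figure 3); the fitting process follows these steps:
Step S41: peak fitting, same as step S13;
Step S42: after a successful fit, record the following features:
1. $H_f$, the height of the fitted peak center above the baseline;
2. $\lambda_f$, the fitted line width;
3. $\delta_f = M_f - M_e$, the peak offset (distance between the fitted peak center and the expected peak center);
4. $A$, the area between the fitted peak and the baseline within a range of $4\lambda_f$;
5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio;
6. $V = A/\mathrm{SNR}$, the area variance;
7. $\Delta$, the fitted area difference: the square root of the sum of squared differences between fitted and measured intensities.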
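The seven features can be computed directly from the fit results. Two readings here are assumptions: $\lambda_f$ is the Gaussian's FWHM, and "within $4\lambda_f$" means a window of total width $4\lambda_f$ centred on $M_f$. Under those assumptions the area has a closed form via the error function. The noise value $N(M_f)$ and the fitted/measured intensity sequences are passed in by the caller. The function name is hypothetical.

```python
import math

FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def peak_features(h_f, m_f, lambda_f, m_e, noise_at_peak, fitted, measured):
    """Compute the step S42 features from a successful Gaussian fit."""
    sigma = lambda_f * FWHM_TO_SIGMA
    half = 2.0 * lambda_f  # window M_f +/- 2*lambda_f (assumption)
    # closed-form Gaussian area between M_f - half and M_f + half
    area = (h_f * sigma * math.sqrt(2.0 * math.pi)
            * math.erf(half / (sigma * math.sqrt(2.0))))
    snr = h_f / noise_at_peak                      # SNR = H_f / N(M_f)
    delta = math.sqrt(sum((f - m) ** 2 for f, m in zip(fitted, measured)))
    return {
        "H_f": h_f,
        "lambda_f": lambda_f,
        "delta_f": m_f - m_e,   # peak offset
        "A": area,
        "SNR": snr,
        "V": area / snr,        # "area variance"
        "Delta": delta,         # fitted area difference
    }

feats = peak_features(100.0, 5001.0, 5.0, 5000.0, 2.0,
                      fitted=[1.0, 2.0, 3.0], measured=[1.0, 2.5, 3.0])
print(feats)
```

Because $\pm 2\lambda_f$ spans roughly $\pm 4.7\sigma$ of a Gaussian, the windowed area here is essentially the full peak area; a narrower window would make $A$ more sensitive to the fitted width.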
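The nucleic acid mass spectrum numerical processing method of the invention extracts reliable feature values before gene analysis, is aimed at overcoming limitations of the prior art and improving the accuracy of nucleotide detection, and improves the credibility of nucleic acid mass spectrometry data acquisition.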
Claims (9)
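- A nucleic acid mass spectrum numerical processing method, characterized by comprising the following steps: Step S1: single mass spectrum recalibration: for each detection point of a sample, several mass spectra corresponding to different positions of the detection point are acquired, and each mass spectrum is recalibrated using a special set of peaks having expected mass-to-charge ratios, namely anchor peaks; Step S2: mass spectrum synthesis: on the basis of step S1, the several mass spectra at different positions of the detection point are combined into a single mass spectrum of the detection point; Step S3: wavelet filtering: on the basis of step S2, high-frequency noise and baseline are removed by a wavelet-based digital filter; Step S4: peak feature value extraction: on the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.
- The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that, in step S1, recalibrating a single mass spectrum comprises: Step S11: candidate reference peak selection: a set of reference peaks is selected from all possible expected peaks according to the following criteria: first, the peak must lie within the mass range of a specified interval; second, there is no adjacent reference peak within the mass range of the specified interval; Step S12: peak location: a weight-matrix convolution filter of width 9 is applied to the mass spectrum, the matrix preferably being (-4, 0, 1, 2, 2, 2, 1, 0, -4); for a given point of the mass spectrum, the filtered intensity equals the weighted sum of the 9 surrounding values, expressed by the formula $y'_i = \sum_{k=-4}^{4} w_k\, y_{i+k}$ with $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$; based on the filtered intensities, the whole mass spectrum is divided into intervals of a specified number of points, the local noise is identified for each interval, and points whose intensity is at least four times the local noise and at least the global minimum are identified as candidate peaks, the minimum preferably being 0.01 × the largest local maximum; Step S13: mass spectrum peak fitting; Step S14: final anchor peak selection: for the list of detected peaks, the cut-off SNR, i.e. the minimum SNR, is first determined; the detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specified range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;
- The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S2 is as follows: the multiple mass spectra are combined using a "self-weighted average" method; when the mass spectra have different calibration coefficients, the best mass spectrum, with the most anchor peaks, is selected; the summed mass spectrum is initialized with the best mass spectrum; and a mass spectrum's absolute or squared intensities may be summed with those of another mass spectrum only if its calibration coefficients and those of the best spectrum satisfy the conditions.
- The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S3 is as follows: wavelet-based filtering is performed on the synthesized mass spectrum to remove high-frequency noise and baseline; another round of recalibration (as in step S1) is then performed on the filtered mass spectrum, and the m/z values are adjusted accordingly.
- The nucleic acid mass spectrum numerical processing method according to claim 1, characterized in that step S4 comprises: Step S41: peak fitting, same as step S13; Step S42: recording the following features: 1. $H_f$, the height of the fitted peak center above the baseline; 2. $\lambda_f$, the fitted line width; 3. $\delta_f = M_f - M_e$, the peak offset (distance between the fitted peak center and the expected peak center); 4. $A$, the area between the fitted peak and the baseline within a range of $4\lambda_f$; 5. $\mathrm{SNR} = H_f / N(M_f)$, the signal-to-noise ratio; 6. $V = A/\mathrm{SNR}$, the area variance; 7. $\Delta$, the fitted area difference, i.e. the square root of the sum of squared differences between fitted and measured intensities.
- The nucleic acid mass spectrum numerical processing method according to claim 2, characterized in that, in step S13, peak fitting specifically comprises: Step S131: determining the expected line width; Step S132: masking regions of expected signal within an interval of NN expected line widths, NN preferably being 4; Step S133: calculating the implicit baseline as the mean of the mass spectrum intensities $y_i$ over an interval of $MM\lambda_m$, where $\lambda_m$ is the smallest estimated line width in this interval and MM is preferably 80, the values of $y_i$ within the masked regions of this interval being provided by linear interpolation; Step S134: calculating the noise level as the running root-mean-square (RMS) value of (signal - baseline); Step S135: point masking: further masking points in peak regions whose SNR (calculated as the ratio of peak height to noise) is greater than a given value and whose noise is greater than a given value; Step S136: determining a specified number of estimated line widths around each peak as the fitting region and, in the absence of overlapping peaks, fitting a single Gaussian peak with the Levenberg-Marquardt algorithm to find the specified parameters that minimize the tuning function.
- The nucleic acid mass spectrum numerical processing method according to claim 6, characterized in that the expected line width in step S131 is described by the equation $\lambda_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).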
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/771,216 US20220383979A1 (en) | 2020-02-10 | 2020-12-09 | Nucleic acid mass spectrum numerical processing method |
EP20918621.2A EP4016379B1 (en) | 2020-02-10 | 2020-12-09 | Nucleic acid mass spectrum numerical processing method |
JP2022535644A JP7456665B2 (ja) | 2020-02-10 | 2020-12-09 | Nucleic acid mass spectrum numerical processing method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010084107.4A CN111325121B (zh) | 2020-02-10 | 2020-02-10 | Nucleic acid mass spectrum numerical processing method |
CN202010084107.4 | 2020-02-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021159833A1 (zh) | 2021-08-19 |
Family
ID=71172661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/134810 (WO2021159833A1) | Nucleic acid mass spectrum numerical processing method | 2020-02-10 | 2020-12-09 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20220383979A1 (zh) |
EP (1) | EP4016379B1 (zh) |
JP (1) | JP7456665B2 (zh) |
CN (1) | CN111325121B (zh) |
WO (1) | WO2021159833A1 (zh) |
2020
- 2020-02-10: CN202010084107.4A (CN111325121B), status Active
- 2020-12-09: PCT/CN2020/134810 (WO2021159833A1), status unknown
- 2020-12-09: JP2022535644A (JP7456665B2), status Active
- 2020-12-09: EP20918621.2A (EP4016379B1), status Active
- 2020-12-09: US17/771,216 (US20220383979A1), status Pending
Also Published As
Publication number | Publication date |
---|---|
EP4016379C0 (en) | 2024-01-31 |
CN111325121A (zh) | 2020-06-23 |
CN111325121B (zh) | 2024-02-20 |
US20220383979A1 (en) | 2022-12-01 |
JP2023515296A (ja) | 2023-04-13 |
JP7456665B2 (ja) | 2024-03-27 |
EP4016379B1 (en) | 2024-01-31 |
EP4016379A1 (en) | 2022-06-22 |
EP4016379A4 (en) | 2022-12-14 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20918621; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2020918621; Country of ref document: EP; Effective date: 20220317 |
ENP | Entry into the national phase | Ref document number: 2022535644; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |