WO2021159833A1 - Nucleic acid mass spectrum numerical processing method - Google Patents

Nucleic acid mass spectrum numerical processing method

Info

Publication number
WO2021159833A1
Authority
WO
WIPO (PCT)
Prior art keywords
peak
mass spectrum
mass
spectrum
fitting
Prior art date
Application number
PCT/CN2020/134810
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
树建伟
相双红
汪松炯
Original Assignee
浙江迪谱诊断技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江迪谱诊断技术有限公司
Priority to EP20918621.2A (EP4016379B1)
Priority to US17/771,216 (US20220383979A1)
Priority to JP2022535644A (JP7456665B2)
Publication of WO2021159833A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • H ELECTRICITY
    • H01 ELECTRIC ELEMENTS
    • H01J ELECTRIC DISCHARGE TUBES OR DISCHARGE LAMPS
    • H01J49/00 Particle spectrometers or separator tubes
    • H01J49/0027 Methods for using particle spectrometers
    • H01J49/0036 Step by step routines describing the handling of the data generated during a measurement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G06F2218/06 Denoising by applying a scale-space analysis, e.g. using wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks

Definitions

  • The invention belongs to the technical field of nucleic acid mass spectrometry, and specifically relates to a numerical processing method for nucleic acid mass spectrometry.
  • Mass spectrometry is fast, accurate and highly sensitive, and in recent years it has been widely used in biological analysis.
  • Nucleic acids play a vital role in the growth, development, reproduction, heredity, mutation and other major life phenomena of organisms.
  • Modern biotechnology has found that most physiological or disease traits are manifested through a series of gene regulation processes encoded in nucleic acid sequences. Accurate nucleotide detection is therefore particularly important.
  • Numerical processing of mass spectra is self-evidently an indispensable step before nucleic acid detection. At present, similar methods suffer from problems such as a low data acquisition conversion rate and non-uniform data. These problems seriously affect nucleotide detection results, and further research is needed.
  • The purpose of the present invention is to solve the problems of low conversion rate and non-uniform data in mass spectrometry data acquisition, and to provide a numerical processing method for nucleic acid mass spectrometry that extracts reliable feature values before gene analysis, overcoming the limitations of the prior art and improving the accuracy of nucleotide detection.
  • A numerical processing method for nucleic acid mass spectrometry includes the following steps:
  • Step S1: Recalibration of a single mass spectrum. For each detection point of the sample, several mass spectra are obtained, corresponding to different positions of the detection point. Each mass spectrum is recalibrated using a special set of peaks with expected mass-to-charge ratios, i.e., anchor peaks;
  • Step S2: Mass spectrum synthesis. On the basis of step S1, the several mass spectra corresponding to different positions of the same detection point are combined into a single mass spectrum of that detection point;
  • Step S3: Wavelet filtering. On the basis of step S2, high-frequency noise and the baseline are removed with a wavelet-based digital filter;
  • Step S4: Peak feature value extraction. On the basis of step S3, peak fitting is performed, and the peak height, peak width, peak area, mass offset and signal-to-noise ratio are obtained from the fitted curve of the mass spectrum.
  • Step S1, recalibrating a single mass spectrum, includes:
  • Step S11: Candidate reference peak selection. A set of reference peaks is selected from all possible expected peaks according to the following criteria: first, the peak must lie within a specific mass range; second, there is no adjacent reference peak within a specific mass range of the peak;
  • Step S12: Peak location. A convolution filter with a weight matrix of width 9 is applied to the mass spectrum; the weights are preferably $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$. At a given point $i$ of the mass spectrum, the intensity after filtering equals the weighted sum of the 9 surrounding values: $\tilde{y}_i = \sum_{k=-4}^{4} w_k \, y_{i+k}$.
  • The whole mass spectrum is divided into intervals of a specified number of points. For each interval, the local noise is determined, and local maxima whose intensity is at least four times the local noise and at least the global minimum are identified as candidate peaks.
  • The global minimum is preferably 0.01 × the largest local maximum;
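The peak-location filter and candidate-peak rule of Step S12 can be sketched in Python with NumPy as below. The weights, the factor of four and the 0.01 × largest-local-maximum floor come from the text; the interval length (200 points) and the local-noise estimate (median absolute filtered value per interval) are assumptions, since the text does not specify them. Because the weights sum to zero, a flat baseline filters to zero.

```python
import numpy as np

# Step S12 weights; they sum to zero, so a constant baseline filters to 0.
WEIGHTS = np.array([-4, 0, 1, 2, 2, 2, 1, 0, -4], dtype=float)

def filter_spectrum(y):
    """Each interior point becomes the weighted sum of the 9 values centred on it.

    The 4 points at each edge, where the 9-point window does not fit, are set to 0.
    """
    y = np.asarray(y, dtype=float)
    f = np.zeros_like(y)
    # The kernel is symmetric, so convolution equals the weighted sum as written.
    f[4:-4] = np.convolve(y, WEIGHTS, mode="valid")
    return f

def candidate_peaks(y, interval=200, snr_factor=4.0, min_frac=0.01):
    """Hypothetical sketch: indices of local maxima of the filtered spectrum that pass
    both the local-noise test and the global-minimum test."""
    f = filter_spectrum(y)
    # Local maxima: strictly above the left neighbour, not below the right one.
    is_max = np.zeros(f.size, dtype=bool)
    is_max[1:-1] = (f[1:-1] > f[:-2]) & (f[1:-1] >= f[2:])
    global_min = min_frac * f[is_max].max() if is_max.any() else np.inf
    peaks = []
    for start in range(0, f.size, interval):
        seg = slice(start, min(start + interval, f.size))
        # Assumed local-noise estimate: median absolute filtered value in the interval.
        noise = np.median(np.abs(f[seg]))
        for i in np.flatnonzero(is_max[seg]) + start:
            if f[i] >= snr_factor * noise and f[i] >= global_min:
                peaks.append(int(i))
    return peaks
```

On a noise-free test spectrum (constant baseline of 10 plus two narrow Gaussians centred at points 300 and 700), `candidate_peaks` returns `[300, 700]`: the zero-sum filter removes the baseline and leaves one local maximum per peak.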
  • Step S13: Mass spectrum peak fitting;
  • Step S14: Final anchor peak selection:
  • For the list of detected peaks, the cut-off SNR, i.e., the minimum SNR, is determined first;
  • The detected peaks are matched against the candidate reference peak list, and only those peaks whose mass lies within a specific range of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected;
  • Step S15: Recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting.
  • The mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is the Bruker function.
  • The function form is:
  • The specific steps of peak fitting in step S13 include:
  • Step S131: Determine the expected line width;
  • Step S132: Mask the region of each expected signal within an interval of NN expected line widths, where NN is preferably 4;
  • Step S133: Calculate the implicit baseline as the mean of the mass spectrum intensities $y_i$ over an interval of $MM \cdot \Delta m$, where $\Delta m$ is the smallest estimated line width in this interval and MM is preferably 80. Within the masked regions of this interval, the values of $y_i$ are obtained by linear interpolation;
  • Step S134: Calculate the noise level as the running root-mean-square (RMS) value of (signal - baseline);
  • Step S135: Mask points. Points in the peak region whose SNR (the ratio of peak height to noise) exceeds a given value and whose noise exceeds a given value are additionally masked;
  • Step S136: Take a region of a specified number of estimated line widths around each peak as the fitting region;
  • The Levenberg-Marquardt algorithm is used to fit a single Gaussian peak, finding the parameters that minimize the tuning function.
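Steps S132 to S134 (masking, interpolated running-mean baseline, running RMS noise) can be sketched with NumPy as follows. NN = 4 and MM = 80 are the preferred values from the text; using one smallest line width for the whole spectrum, rather than per interval, and computing the RMS on the interpolated signal are simplifying assumptions of this sketch. `line_width` is a caller-supplied function, e.g. the expected line width of Step S131.

```python
import numpy as np

def masked_baseline_and_noise(m, y, peak_masses, line_width, nn=4, mm=80):
    """Sketch of Steps S132-S134: masked running-mean baseline and running RMS noise.

    m must be sorted ascending (Da); line_width(M) returns the expected line width at M.
    """
    m = np.asarray(m, dtype=float)
    y = np.asarray(y, dtype=float)
    # S132: mask every point within nn expected line widths of an expected peak.
    masked = np.zeros(m.size, dtype=bool)
    for pm in peak_masses:
        masked |= np.abs(m - pm) <= nn * line_width(pm)
    # Inside masked regions, y is replaced by linear interpolation between
    # the nearest unmasked points.
    y_fill = y.copy()
    y_fill[masked] = np.interp(m[masked], m[~masked], y[~masked])
    # S133: running mean over a window of mm * (smallest line width) Da.
    # Simplification: one smallest line width is used for the whole spectrum.
    half = 0.5 * mm * min(line_width(pm) for pm in peak_masses)
    lo = np.searchsorted(m, m - half, side="left")
    hi = np.searchsorted(m, m + half, side="right")
    counts = hi - lo
    csum = np.concatenate(([0.0], np.cumsum(y_fill)))
    baseline = (csum[hi] - csum[lo]) / counts
    # S134: running RMS of (signal - baseline) over the same window, computed on
    # the interpolated signal so that the masked peaks do not inflate the noise.
    csum2 = np.concatenate(([0.0], np.cumsum((y_fill - baseline) ** 2)))
    noise = np.sqrt((csum2[hi] - csum2[lo]) / counts)
    return baseline, noise, masked
```

Prefix sums keep the running mean and RMS at O(n) cost however wide the MM · Δm window becomes.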
  • The expected line width is determined as:
  • $\Delta_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters and $M$ is the given peak mass (Da).
  • The tuning function is defined with the following quantities:
  • the sum runs over all points $\{y_i, m_i\}$ in the specified interval;
  • $H_f$ is the fitted height above the baseline at the point $M_f$;
  • the parameters $M_f$ and $\Delta_f$ denote the fitted mass and the fitted line width, respectively;
  • $\sigma_i$ is set to a specific value depending on the conditions.
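The single-Gaussian Levenberg-Marquardt fit can be sketched with NumPy alone. Because the tuning function's formula is not reproduced above, this sketch assumes a weighted least-squares cost, the sum of ((y_i - g(m_i)) / σ_i)², fitted to a baseline-subtracted signal so that the height parameter plays the role of H_f. The function names and the damping schedule are illustrative choices, not the patent's.

```python
import numpy as np

def gaussian(m, h, mu, s):
    """Gaussian with height h above the baseline, centre mu and width s."""
    return h * np.exp(-0.5 * ((m - mu) / s) ** 2)

def fit_gaussian_lm(m, y, p0, sigma=1.0, n_iter=100, lam=1e-3):
    """Minimal Levenberg-Marquardt fit of (h, mu, s) to baseline-subtracted data.

    Assumed cost: sum(((y_i - g(m_i)) / sigma_i) ** 2).
    """
    m = np.asarray(m, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.broadcast_to(np.asarray(sigma, dtype=float), m.shape)
    p = np.asarray(p0, dtype=float)

    def residuals(p):
        return (y - gaussian(m, *p)) * w

    def jacobian(p):
        h, mu, s = p
        e = np.exp(-0.5 * ((m - mu) / s) ** 2)
        # Columns: dg/dh, dg/dmu, dg/ds.
        dg = np.column_stack([e, h * e * (m - mu) / s**2, h * e * (m - mu) ** 2 / s**3])
        return -dg * w[:, None]  # residuals are (y - g) * w, hence the minus sign

    r = residuals(p)
    cost = r @ r
    for _ in range(n_iter):
        J = jacobian(p)
        A = J.T @ J
        # Damped normal equations (Marquardt scaling by diag(A)).
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -(J.T @ r))
        r_new = residuals(p + step)
        cost_new = r_new @ r_new
        if cost_new < cost:          # accept: move towards Gauss-Newton
            p, r, cost = p + step, r_new, cost_new
            lam *= 0.3
            if np.linalg.norm(step) < 1e-10 * np.linalg.norm(p):
                break
        else:                        # reject: move towards gradient descent
            lam *= 10.0
    p[2] = abs(p[2])                 # the Gaussian is symmetric in s
    return p, cost
```

In practice a library routine such as `scipy.optimize.least_squares(..., method="lm")` would replace the hand-written loop; the loop is spelled out here only to show what the algorithm does.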
  • The specific steps of extracting feature values in step S4 include:
  • Step S41: Peak fitting, the same as step S13;
  • Step S42: Record the following features:
  • V A/SNR, area variance
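Once a Gaussian has been fitted, the Step S4 features listed earlier (peak height, width, area, mass offset and SNR) follow directly from its parameters. In this sketch, "peak width" is assumed to mean the full width at half maximum and SNR is height divided by local noise. Both are illustrative assumptions, as is the `PeakFeatures` container.

```python
import math
from dataclasses import dataclass

@dataclass
class PeakFeatures:
    height: float       # fitted height above baseline, H_f
    fwhm: float         # peak width, assumed here to be the Gaussian FWHM
    area: float         # integral of the fitted Gaussian
    mass_offset: float  # fitted mass minus expected mass (Da)
    snr: float          # height divided by local noise

def peak_features(h, mu, s, expected_mass, noise):
    """Derive feature values from a fitted Gaussian h * exp(-0.5 * ((m - mu) / s) ** 2)."""
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * abs(s)
    area = h * abs(s) * math.sqrt(2.0 * math.pi)
    return PeakFeatures(h, fwhm, area, mu - expected_mass, h / noise)
```

For example, a peak with height 100, fitted mass 5001 Da, width parameter 2 Da, expected mass 5000 Da and noise 4 gives a mass offset of 1 Da and an SNR of 25.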
  • The beneficial effects of the invention are: 1. the nucleic acid mass spectrometry numerical processing method extracts reliable feature values before gene analysis, overcoming the limitations of the prior art and improving the accuracy of nucleotide detection; 2. it improves the reliability of nucleic acid mass spectrometry data collection.
  • Figure 3 Comparison chart before and after peak fitting.
  • The present invention is a numerical processing method for nucleic acid mass spectrometry, which includes the following steps:
  • Step S1: Recalibration of a single mass spectrum.
  • Each mass spectrum needs to be recalibrated.
  • Recalibration is achieved by matching a set of specially identified peaks, called anchor peaks, to their expected masses, in the following steps:
  • Step S11: Selection of candidate reference peaks.
  • A set of clean reference peaks is selected from all possible expected peaks according to the following criteria:
  • the peak mass must lie within the range of 4000 Da to 9000 Da;
  • the peak has no adjacent reference peak within the mass range defined by mass ± resolution.
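The two Step S11 criteria can be sketched as a simple filter. The 4000 to 9000 Da range is from the text; treating "resolution" as a fixed tolerance in Da, and checking neighbours against all expected peaks (including those outside the range), are assumptions of this sketch.

```python
def select_reference_peaks(expected_masses, resolution, lo=4000.0, hi=9000.0):
    """Keep expected peaks inside [lo, hi] Da that have no other expected peak
    within +/- resolution Da (resolution assumed to be a fixed value in Da)."""
    masses = sorted(expected_masses)
    selected = []
    for i, mass in enumerate(masses):
        if not lo <= mass <= hi:
            continue
        # Assumption: neighbours are searched among all expected peaks,
        # including those outside the [lo, hi] range.
        if all(abs(other - mass) > resolution for j, other in enumerate(masses) if j != i):
            selected.append(mass)
    return selected
```

With expected masses of 3500, 4500, 4510, 6000, 8000 and 9500 Da and a 20 Da resolution, only 6000 and 8000 Da survive: 3500 and 9500 Da fall outside the range, and 4500 and 4510 Da are each other's neighbours.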
  • Step S12: Peak location. A convolution filter with a weight matrix of width 9 is applied to the mass spectrum; the weights are preferably $(w_{-4}, \dots, w_{4}) = (-4, 0, 1, 2, 2, 2, 1, 0, -4)$. At a given point $i$ of the mass spectrum, the intensity after filtering equals the weighted sum of the 9 surrounding values: $\tilde{y}_i = \sum_{k=-4}^{4} w_k \, y_{i+k}$.
  • Peaks that have an adjacent candidate peak within a certain range, whose SNR (the ratio of the filtered intensity to the local noise) is below 2, or whose mass lies outside the pre-specified candidate reference peak ranges are discarded. Finally, the peak indices are adjusted based on the original intensities. Figure 1 and Figure 2 show the mass spectrum before and after applying the filter.
  • Step S13: Mass spectrum peak fitting (see Figure 3). The specific implementation steps are as follows:
  • Step S131: Determine the expected line width with the following formula:
  • $\Delta_e = L_A + L_B \cdot M$, where $L_A$ and $L_B$ are default parameters (default values 2.5 and 0.0005, respectively) and $M$ is the given peak mass (Da).
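As a quick check of the Step S131 formula with the default parameters, the expected line width can be evaluated across the reference-peak mass range:

```python
def expected_line_width(mass_da, l_a=2.5, l_b=0.0005):
    """Expected line width Delta_e = L_A + L_B * M in Da, using the default L_A and L_B."""
    return l_a + l_b * mass_da

for mass in (4000.0, 6000.0, 9000.0):
    print(f"{mass:.0f} Da -> {expected_line_width(mass):.2f} Da")
# 4000 Da -> 4.50 Da
# 6000 Da -> 5.50 Da
# 9000 Da -> 7.00 Da
```

The expected line width therefore grows linearly from 4.5 Da to 7.0 Da over the 4000 to 9000 Da range used for candidate reference peaks in Step S11.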
  • Step S132: The region of the expected signal is masked within an interval of NN expected line widths, where NN is preferably 4.
  • Step S133: The implicit baseline is calculated as the mean of the mass spectrum intensities $y_i$ over an interval of $MM \cdot \Delta m$, where $\Delta m$ is the smallest estimated line width in this interval and MM is preferably 80. Within the masked regions of this interval, the values of $y_i$ are obtained by linear interpolation.
  • Step S134: The noise level is calculated as the running root-mean-square (RMS) value of (signal - baseline).
  • Step S135: Points in the peak region whose SNR (the ratio of peak height to noise) is greater than 5 and whose noise is greater than 1 are additionally masked.
  • Step S136: A region of 4 estimated line widths for each peak is taken as the fitting region.
  • The sum runs over all points $\{y_i, m_i\}$ in the specified interval, and $H_f$ is the fitted height above the baseline at the point $M_f$.
  • $\sigma_i$ is set equal to 1;
  • $\sigma_i$ is set to 0.2 or 0.4.
  • Step S14: Final anchor peak selection.
  • For the list of detected peaks, the cut-off SNR (i.e., the minimum SNR) is determined first. The detected peaks are then matched against the candidate reference peak list, and only those peaks whose mass lies within ±25 Da of a candidate reference peak and whose SNR is higher than the cut-off SNR are selected.
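The Step S14 matching rule (±25 Da window, SNR above the cut-off) can be sketched as below. How the cut-off SNR itself is determined is not detailed in the text, so it is passed in as a parameter. Keeping only the closest qualifying peak when several match one reference is an assumption of this sketch.

```python
def select_anchor_peaks(detected, reference_masses, cutoff_snr, tol=25.0):
    """detected: iterable of (observed_mass, snr) pairs.

    Returns (expected_mass, observed_mass) pairs for the selected anchor peaks.
    """
    detected = list(detected)
    anchors = []
    for ref in reference_masses:
        matches = [(abs(mass - ref), mass) for mass, snr in detected
                   if abs(mass - ref) <= tol and snr > cutoff_snr]
        if matches:
            # Assumption: if several peaks qualify, keep the one closest to the expected mass.
            anchors.append((ref, min(matches)[1]))
    return anchors
```

For instance, with reference masses 4500, 6000 and 8000 Da and a cut-off SNR of 5, a detected peak at 6010 Da with SNR 3 is rejected even though it lies inside the ±25 Da window.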
  • Step S15: Recalibration. Using the obtained anchor peaks and their expected masses, the calibration coefficients are calculated by nonlinear fitting.
  • The mapping function between the mass spectrometer and m/z (mass-to-charge ratio) is the Bruker function.
  • The function form is:
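The exact form of the Bruker function is not reproduced in this text, so the sketch below uses a hypothetical quadratic stand-in, mass = A·t² + B·t + C, where t is an anchor peak's position on the instrument's raw axis (for example, flight time). For this stand-in the least-squares problem is linear and `np.polyfit` solves it directly. The real Bruker function would call for an iterative nonlinear solver such as `scipy.optimize.curve_fit`.

```python
import numpy as np

def fit_calibration(raw_positions, expected_masses):
    """Fit hypothetical coefficients (A, B, C) of mass = A*t**2 + B*t + C.

    raw_positions: anchor-peak positions on the raw axis (e.g. flight time);
    expected_masses: the anchors' expected masses in Da.
    """
    t = np.asarray(raw_positions, dtype=float)
    mass = np.asarray(expected_masses, dtype=float)
    if t.size < 3:
        raise ValueError("at least three anchor peaks are needed for three coefficients")
    # Least-squares fit; np.polyfit returns the highest power first.
    a, b, c = np.polyfit(t, mass, 2)
    return a, b, c

def apply_calibration(raw_positions, coef):
    """Map raw positions to masses with the hypothetical quadratic model."""
    a, b, c = coef
    t = np.asarray(raw_positions, dtype=float)
    return a * t**2 + b * t + c
```

With more anchors than coefficients, the fit averages out small errors in the individual anchor positions rather than interpolating them exactly.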
  • Step S2: Mass spectrum synthesis. On the basis of step S1, the several mass spectra acquired at different positions of a detection point are combined into a single mass spectrum of that detection point.
  • The method of combining multiple mass spectra is a "self-weighted average", described by an equation with the following quantities:
  • $n$ is the number of mass spectra;
  • $I_{ij}$ is the intensity at mass $i$ in the $j$-th mass spectrum.
  • The best mass spectrum, i.e., the one with the most anchor peaks, is selected from the mass spectra, and the summed spectrum is initialized with it. The absolute intensities and squared intensities of another mass spectrum are added to the running sums only when its calibration coefficients agree with those of the best spectrum within the following limits: A within 1%, B within 10%, and C within 20 Da.
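Because the self-weighted average equation is not reproduced above, this sketch assumes it is the sum of squared intensities divided by the sum of absolute intensities, $\sum_j I_{ij}^2 / \sum_j |I_{ij}|$. That is, each spectrum is weighted by its own intensity magnitude, which matches the description of accumulating absolute and squared intensities. The coefficient tolerances are from the text. All spectra are assumed to share a common m/z grid.

```python
import numpy as np

def coefficients_compatible(coef, best):
    """Tolerances from Step S2: A within 1 %, B within 10 %, C within 20 Da."""
    a, b, c = coef
    a0, b0, c0 = best
    return (abs(a - a0) <= 0.01 * abs(a0)
            and abs(b - b0) <= 0.10 * abs(b0)
            and abs(c - c0) <= 20.0)

def synthesize(spectra, coefficients, anchor_counts):
    """spectra: 2-D array (n_spectra, n_points) on a shared m/z grid (assumption).

    Returns sum(I**2) / sum(|I|) over the accepted spectra (assumed equation).
    """
    spectra = np.asarray(spectra, dtype=float)
    best = int(np.argmax(anchor_counts))      # spectrum with the most anchor peaks
    sum_abs = np.abs(spectra[best]).copy()
    sum_sq = spectra[best] ** 2
    for j in range(len(spectra)):
        if j != best and coefficients_compatible(coefficients[j], coefficients[best]):
            sum_abs += np.abs(spectra[j])
            sum_sq += spectra[j] ** 2
    # Self-weighted average: each intensity is weighted by its own magnitude.
    out = np.zeros_like(sum_abs)
    nz = sum_abs > 0
    out[nz] = sum_sq[nz] / sum_abs[nz]
    return out
```

Weighting by magnitude lets strong, consistent signals dominate the combined spectrum, and the coefficient check keeps spectra with divergent calibration out of the sum altogether.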
  • Step S3: Wavelet filtering. Wavelet-based filtering is applied to the synthesized mass spectrum to remove high-frequency noise and the baseline. Another round of recalibration is then performed on the filtered mass spectrum; after this round, the new A, B and C coefficients are assigned to the synthesized mass spectrum and its m/z values are adjusted accordingly.
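The text does not name the wavelet or the decomposition depth, so the following is only an illustration of the idea using a hand-written orthonormal Haar transform in NumPy. Zeroing the finest detail level removes high-frequency noise, and zeroing the coarsest approximation removes the slowly varying baseline. The Haar wavelet, `levels=6` and `drop_fine=1` are assumptions; a library such as PyWavelets would normally supply richer wavelets.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_idwt(approx, detail):
    """Exact inverse of haar_dwt."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def wavelet_filter(y, levels=6, drop_fine=1):
    """Remove the coarsest approximation (baseline) and the drop_fine finest detail
    levels (high-frequency noise), then reconstruct."""
    y = np.asarray(y, dtype=float)
    n = y.size
    size = -(-n // 2**levels) * 2**levels       # pad up to a multiple of 2**levels
    x = np.pad(y, (0, size - n), mode="edge")
    details = []
    for _ in range(levels):                     # details[0] is the finest level
        x, d = haar_dwt(x)
        details.append(d)
    x = np.zeros_like(x)                        # drop the baseline component
    for level in range(levels - 1, -1, -1):     # reconstruct from coarse to fine
        d = details[level]
        if level < drop_fine:                   # drop the finest detail levels
            d = np.zeros_like(d)
        x = haar_idwt(x, d)
    return x[:n]
```

A constant baseline and a point-to-point alternating pattern are both removed entirely, while a four-point bump survives and loses only its share of the block mean.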
  • Step S4: Peak feature value extraction (see Figure 3).
  • The fitting process consists of the following steps:
  • Step S41: Peak fitting, identical to step S13;
  • Step S42: After successful fitting, record the following features:
  • V A/SNR, area variance
  • The nucleic acid mass spectrum numerical processing method of the present invention extracts reliable feature values before gene analysis. It overcomes the limitations of the prior art, increases the accuracy of nucleotide detection, and improves the reliability of nucleic acid mass spectrometry data collection.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
PCT/CN2020/134810 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method WO2021159833A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20918621.2A EP4016379B1 (en) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method
US17/771,216 US20220383979A1 (en) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method
JP2022535644A JP7456665B2 (ja) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010084107.4 2020-02-10
CN202010084107.4A CN111325121B (zh) 2020-02-10 2020-02-10 Nucleic acid mass spectrum numerical processing method

Publications (1)

Publication Number Publication Date
WO2021159833A1 true WO2021159833A1 (zh) 2021-08-19

Family

ID=71172661

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134810 WO2021159833A1 (zh) 2020-02-10 2020-12-09 Nucleic acid mass spectrum numerical processing method

Country Status (5)

Country Link
US (1) US20220383979A1 (ja)
EP (1) EP4016379B1 (ja)
JP (1) JP7456665B2 (ja)
CN (1) CN111325121B (ja)
WO (1) WO2021159833A1 (ja)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
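CN111325121B (zh) * 2020-02-10 2024-02-20 浙江迪谱诊断技术有限公司 Nucleic acid mass spectrum numerical processing method
CN112444556B (zh) * 2020-09-27 2021-12-03 浙江迪谱诊断技术有限公司 Method for determining parameters of time-of-flight nucleic acid mass spectrometry
CN112418072A (zh) * 2020-11-20 2021-02-26 上海交通大学 Data processing method and apparatus, computer device and storage medium
CN112378986B (zh) * 2021-01-18 2021-08-03 宁波华仪宁创智能科技有限公司 Mass spectrometry analysis method
CN112877412B (zh) * 2021-02-05 2022-09-27 浙江迪谱诊断技术有限公司 Method for detecting the performance of nucleic acid mass spectrometry equipment
CN113659961B (zh) * 2021-07-19 2024-01-30 广东迈能欣科技有限公司 Filtering algorithm applied to a carbon dioxide sensor
CN114487073B (zh) * 2021-12-27 2024-04-12 浙江迪谱诊断技术有限公司 Method for calibrating time-of-flight nucleic acid mass spectrometry data
CN114023379B (zh) * 2021-12-31 2022-05-13 浙江迪谱诊断技术有限公司 Method and device for determining a genotype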

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190172694A1 (en) * 2016-06-28 2019-06-06 Shimadzu Corporation Signal processing method and system based on time-of-flight mass spectrometry and electronic apparatus
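CN110196274A (zh) * 2019-04-25 2019-09-03 上海裕达实业有限公司 Mass spectrometry device and method capable of reducing noise
CN110441253A (zh) * 2019-07-22 2019-11-12 杭州华聚复合材料有限公司 Method for rapidly detecting the grafting rate of PP-g-MAH
CN111325121A (zh) * 2020-02-10 2020-06-23 浙江迪谱诊断技术有限公司 Nucleic acid mass spectrum numerical processing method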

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030190644A1 (en) * 1999-10-13 2003-10-09 Andreas Braun Methods for generating databases and databases for identifying polymorphic genetic markers
EP1469314B1 (en) * 2001-12-08 2008-08-06 Micromass UK Limited Method of mass spectometry
CA2684217C (en) * 2007-04-13 2016-12-13 Sequenom, Inc. Comparative sequence analysis processes and systems
EP2160570A4 (en) * 2007-06-02 2012-12-05 Cerno Bioscience Llc SELF-CALIBRATION APPROACH FOR MASS SPECTROMETRY
GB201103854D0 (en) * 2011-03-07 2011-04-20 Micromass Ltd Dynamic resolution correction of quadrupole mass analyser
DE102013006132B9 (de) * 2013-04-10 2015-11-19 Bruker Daltonik Gmbh High-throughput characterization of samples by mass spectrometry
CN105334279B (zh) * 2014-08-14 2017-08-04 大连达硕信息技术有限公司 Method for processing high-resolution mass spectrometry data
FR3035410B1 (fr) * 2015-04-24 2021-10-01 Biomerieux Sa Method for identifying, by mass spectrometry, an unknown microorganism subgroup among a set of reference subgroups
US20190130994A1 (en) * 2016-04-11 2019-05-02 Discerndx, Inc. Mass Spectrometric Data Analysis Workflow
US11798795B2 (en) * 2018-02-05 2023-10-24 Shimadzu Corporation Mass spectrometer and mass calibration method in mass spectrometer
US11289316B2 (en) * 2018-05-30 2022-03-29 Shimadzu Corporation Spectrum data processing device and analyzer
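CN109145873B (zh) * 2018-09-27 2022-03-22 广东工业大学 Spectral Gaussian peak feature extraction algorithm based on a genetic algorithm
CN109726667B (zh) * 2018-12-25 2021-03-02 广州市锐博生物科技有限公司 Mass spectrometry data processing method and apparatus, computer device, and computer storage medium
CN109632860A (zh) * 2019-01-15 2019-04-16 中国科学院昆明植物研究所 Method for elucidating the structures of individual compounds in a mixture
CN114487072B (zh) * 2021-12-27 2024-04-12 浙江迪谱诊断技术有限公司 Time-of-flight mass spectrometry peak fitting method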

Also Published As

Publication number Publication date
CN111325121B (zh) 2024-02-20
JP7456665B2 (ja) 2024-03-27
EP4016379A1 (en) 2022-06-22
EP4016379A4 (en) 2022-12-14
EP4016379C0 (en) 2024-01-31
CN111325121A (zh) 2020-06-23
JP2023515296A (ja) 2023-04-13
US20220383979A1 (en) 2022-12-01
EP4016379B1 (en) 2024-01-31

Similar Documents

Publication Publication Date Title
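WO2021159833A1 (zh) Nucleic acid mass spectrum numerical processing method
JP5068541B2 (ja) Apparatus and method for identifying peaks in liquid chromatography/mass spectrometry data and for forming spectra and chromatograms
CN109883547B (zh) Broadband spectral signal denoising method based on wavelet threshold difference
WO2003095978A2 (en) Methods for time-alignment of liquid chromatography-mass spectrometry data
CN110791565B (zh) Prognostic marker genes and random survival forest model for predicting recurrence of stage II colorectal cancer
US20110282588A1 (en) Method to automatically identify peaks and monoisotopic peaks in mass spectral data for biomolecular applications
EP1782621A1 (en) Automatic background removal for input data
CN110763913B (zh) Derivative spectrum smoothing method based on signal segment classification
CN114487073A (zh) Method for calibrating time-of-flight nucleic acid mass spectrometry data
CN110714078B (zh) Marker genes for predicting recurrence of stage II colorectal cancer and application thereof
CN113588847B (zh) Biological metabolomics data processing method, analysis method, device and application
CN114609319B (zh) Spectral peak identification method and system based on noise estimation
CN111521708A (zh) Specific molecular markers of Aspergillus flavus infection and mildew in corn, peanuts and walnuts, and method for early mildew detection using them
CN112557332A (zh) Spectrum segmentation and spectrum comparison method based on spectral peak-splitting fitting
CN105718723B (zh) Method for detecting spectral peak positions in mass spectrometry data processing
Li et al. SELDI-TOF mass spectrometry protein data
CN110806456A (zh) Method for automatically analyzing untargeted metabolic profile data in UPLC-HRMS profile mode
Antoniadis et al. Peaks detection and alignment for mass spectrometry data
GB2607200A (en) Mass spectrometric determination of particular tissue states
CN112711980A (zh) Method for selecting a wavelet basis in mineral spectral feature extraction
CN115561193A (zh) Data processing and analysis system for a Fourier-transform infrared spectrometer
CN112730373A (zh) Raman spectral dataset analysis method for deep learning training
CN115359847A (zh) Peak-finding algorithm for proteomic tandem mass spectra
CN109324017B (zh) Method for improving the quality of modelling spectra in near-infrared spectral analysis
CN111256819B (zh) Noise reduction method for spectral instruments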

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20918621

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020918621

Country of ref document: EP

Effective date: 20220317

ENP Entry into the national phase

Ref document number: 2022535644

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE