CN111650271B - Identification method and application of soil organic matter marker - Google Patents

Identification method and application of soil organic matter marker

Publication number
CN111650271B
Authority
CN
China
Prior art keywords
soil
peak
peaks
organic matter
mass spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010581862.3A
Other languages
Chinese (zh)
Other versions
CN111650271A (en)
Inventor
孔傲
瞿晓磊
高涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Nanjing University of Finance and Economics
Original Assignee
Nanjing University
Nanjing University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University and Nanjing University of Finance and Economics
Priority to CN202010581862.3A
Publication of CN111650271A
Application granted
Publication of CN111650271B
Legal status: Active

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode

Landscapes

  • Chemical & Material Sciences (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention discloses a method for identifying soil organic matter markers and an application thereof. Samples of different types of soil are analyzed by a mass spectrometer to obtain a set of mass spectrum data samples; each mass spectrum sample is preprocessed, peaks are identified on it, and peaks appearing at approximately the same retention time on different samples are aligned to obtain a set of non-repeating peaks. The occurrence frequency of each peak or each pair of peaks in the different types of soil samples is calculated, and the peaks or peak pairs with higher occurrence frequency in each type of soil sample are screened out. The samples are then binary coded based on the retention times of the screened peaks or peak pairs, the marker peaks and peak pairs of the different types of soil are calculated by a simulated annealing algorithm, and the organic matter molecular composition corresponding to each peak is determined from its retention time, yielding the organic matter markers or organic matter marker pairs of the different types of soil.

Description

Identification method and application of soil organic matter marker
Technical Field
The invention belongs to the field of soil detection, and particularly relates to a method for identifying a soil organic matter marker and application thereof.
Background
Soil is the largest carbon reservoir on Earth, and soil organic matter (SOM) is an important component of the global carbon cycle; its formation, transformation and degradation control the variation of the soil carbon reservoir and the emission flux of greenhouse gases. SOM has a complex structure and diverse sources, with plant, animal and microbial residues as important inputs. SOM also participates in the physical, chemical and biological processes of aquatic and terrestrial ecosystems. The structural features of SOM determine its turnover cycle in soil and are closely related to soil CO₂ emissions. Exploring the molecular structural characteristics of SOM is therefore a key step in understanding its biogeochemical processes and in analyzing the feedback of the soil carbon reservoir to global climate change. Because the structure of SOM is complex, common analytical methods such as ultraviolet-visible spectroscopy (UV-VIS), fluorescence spectroscopy (EM) and nuclear magnetic resonance (NMR) can only provide macroscopic information on SOM or estimate part of its chemical information, and cannot achieve structural analysis at the molecular level.
Mass spectrometers (MS) are a class of chemical analysis instruments widely used in many fields. By operating principle, mass spectrometry includes gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), time-of-flight mass spectrometry (TOF) and others. Mass spectrometry is highly specific and sensitive, and can provide abundant molecular structure information in a single analysis. Combining separation techniques with mass spectrometry allows a large number of molecular compounds in a mixture to be detected qualitatively and quantitatively. If SOM is regarded as a mixture of a large number of small molecular compounds, its molecular structure information can be obtained at high throughput by mass spectrometry. By identifying the molecular species in SOM and the abundance of the different molecular compounds, the molecular structure of SOM can be analyzed quantitatively. However, owing to the heterogeneity and complexity of the SOM structure, its mass spectra are exceptionally complex: baseline drift is large, noise is significant, and the number of peaks is very large, usually hundreds to thousands. As mass spectrometry technology develops, the number of peaks in SOM mass spectra continues to grow. Identifying and quantifying the molecule behind every peak individually has too low a throughput to be practical. It is therefore desirable to screen all detected molecules for molecular markers that reflect the source, structure and chemical stage of SOM. Such molecular markers can deepen the understanding of SOM characteristics and provide data support for research on SOM sources, transformation and related processes. At present, a pretreatment technique suited to SOM mass spectra and a systematic molecular marker mining method based on high-throughput SOM mass spectrum data are still lacking.
Disclosure of Invention
Purpose of the invention: to overcome the shortcomings of the prior art, the invention identifies soil organic matter molecular markers from high-throughput mass spectrum data and obtains the organic matter markers corresponding to different types of soil, so that these markers can be used to identify different types of soil.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
a method for identifying organic matter markers of soil comprises the following steps:
(1) Analyzing samples of different types of soil by using a mass spectrometer, acquiring a group of mass spectrum data samples, preprocessing each mass spectrum sample, identifying peaks on the soil mass spectrum sample, aligning peaks appearing on different samples in approximate retention time, and thus obtaining a set of non-repetitive peaks;
(2) Calculating the occurrence frequency of each peak in different types of soil samples and the frequency of the simultaneous occurrence of a peak pair formed by every two peaks in different types of soil samples, and screening out a plurality of peaks or peak pairs with the highest occurrence frequency in each type of soil samples;
(3) Binary coding the samples of different types of soil based on the retention time of the peaks or peak pairs screened in step (2), calculating the marker peaks and peak pairs of the different types of soil by a simulated annealing algorithm, and determining the organic matter molecular composition corresponding to each peak according to its retention time, to obtain the organic matter markers or organic matter marker pairs of the different types of soil.
In the step (1), the samples of different types of soil at least comprise cultivated land soil samples, forest land soil samples and construction land soil samples.
In step (1), the mass spectrum data samples are acquired as follows: impurities are removed from the collected soil sample, which is dried in a cool, ventilated place and passed through a sieve with a 2 mm aperture to obtain the experimental sample; 5.0 ± 0.1 mg of sample is weighed into a pyrolysis cup, pyrolyzed at 610 °C for 0.2 min, and then analyzed by GC-MS.
In step (1), the pretreatment method comprises: normalizing the mass spectrum by dividing the intensity at each retention time by the sum of the intensities at all retention times of the mass spectrum; smoothing the mass spectrum with a fixed smoothing window (8-12 times the dwell time); calculating the difference between the normalized and the smoothed mass spectrum intensity and, within a fixed noise window (about 140-180 times the dwell time) centered on each retention time, taking the median of this difference as the noise level of the intensity at that retention time; and, within a fixed reference window (300-500 times the dwell time) centered on each retention time, taking the minimum of the smoothed mass spectrum intensity as the baseline and subtracting it from the mass spectrum intensity at that retention time, thereby obtaining a baseline-removed mass spectrum. The dwell time (also rendered as residence time) is the time spent on each scanned ion during mass spectral data acquisition.
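The pretreatment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes one intensity value per scan, so that window sizes given in dwell times translate directly into numbers of data points; it uses a plain moving average as the smoother; it takes the absolute value of the normalized-minus-smoothed difference before the median; and it subtracts the baseline from the normalized intensity. The names `window` and `preprocess` are hypothetical.

```python
from statistics import median

def window(values, i, width):
    """Values in a window of about `width` points centered on index i, clipped at the ends."""
    half = width // 2
    return values[max(0, i - half): i + half + 1]

def preprocess(intensity, smooth_w=10, noise_w=160, base_w=400):
    """Return (baseline-removed intensities, noise levels) for one spectrum."""
    total = sum(intensity)
    norm = [v / total for v in intensity]            # normalized: values sum to 1
    idx = range(len(norm))
    smooth = []
    for i in idx:                                    # moving average (assumed smoother)
        w = window(norm, i, smooth_w)
        smooth.append(sum(w) / len(w))
    # assumption: the noise level is the windowed median of |normalized - smoothed|
    resid = [abs(a - b) for a, b in zip(norm, smooth)]
    noise = [median(window(resid, i, noise_w)) for i in idx]
    # baseline = windowed minimum of the smoothed trace
    baseline = [min(window(smooth, i, base_w)) for i in idx]
    corrected = [a - b for a, b in zip(norm, baseline)]
    return corrected, noise
```

The default window sizes (10, 160 and 400 points) lie inside the ranges stated above; other values within those ranges would be used in the same way.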
In step (1), peaks are identified, and peaks appearing at approximately the same retention time on different samples are aligned, as follows. The left slope and the right slope of each peak are defined as
$$\frac{I - I_L}{a - a_L}$$
and
$$\frac{I - I_R}{a_R - a}$$
where $I$ is the intensity of the peak, $a$ is the retention time of the peak, $a_L$ and $I_L$ are the retention time and corresponding intensity at the left end of a peak-finding window (60-100 times the dwell time) centered on the peak, and $a_R$ and $I_R$ are the retention time and corresponding intensity at the right end. The ratio of the baseline-removed mass spectrum intensity to the noise level is tested at each retention time in turn; when the ratio is greater than a signal-to-noise threshold (a real number within 2-5) and the left slope and the right slope are both greater than a slope threshold (a real number within 0.01-0.03), a peak is considered to appear at that position.
If the retention times of two peaks on different mass spectra differ by less than the size of the peak-finding window, the two peaks are considered to have the same retention time, and their retention times are adjusted to the mean of the two. This finally produces a list of all peaks in which the retention times of any two peaks differ by more than the peak-finding window.
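The peak test and the alignment rule can be illustrated with the short Python sketch below. It rests on several assumptions that are not stated in the text: the slopes are computed as rise over run from the candidate point to each end of the peak-finding window; only the highest point of each window is kept, so that the shoulders of a peak are not reported as separate peaks; and alignment is done by greedily clustering the pooled, sorted retention times. The function names are hypothetical.

```python
def find_peaks(times, corrected, noise, window=80, snr=3.0, min_slope=0.02):
    """Indices passing the signal-to-noise test and both slope tests.

    Points closer than half a window to either end of the trace are skipped.
    """
    half = window // 2
    peaks = []
    for i in range(half, len(times) - half):
        lo, hi = i - half, i + half
        left = (corrected[i] - corrected[lo]) / (times[i] - times[lo])
        right = (corrected[i] - corrected[hi]) / (times[hi] - times[i])
        # assumption: keep only the highest point inside the window
        is_top = corrected[i] == max(corrected[lo:hi + 1])
        # compare with snr * noise instead of dividing, to tolerate zero noise
        if is_top and corrected[i] > snr * noise[i] and left > min_slope and right > min_slope:
            peaks.append(i)
    return peaks

def align_peaks(peak_times_per_sample, tolerance):
    """Merge retention times from all samples that fall within `tolerance`.

    Each cluster is replaced by the mean of its members (greedy clustering,
    an assumed simplification of the pairwise rule in the text).
    """
    pooled = sorted(t for sample in peak_times_per_sample for t in sample)
    clusters = []
    for t in pooled:
        if clusters and t - clusters[-1][0] < tolerance:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return [sum(c) / len(c) for c in clusters]
```

Here `tolerance` plays the role of the peak-finding window expressed in retention-time units.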
In the step (2), N important peaks with the highest frequency of occurrence (N is an integer within 20-50) and M important peak pairs with the highest frequency of occurrence (M is an integer within 3-10) are selected from each type of soil sample.
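As a rough illustration of this screening step, the sketch below counts how often each aligned peak and each pair of peaks occurs among the samples of one soil type and keeps the most frequent ones. It assumes each sample is represented as a collection of aligned peak identifiers (for example, aligned retention times); the function name and the representation are hypothetical.

```python
from collections import Counter
from itertools import combinations

def top_peaks_and_pairs(samples, n_peaks=40, n_pairs=5):
    """Most frequent peaks and co-occurring peak pairs among one soil type's samples."""
    peak_counts = Counter()
    pair_counts = Counter()
    for peaks in samples:
        ordered = sorted(set(peaks))                  # each peak counted once per sample
        peak_counts.update(ordered)
        pair_counts.update(combinations(ordered, 2))  # unordered pairs present together
    top_peaks = [p for p, _ in peak_counts.most_common(n_peaks)]
    top_pairs = [pair for pair, _ in pair_counts.most_common(n_pairs)]
    return top_peaks, top_pairs
```

For example, `top_peaks_and_pairs([{1, 2, 3}, {1, 2}, {1, 4}], n_peaks=2, n_pairs=1)` keeps peaks 1 and 2 and the pair (1, 2).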
In step (3), the binary coding encodes each mass spectrum sample as a vector of length N + M; each element of the vector corresponds to an important peak or peak pair and takes the value 1 or 0 according to whether that peak or peak pair appears in the mass spectrum sample.
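A minimal sketch of this encoding, assuming the same hypothetical representation of a sample as a collection of aligned peak identifiers, with the N important peaks listed first and the M important pairs after them:

```python
def encode_sample(sample_peaks, important_peaks, important_pairs):
    """Encode one sample as a 0/1 vector of length N + M."""
    present = set(sample_peaks)
    peak_bits = [1 if p in present else 0 for p in important_peaks]
    # a pair counts as present only when both of its peaks appear in the sample
    pair_bits = [1 if a in present and b in present else 0 for a, b in important_pairs]
    return peak_bits + pair_bits
```

With important peaks `[1, 2, 3]` and important pairs `[(1, 3), (1, 2)]`, a sample containing peaks 1 and 3 is encoded as `[1, 0, 1, 1, 0]`.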
In step (3), the simulated annealing algorithm takes all possible values of a binary vector X of length N + M as the solution space, and takes the accuracy with which X identifies a given soil type among all soils as the objective function U. Identification works as follows: given X, for each sample, the proportion of the peaks and peak pairs corresponding to X that appear in the sample is calculated, and if the proportion is greater than 2/3 the sample is considered to belong to the soil type. The optimal solution is sought by maximizing the objective function, with the following specific steps:
(a) Initialize an initial solution $X_0$;
(b) At the $k$-th step of the iteration, generate a new solution $\tilde{X}_k$, then compare $U(\tilde{X}_k)$ and $U(X_{k-1})$:
if $U(\tilde{X}_k) \ge U(X_{k-1})$, the new solution is taken as the current solution, i.e. $X_k = \tilde{X}_k$;
if $U(\tilde{X}_k) < U(X_{k-1})$, the new solution $\tilde{X}_k$ is taken as the current solution with probability $\exp\left(\frac{U(\tilde{X}_k) - U(X_{k-1})}{T_k}\right)$, and otherwise $X_k = X_{k-1}$;
(c) Repeat step (b) until the termination condition of the algorithm is reached.
The number of iterations is set to 200-500 times the length of the vector X, and $T_k$ is set to $r^k$, where r is a real number smaller than 1 that controls the convergence speed of the algorithm. After the algorithm terminates, the solution in the sequence $\{X_k\}$ that maximizes U(X) is the optimal solution. The peaks and peak pairs corresponding to the optimal solution are the marker peaks and peak pairs of the soil type.
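The search described in steps (a) to (c) can be sketched in Python as below. The sketch is illustrative only and makes several assumptions: a new solution is generated by flipping one randomly chosen bit of the current solution, the initial solution is random, a worse solution is accepted with the usual Metropolis probability exp((U_new - U_old) / T_k), and an empty selection is treated as "does not belong". All function names are hypothetical.

```python
import math
import random

def belongs(x, v, threshold=2 / 3):
    """True if more than `threshold` of the elements selected by x are present in sample v."""
    selected = [vi for xi, vi in zip(x, v) if xi == 1]
    return bool(selected) and sum(selected) / len(selected) > threshold

def accuracy(x, encoded, labels, target):
    """Objective U: fraction of samples for which x correctly decides 'is target type'."""
    hits = sum(belongs(x, v) == (lab == target) for v, lab in zip(encoded, labels))
    return hits / len(encoded)

def anneal(encoded, labels, target, r=0.95, iters_per_bit=300, seed=0):
    """Maximize U over binary vectors with temperature schedule T_k = r**k."""
    rng = random.Random(seed)
    n = len(encoded[0])
    current = [rng.randint(0, 1) for _ in range(n)]     # (a) random initial solution
    u_cur = accuracy(current, encoded, labels, target)
    best, u_best = current[:], u_cur
    for k in range(1, iters_per_bit * n + 1):           # (c) fixed iteration budget
        cand = current[:]
        cand[rng.randrange(n)] ^= 1                     # (b) assumed move: flip one bit
        u_new = accuracy(cand, encoded, labels, target)
        temp = r ** k
        # always accept improvements or ties; accept worse moves with Metropolis probability
        if u_new >= u_cur or (temp > 0 and rng.random() < math.exp((u_new - u_cur) / temp)):
            current, u_cur = cand, u_new
        if u_cur > u_best:                              # remember the best solution seen
            best, u_best = current[:], u_cur
    return best, u_best
```

Running `anneal` once per soil type, with `target` set to that type, gives one selection vector per type; the positions equal to 1 in the returned vector indicate the marker peaks and peak pairs.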
Simulated annealing is a heuristic stochastic optimization algorithm commonly used to search for the global maximum of an objective function U(X) defined over a large discrete solution space Ω. The algorithm simulates the physical annealing of a solid: a solution $X \in \Omega$ of the optimization problem and the objective function $U(X): \Omega \to \mathbb{R}$ correspond to a microscopic state of the solid and its energy in that state, respectively. The control parameter $T_k$ is a predetermined decreasing sequence that simulates the falling temperature during annealing. Starting from an initial solution $X_0 \in \Omega$ and an initial value $T_0$ of the control parameter, the algorithm repeats the iteration "generate a new solution, calculate the difference in the objective function between the new solution and the current solution, accept or discard the new solution"; its probabilistic jumps let it escape local optima and tend toward the global optimum.
The organic matter molecular composition corresponding to each peak is determined by searching the NIST database with the mass spectrometer's Xcalibur software.
The invention further claims the application of the organic matter marker identified by the method in identifying different types of soil.
Specifically, the organic matter markers or organic matter marker pairs of the different types of soil are first obtained by the identification method. The number of organic matter markers and organic matter marker pairs of each soil type that appear in an unknown soil sample is then counted; if the total for one soil type is higher than the totals for the other types, the unknown sample is judged to be a soil sample of that type.
Advantageous effects:
1. The invention integrates a high-throughput mass spectrum data pipeline with data mining methods and applies them to mining soil organic matter molecular markers, so that the category and characteristics of soil organic matter can be identified. A general method for mining organic matter molecular markers is thereby established.
2. Based on the morphological characteristics of peaks in soil organic matter mass spectrum data, the method adds a peak-slope criterion to the existing signal-to-noise-ratio peak identification method, which significantly reduces the chance of identifying false peaks.
3. To characterize and distinguish soil organic matter properties more accurately, and considering that two or more molecular markers often appear or are absent together, the invention introduces the concept of a "marker pair", which improves the accuracy of soil sample classification.
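The counting rule for an unknown sample can be sketched as follows. The sketch assumes that the markers of each soil type are stored as a pair (marker peaks, marker pairs) and that a sample is a collection of aligned peak identifiers; the handling of ties (the first soil type with the highest count wins) is an assumption, since the text does not address it. The function name is hypothetical.

```python
def classify(sample_peaks, markers_by_class):
    """Assign the soil type whose markers and marker pairs appear most often in the sample."""
    present = set(sample_peaks)
    scores = {}
    for soil, (peaks, pairs) in markers_by_class.items():
        hits = sum(p in present for p in peaks)
        hits += sum(a in present and b in present for a, b in pairs)
        scores[soil] = hits
    # assumption: on a tie, the first soil type in insertion order is returned
    return max(scores, key=scores.get), scores
```

For instance, with markers `{"cultivated": ([1, 2, 3], [(1, 2)]), "forest": ([4, 5], [(4, 5)])}`, a sample containing peaks 1, 2 and 9 scores 3 for cultivated land and 0 for forest land.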
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of the technical solution of the present invention.
FIG. 2 shows the mass spectra of the 30 soil samples in the example.
FIG. 3 shows the peaks identified after pretreatment of the mass spectrum in the first row and first column of FIG. 2.
FIG. 4 shows the peaks identified and aligned in the mass spectra of the 30 soil samples in the example.
FIG. 5 shows the optimization of the objective function when the simulated annealing algorithm identifies the marker peaks and peak pairs for cultivated land in the example.
Detailed Description
The invention will be better understood from the following examples.
In the embodiment, 30 soil samples are selected, wherein 11 soil samples are from cultivated land, 10 soil samples are from forest land, 9 soil samples are from construction land, the upper limit of the number of the markers in each group is 40, and the upper limit of the number of the 'marker pairs' in each group is 5. The technical flow chart shown in fig. 1 is adopted to identify organic matter markers for 30 soil samples, and the results obtained by identification are used to perform characteristic analysis on different types of soil and judge and classify unknown types of soil samples. The method comprises the following specific steps:
(1) Impurities such as plant roots and gravel are removed from the collected soil sample, which is dried in a cool, ventilated place and passed through a sieve with a 2 mm aperture to obtain the experimental sample. 5.0 ± 0.1 mg of sample is weighed into a pyrolysis cup and pyrolyzed at 610 °C for 0.2 min before GC-MS analysis. GC temperature program: hold at 35 °C for 5 min, raise to 200 °C at 2.5 °C/min, then raise to 270 °C at 5 °C/min and hold for 5 min. MS conditions: scan range 35-600 m/z, scan time 0.2 s, ionization energy 70 eV. The injection port, transfer line and ion source temperatures are 250 °C, 280 °C and 300 °C, respectively. The carrier gas is He at a flow rate of 1 ml/min. The mass spectra of the samples are thus obtained, as shown in FIG. 2, in which the first row, the 1st, the 2nd, and the 5th, the second row, the fourth row, the 1st and the 3rd, the fifth row, the 1st and the 3rd, and the sixth row, the 2nd, the 3rd, and the 4th are cultivated land soil; the 3rd in the first row, the 1st, 2nd and 4th in the second row, the 1st, 2nd and 5th in the third row, and the 2nd, 1st and 5th in the fourth row are forest land soil; the rest are construction land soil.
(2) The mass spectrum is normalized by dividing the intensity at each retention time by the sum of the intensities at all retention times of the mass spectrum. The mass spectrum is smoothed with a smoothing window of 10 dwell times. The difference between the normalized and the smoothed mass spectrum intensity is calculated and, within a noise window of 160 dwell times centered on each retention time, the median of this difference is taken as the noise level of the intensity at that retention time. Within a reference window of 400 dwell times centered on each retention time, the minimum of the smoothed mass spectrum intensity is taken as the baseline and subtracted from the mass spectrum intensity at that retention time, giving a baseline-removed mass spectrum. Taking the cultivated land soil mass spectrum in the first row, first position of FIG. 2 as an example, the peaks identified after this pretreatment are shown in FIG. 3.
(3) The left slope and the right slope of each peak are defined as
$$\frac{I - I_L}{a - a_L}$$
and
$$\frac{I - I_R}{a_R - a}$$
where $I$ is the intensity of the peak, $a$ is the retention time of the peak, $a_L$ and $I_L$ are the retention time and corresponding intensity at the left end of a peak-finding window (60-100 times the dwell time) centered on the peak, and $a_R$ and $I_R$ are the retention time and corresponding intensity at the right end. The ratio of the baseline-removed mass spectrum intensity to the noise level is checked at each retention time in turn; when the ratio is greater than the threshold 3 and the left slope and the right slope are both greater than the threshold 0.02, a peak is considered to appear at that position.
If the retention times of two peaks on different mass spectra differ by less than 80 dwell times (the peak-finding window), the two peaks are considered to have the same retention time, and their retention times are adjusted to the mean of the two. This finally produces a list of all peaks in which the retention times of any two peaks differ by more than 80 dwell times. The peaks obtained after identification and alignment on the 30 mass spectra are shown in FIG. 4.
(4) And calculating the frequency of each peak in the peak list in each soil category and the frequency of each two peaks in each soil category at the same time, and selecting 40 important peaks and 5 important peak pairs with the highest frequency in the soil category for each soil category.
(5) The samples of different types of soil are binary coded based on the retention time of the peaks or peak pairs screened in step (4): each mass spectrum sample is encoded as a vector of length 45, in which each element corresponds to an important peak or peak pair and takes the value 1 or 0 according to whether that peak or peak pair appears in the mass spectrum sample.
All values of a binary vector X of length 45 are taken as the solution space, and the accuracy with which X identifies a given soil type among all soils is the objective function U. Identification works as follows: given X, for each sample, the proportion of the peaks and peak pairs corresponding to X that appear in the sample is calculated, and if the proportion is greater than 2/3 the sample is considered to belong to the soil type. The optimal solution is sought by maximizing the objective function with the simulated annealing algorithm; $T_k$ is set to $0.95^k$ and the number of iterations is set to 300 times the length of the vector X, i.e. 13,500, which yields the marker peaks and peak pairs of a given soil type. The simulated annealing algorithm is run separately for cultivated land, forest land and construction land to obtain the marker peaks and peak pairs of the three soil types, as shown in Table 1. Taking the identification of the marker peaks and peak pairs of cultivated land as an example, the optimization process of the simulated annealing algorithm is shown in FIG. 5.
TABLE 1
[Table 1 image: marker peaks and peak pairs of the three soil types]
(6) According to the retention times of the peaks, the NIST database is searched with the mass spectrometer's Xcalibur software to determine the organic matter molecular composition corresponding to each peak, giving the organic matter markers or organic matter marker pairs of the three soil types.
(7) A mass spectrum of an unknown soil sample is obtained, and the number of organic matter markers and organic matter marker pairs of each soil type that appear in it is counted; if the total for one soil type is higher than the totals for the other types, the unknown sample is judged to be a soil sample of that type. Taking the 30 samples above as an example, the prediction results are shown in Table 2, and the prediction accuracy is 86.67%.
TABLE 2
[Table 2 images: prediction results for the 30 soil samples]
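As a worked check of the acquisition settings given in step (1), the GC temperature program fixes the run length, and together with the scan time it gives the approximate number of scans per sample. The short Python calculation below assumes that the stated scan time of 0.2 s is the time per full scan and that scanning covers the whole program; the function name is hypothetical.

```python
def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Duration of a linear temperature ramp, in minutes."""
    return (end_c - start_c) / rate_c_per_min

# hold 5 min at 35 C, ramp to 200 C at 2.5 C/min, ramp to 270 C at 5 C/min, hold 5 min
total_min = 5 + ramp_minutes(35, 200, 2.5) + ramp_minutes(200, 270, 5) + 5

# assumption: one scan every 0.2 s for the whole run
scans = round(total_min * 60 / 0.2)

print(total_min, scans)  # 90.0 27000
```

Under these assumptions each run lasts 90 min (5 + 66 + 14 + 5) and produces about 27,000 scans.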
The invention provides a method for identifying soil organic matter markers and an application thereof, and there are many methods and ways of implementing this technical solution. The above description is only a preferred embodiment of the invention; it should be noted that those skilled in the art can make improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (7)

1. The method for identifying the organic matter marker of the soil is characterized by comprising the following steps of:
(1) Analyzing samples of different types of soil by a mass spectrometer to obtain a group of mass spectrum data samples, preprocessing each mass spectrum sample, identifying peaks on the soil mass spectrum sample, aligning the peaks appearing on different samples in approximate retention time, and thus obtaining a set of non-repetitive peaks;
(2) Calculating the occurrence frequency of each peak in different types of soil samples and the frequency of the simultaneous occurrence of a peak pair formed by every two peaks in different types of soil samples, and screening out a plurality of peaks or peak pairs with the highest occurrence frequency in each type of soil samples;
(3) Binary coding is carried out on samples of different types of soil based on the retention time of the peaks or the peak pairs screened out in the step (2), the marked peaks and the peak pairs of the different types of soil are calculated through a simulated annealing algorithm, and the organic matter molecule composition corresponding to each peak is determined according to the retention time of the peaks, so that the organic matter markers or the organic matter marker pairs of the different types of soil are obtained;
in the step (2), N peaks with the highest occurrence frequency and M peak pairs with the highest occurrence frequency are selected from each type of soil sample; wherein, the value of N is an integer within 20-50, and the value of M is an integer within 3-10;
in the step (3), the binary coding is to code each mass spectrum sample into a vector with the length of N + M, each element of the vector corresponds to an important peak or peak pair, the value of the element is 1 or 0, and whether the corresponding peak or peak pair appears in the mass spectrum sample or not is determined;
in the step (3), the simulated annealing algorithm takes all possible values of a binary vector X with the length of N + M as the solution space, and takes the accuracy with which X identifies a given soil type as the objective function U; the identification method is: given X, for a given sample, the proportion of the peaks and peak pairs corresponding to X that appear in the sample is calculated, and if the proportion is greater than 2/3, the sample is considered to belong to the soil type; the optimal solution is searched for by maximizing the objective function, with the following specific steps:
(a) initializing an initial solution $X_0$;
(b) at the $k$-th step of the iteration, generating a new solution $\tilde{X}_k$, then comparing $U(\tilde{X}_k)$ and $U(X_{k-1})$:
if $U(\tilde{X}_k) \ge U(X_{k-1})$, the new solution is taken as the current solution, i.e. $X_k = \tilde{X}_k$;
if $U(\tilde{X}_k) < U(X_{k-1})$, the new solution $\tilde{X}_k$ is taken as the current solution with probability $\exp\left(\frac{U(\tilde{X}_k) - U(X_{k-1})}{T_k}\right)$;
(c) repeating step (b) until the termination condition of the algorithm is reached;
the number of iterations is set within 200-500 times the length of the vector X, and $T_k$ is set to $r^k$, where r is a real number smaller than 1 that controls the convergence rate of the algorithm; after the algorithm terminates, the solution in the sequence $\{X_k\}$ that maximizes U(X) is the optimal solution, and the peaks and peak pairs corresponding to the optimal solution are the marker peaks and peak pairs of the soil type.
2. The method for identifying a soil organic matter marker according to claim 1, wherein in the step (1), the samples of different types of soil at least comprise cultivated land soil samples, forest land soil samples and construction land soil samples;
the method for acquiring the mass spectrum data sample comprises the following steps: removing impurities from the collected soil sample, drying the soil sample in a cool and ventilated place, and then passing it through a sieve with an aperture of 2 mm to obtain an experimental sample for later use; weighing 5.0 ± 0.1 mg of sample into a pyrolysis cup, pyrolyzing at 610 °C for 0.2 min, and then carrying out GC-MS analysis and detection.
3. The method for identifying a soil organic matter marker according to claim 1, wherein in the step (1), the pretreatment method comprises: normalizing the mass spectrum by dividing the intensity at each retention time by the sum of the intensities at all retention times of the mass spectrum; smoothing the mass spectrum based on a fixed smoothing window; calculating the difference between the normalized mass spectrum intensity and the smoothed mass spectrum intensity and, within a fixed noise window centered on each retention time, taking the median of this difference as the noise level of the intensity at that retention time; within a fixed reference window centered on each retention time, taking the minimum value of the smoothed mass spectrum intensity as a baseline and subtracting it from the mass spectrum intensity corresponding to the retention time, thereby obtaining a baseline-removed mass spectrum;
wherein the size of the smoothing window is 8 to 12 times the residence time;
the size of the fixed noise window is 140 to 180 times the residence time;
the size of the fixed reference window is 300 to 500 times the residence time;
and the residence time is the time for which each ion is scanned during mass spectral data acquisition.
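The pretreatment of claim 3 can be sketched as follows, assuming the spectrum is a list of intensities sampled once per residence time, so that window sizes are counted in sample points. The moving-average smoother, the absolute value of the difference used for the noise estimate, and the choice to subtract the baseline from the normalized (rather than smoothed) intensity are assumptions; the helper names `window` and `preprocess` are hypothetical.

```python
from statistics import median

def window(values, center, width):
    """Return the values in a window of about `width` points centered on `center`."""
    half = width // 2
    return values[max(0, center - half): center + half + 1]

def preprocess(intensity, smooth_w=10, noise_w=160, base_w=400):
    """Normalize, smooth, estimate noise and remove the baseline.

    The default window sizes lie inside the claimed ranges
    (8-12, 140-180 and 300-500 times the residence time).
    """
    total = sum(intensity)
    norm = [v / total for v in intensity]
    n = len(norm)
    # Smoothing: plain moving average (assumed smoother).
    smooth = []
    for i in range(n):
        w = window(norm, i, smooth_w)
        smooth.append(sum(w) / len(w))
    # Noise: median of |normalized - smoothed| inside the noise window.
    resid = [abs(a - b) for a, b in zip(norm, smooth)]
    noise = [median(window(resid, i, noise_w)) for i in range(n)]
    # Baseline: minimum of the smoothed signal inside the reference window.
    baseline = [min(window(smooth, i, base_w)) for i in range(n)]
    corrected = [a - b for a, b in zip(norm, baseline)]
    return corrected, noise

# Toy spectrum: flat background with a single spike at index 25.
signal = [1.0] * 50
signal[25] = 11.0
corrected, noise = preprocess(signal, smooth_w=3, noise_w=7, base_w=9)
```

The ratio of `corrected[i]` to `noise[i]` is the quantity compared against the signal-to-noise threshold during peak finding.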
4. The method for identifying soil organic matter markers according to claim 3, wherein in the step (1), the peaks appearing at approximately the same retention time in different samples are aligned as follows: the left slope and the right slope of each peak are defined as
Figure FDA0003784338720000021
and
Figure FDA0003784338720000022
wherein I is the intensity of the peak, a is the retention time of the peak, a_L and I_L are respectively the retention time and the corresponding intensity at the left end of the peak-finding window centered on the peak, and a_R and I_R are respectively the retention time and the corresponding intensity at the right end; the ratio of the baseline-removed mass spectrum intensity to the noise volume is checked at each retention time in turn, and when the ratio is greater than a signal-to-noise threshold and both the left slope and the right slope are greater than a slope threshold, a peak is considered to appear at that position; wherein the size of the peak-finding window is 60 to 100 times the residence time; the signal-to-noise threshold is a real number in the range 2-5; and the slope threshold is a real number in the range 0.01-0.03;
and if the retention times of two peaks on different mass spectra differ by less than the size of the peak-finding window, the two peaks are considered to have the same retention time, and the retention time of both peaks is adjusted to their mean value, thereby finally forming a list of all peaks in which the retention times of any two peaks differ by more than the size of the peak-finding window.
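The alignment step at the end of claim 4 can be illustrated with a short sketch. It assumes each spectrum has already been reduced to a list of peak retention times, and it pools all spectra and merges greedily in sorted order, which is a simplification (peaks from the same spectrum are not treated separately). The function name `align_peaks` and the sample values are hypothetical.

```python
def align_peaks(peak_lists, window):
    """Merge peaks from several spectra whose retention times nearly coincide.

    peak_lists: one list of peak retention times per spectrum.
    window:     size of the peak-finding window, in the same units.
    Returns the sorted list of aligned (mean) retention times.
    """
    pooled = sorted(t for peaks in peak_lists for t in peaks)
    clusters = []  # each cluster holds retention times treated as one peak
    for t in pooled:
        if clusters and t - sum(clusters[-1]) / len(clusters[-1]) < window:
            # Closer than the window to the current cluster mean: same peak.
            clusters[-1].append(t)
        else:
            clusters.append([t])
    # Each aligned peak is placed at the mean retention time of its cluster.
    return [sum(c) / len(c) for c in clusters]

aligned = align_peaks([[10.00, 25.40], [10.04, 25.38, 40.00]], window=0.1)
```

Because a new cluster is opened only when a retention time lies at least one window beyond the previous cluster mean, adjacent entries of the returned list are separated by at least the window size.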
5. The method for identifying soil organic matter markers according to claim 1, wherein in the step (3), the Xcalibur software of the mass spectrometer is used to search the NIST database to determine the molecular composition of the organic matter corresponding to each peak.
6. Use of an organic matter marker identified by the method of claim 1 to identify different types of soil.
7. The use according to claim 6, wherein the number of organic matter markers and organic matter marker pairs of each category that are present in the unknown soil sample is counted, and if the total number of organic matter markers and organic matter marker pairs of a certain category present in the unknown soil sample is higher than the corresponding number for every other category, the unknown sample is determined to be a soil sample of that category.
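The counting rule of claim 7 can be sketched as below. The sketch assumes peaks are referred to by identifiers and treats a marker pair as present when both of its peaks are detected in the sample; that presence test is an assumption, since this claim does not define it. The function name `classify`, the dictionary layout and the example identifiers are all hypothetical.

```python
def classify(sample_peaks, markers):
    """Assign the class whose markers and marker pairs occur most often.

    sample_peaks: set of peak identifiers detected in the unknown sample.
    markers: {class_name: {"peaks": set, "pairs": set of (peak, peak) tuples}}.
    Returns the winning class name and the per-class counts.
    """
    counts = {}
    for name, m in markers.items():
        single = sum(1 for p in m["peaks"] if p in sample_peaks)
        # Assumed presence test for a pair: both member peaks are detected.
        paired = sum(1 for a, b in m["pairs"]
                     if a in sample_peaks and b in sample_peaks)
        counts[name] = single + paired
    best = max(counts, key=counts.get)
    return best, counts

# Hypothetical marker sets for two soil categories.
markers = {
    "cultivated": {"peaks": {"p1", "p2"}, "pairs": {("p1", "p3")}},
    "forest": {"peaks": {"p4"}, "pairs": {("p4", "p5")}},
}
label, counts = classify({"p1", "p3", "p4"}, markers)
```

If two categories tie, `max` returns the first one encountered; the claim does not say how ties should be resolved.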

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581862.3A CN111650271B (en) 2020-06-23 2020-06-23 Identification method and application of soil organic matter marker

Publications (2)

Publication Number Publication Date
CN111650271A CN111650271A (en) 2020-09-11
CN111650271B true CN111650271B (en) 2022-12-13

Family

ID=72350445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581862.3A Active CN111650271B (en) 2020-06-23 2020-06-23 Identification method and application of soil organic matter marker

Country Status (1)

Country Link
CN (1) CN111650271B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947332B (en) * 2021-02-04 2022-12-16 威高国科质谱医疗科技(天津)有限公司 Triple quadrupole mass spectrometer parameter optimization method based on simulated annealing

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4008388A (en) * 1974-05-16 1977-02-15 Universal Monitor Corporation Mass spectrometric system for rapid, automatic and specific identification and quantitation of compounds
CN105046003A (en) * 2015-07-23 2015-11-11 王家俊 Simulated annealing-genetic algorithm spectral feature interval selection and spectrum encryption method
WO2015169686A2 (en) * 2014-05-06 2015-11-12 Københavns Universitet A computer assisted method for quantification of total hydrocarbon concentrations and pollution type apportionment in soil samples by use of gc-fid chromatograms
CN106018600A (en) * 2016-05-23 2016-10-12 中国科学院植物研究所 Metabolism group method for distinguishing false positive mass spectra peak signals and quantificationally correcting mass spectra peak area
CN106841494A (en) * 2017-04-17 2017-06-13 宁夏医科大学 Plant otherness metabolin rapid screening method based on UPLC QTOF
CN106970161A (en) * 2017-03-04 2017-07-21 宁夏医科大学 A kind of method of the non-target method rapid screening plant otherness metabolins of GC MS
CN110320303A (en) * 2019-08-09 2019-10-11 东北大学 A kind of Efficiency for Soil Aquifer Treatment metabonomic analysis methods based on UPLC-MS

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002217904A1 (en) * 2000-11-28 2002-06-11 Surromed, Inc. Methods for efficiently minig broad data sets for biological markers
CN100561509C (en) * 2006-07-11 2009-11-18 南京大学 A kind of method for designing of improved mixed genetic algorithm optimizing water quality model parameter
CN111141809B (en) * 2020-01-20 2022-04-29 中国科学院合肥物质科学研究院 Soil nutrient ion content detection method based on non-contact type conductivity signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
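Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis; Yang Chao et al.; BMC Bioinformatics; 20090106; page 8, left column, first paragraph *
Effects of the chemical composition of soil organic matter on soil CO2 emission in alpine grasslands of northern Tibet; Ma Shuqin et al.; Pratacultural Science; 20190415; abstract, section 1.2 (sample determination) *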

Also Published As

Publication number Publication date
CN111650271A (en) 2020-09-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant