CN111337606A - Overlapping peak processing method and device applied to chromatographic analysis - Google Patents

Overlapping peak processing method and device applied to chromatographic analysis Download PDF

Info

Publication number
CN111337606A
CN111337606A CN202010198383.3A CN202010198383A CN111337606A CN 111337606 A CN111337606 A CN 111337606A CN 202010198383 A CN202010198383 A CN 202010198383A CN 111337606 A CN111337606 A CN 111337606A
Authority
CN
China
Prior art keywords
peak
point
peaks
points
overlapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010198383.3A
Other languages
Chinese (zh)
Other versions
CN111337606B (en
Inventor
王美美
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Le'er Environmental Technology Co ltd
Le'er Environmental Technology Jiangsu Co ltd
Original Assignee
Nantong Leer Environmental Protection Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Leer Environmental Protection Technology Co ltd filed Critical Nantong Leer Environmental Protection Technology Co ltd
Priority to CN202010198383.3A priority Critical patent/CN111337606B/en
Publication of CN111337606A publication Critical patent/CN111337606A/en
Application granted granted Critical
Publication of CN111337606B publication Critical patent/CN111337606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8675Evaluation, i.e. decoding of the signal into analytical information
    • G01N30/8679Target compound analysis, i.e. whereby a limited number of peaks is analysed
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8603Signal analysis with integration or differentiation
    • G01N30/861Differentiation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8624Detection of slopes or peaks; baseline correction
    • G01N30/8631Peaks
    • G01N30/8634Peak quality criteria
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8624Detection of slopes or peaks; baseline correction
    • G01N30/8631Peaks
    • G01N30/8637Peak shape
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8624Detection of slopes or peaks; baseline correction
    • G01N30/8644Data segmentation, e.g. time windows
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/86Signal analysis
    • G01N30/8603Signal analysis with integration or differentiation
    • G01N2030/862Other mathematical operations for data preprocessing

Landscapes

  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Analytical Chemistry (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Library & Information Science (AREA)
  • Quality & Reliability (AREA)
  • Treatment Of Liquids With Adsorbents In General (AREA)

Abstract

The invention discloses a chromatogram overlapping peak processing device, which comprises a data acquisition module, a data analysis module, a baseline noise reduction module, a calibration fitting module and an overlapping peak processing module, wherein the data acquisition module is connected with the data analysis module, the data analysis module is connected with the baseline noise reduction module, the baseline noise reduction module is connected with the calibration fitting module, and the marking fitting module is connected with the overlapping peak processing module. According to the method, the peak position is determined by the aid of the overlapped peak processing module by means of a derivative method, the overlapped peaks are determined by means of base point grouping, and the peak shape curve is fitted by means of a simplex method, so that information such as the peak height, the peak width and the projection of each component of the overlapped peaks is determined.

Description

Overlapping peak processing method and device applied to chromatographic analysis
Technical Field
The invention relates to the technical field of chromatographic overlapping peak separation, in particular to an overlapping peak processing method and device applied to chromatographic analysis.
Background
The gas chromatograph is an instrument for analyzing and detecting components in mixed gas, is widely applied to the aspects of petroleum, chemical engineering, biochemistry, medicine and health, food industry, environmental protection and the like, and a data analysis part of the gas chromatograph comprises six aspects of data acquisition, filtering, baseline correction, peak detection, peak separation, quantitative analysis and the like, wherein the separation of overlapped peaks has great influence on an analysis result, so that the difficulty and the error of quantitative calculation can be caused.
The separation of overlapping peaks refers to separating chromatographic peaks which cannot be completely separated by an instrument by using a mathematical means to obtain an estimated value of information of each sub-peak in the overlapping peaks, and currently popular methods for separating overlapping peaks can be divided into two main categories: the method is characterized in that the decomposition method is visual and has high calculation speed, and can be used for processing chromatographic data on line in real time, but the decomposition precision is different along with the overlapping degree of chromatographic peaks, when the chromatographic peaks are seriously overlapped, the calculation precision of each peak area is very poor, in the algebraic method, the most common method is a function fitting method, and the thought is as follows: firstly, a function model of a chromatographic peak is established, then overlapping peak data is fitted by using methods such as a least square method, a neural network and the like, and all undetermined parameters in the function are optimized by using a certain optimization method through continuous iteration to enable a fitted curve to continuously approach an actual curve, and finally a group of optimal estimated values are obtained, and information of each sub-peak is solved according to the parameter values.
Disclosure of Invention
The present invention is directed to a method and apparatus for processing overlapping peaks for chromatographic analysis, which solves the above problems in the prior art.
In order to achieve the purpose, the invention provides the following technical scheme: a chromatogram overlapping peak processing device comprises a data acquisition module, a data analysis module, a baseline noise reduction module, a calibration fitting module and an overlapping peak processing module, wherein the data acquisition module is connected with the data analysis module, the data analysis module is connected with the baseline noise reduction module, the baseline noise reduction module is connected with the calibration fitting module, and the marking fitting module is connected with the overlapping peak processing module.
A chromatographic overlapping peak processing method is characterized by comprising the following steps:
s1, finding out indexes of all peak values and inflection points by using a difference method: recording all adjacent non-repeated effective value non-zero values, and utilizing a first-order difference method to solve a derivative. Judging the sign of the derivative, wherein the point where the sign of the derivative changes from positive to negative is a peak point; the point where the sign of the derivative changes is the inflection point; the inflection point includes a peak point
S2, keeping only the index of the peak satisfying the minimum peak height: giving the minimum height MinPeakHeight of the peak to be preserved, and deleting the peak which does not meet the height;
s3, retaining only the index of the peak satisfying the threshold: giving a minimum Threshold Threshold of a peak to be reserved, wherein the Threshold is used for limiting the minimum height of a peak point from the left and right side points of the peak point, and deleting the peak with the minimum height smaller than the Threshold to avoid reserving a flat-top peak;
s4, determining the boundary index of each peak: calculating a left base point, a left saddle point, a right base point and a right saddle point, wherein the maximum y value of the left base point and the right base point is the base point of the peak, taking the left base point as an example, scanning all inflection points, and searching the left base point of the peak leftwards in sequence when the peak stops: marking the current peak-valley point as valley, the corresponding peak-valley point as peak, marking the left-scanned peak-valley point as vi, and the corresponding peak-valley point as pi;
1, if vi > valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
2, if vi is greater than valley and pi is less than or equal to peak, repeating the steps 1-4, and continuously searching a left base point forwards;
3 if vi < valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
4 if vi is less than valley and pi is less than or equal to peak, updating vi as the left base point of the current peak; and repeating the steps 1-4 to continuously search for smaller peak-valley points until a point with a peak value larger than peak is met.
Therefore, the left base points of all the peaks can be found, the left base points of the reserved peaks are stored, when the peak heights of the two peaks are the same, the saddle point is the nearest peak inflection point, and the base point is the minimum peak valley point; in other cases, the saddle point is the same as the base point, and the right base point is similar to the left base point, but the right base point is searched from the right to the left in the same way;
s5, retaining only the index satisfying the minimum protrusion: given the minimum protrusion minpeakpromience, peaks smaller than the minimum protrusion are deleted. And (3) protrusion: the difference between the y coordinate of the peak point and the y coordinate of the highest base point is determined by the inherent height of the peak and the protrusion degree determined by the position relative to other peaks;
s6, calculating the x coordinate of the half-height-width boundary of each peak: determine approximate height of full width at half maximum refHeight: half of the sum of the y coordinate of the peak point and the y coordinate of the base point. Find the left boundary of full width at half maximum: searching the peak point to the left base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point adjacent to the right side of the point and the refHeight point to obtain the x coordinate of a left boundary with half-height width; find the right boundary of full width at half maximum: searching the peak point to the right base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point close to the left side of the point and the refHeight point to obtain the x coordinate of a right boundary with the half-height width;
s7, retaining only peaks within a given width range: giving a minimum peak searching width minW and a maximum peak searching width maxW, if no peak exists or the minimum width is 0 and the maximum width is infinite, not screening, calculating the full width at half maximum by using an x coordinate of a full width at half maximum boundary, and deleting peak values which are not in a given width range;
s8, finding out the index of the maximum peak value in the specified distance: the peaks are sorted from large to small, starting with the larger peak to ensure that we do not accidentally retain a small peak and remove a large peak in its vicinity. If a certain peak value is not near to a larger peak value, finding a secondary peak value within a set distance range of the peak value for elimination, and performing the following loop operation for each effective peak:
marking the peak value within the set distance range of the effective peak value index minD as 1 to indicate that the peak value is deleted, otherwise marking 0 to indicate that the peak value is not deleted, circulating next time, searching around the marking 0 without scanning the peak value of the marking 1, and keeping the peak value index of the marking 0;
after all the effective peaks needing to be reserved are determined, the effective peaks correspond to the original peak value index sequence again;
s9, sorting the peak values, limiting the number of the peaks: inputting the number numP of peaks to be detected, sorting the detected peaks from large to small, and taking out the first numP peaks as the result;
s10, returning: and returning information such as peak height, peak position, peak width, protrusion, coordinates of left and right base points and the like corresponding to the index value.
Preferably, the data acquisition module 1 acquires a voltage signal of the FID gas chromatography detection system, and the chromatographic signal is amplified by hardware filtering, and the signal value is sent to the PC end by the analog-to-digital converter and the singlechip control system.
Preferably, the baseline noise reduction module 3 performs noise reduction processing on the acquired chromatographic data, and performs real-time data filtering by combining an SG filter and a morphological filter.
Preferably, the calibration fitting module 4 calibrates the concentration of the chromatographic analyzer by using a standard sample gas sample, stores the calibration result in a configuration file, and performs linear transformation by referring to the calibration table in the specific calculation of the concentration of each component by using a linear interpolation method.
Preferably, the overlapped peak processing module 5 is used for separating the chromatographic data by separating overlapped peaks, the chromatography works according to the slight difference of the physical and chemical properties of substances, when the physical and chemical properties of two components are similar, the effluent chromatographic curves are close to each other on a time axis, so that the obtained chromatographic peaks are overlapped, and the module is used for separating the chromatographic peaks so as to obtain the substance types and concentrations of the components.
Preferably, the steps 1 through S10 may cause fluctuations or drifts in the baseline as the flow and temperature settings change, assuming that the gas chromatograph is approaching a constant temperature and the components are in a steady state with a relatively smooth baseline after a sufficient amount of time has elapsed since the last change in operating conditions.
Preferably, the base point peak-valley point on the baseline is the actual boundary of the peak, and the base point peak-valley point of the baseline above a certain threshold is the intersection of overlapping peaks. The peak with the cross points is definitely an overlapped peak, and the number of the cross points plus 1 is the number of the sub-peaks in the overlapped peak. The nearest non-intersection points around the intersection point are the left and right boundary points of the overlapping peak. Thus, the overlapping peak range can be determined by using the left and right boundary points, and fitting is performed.
The algorithm comprises the following steps:
g) and performing de-duplication ascending operation on the base points obtained by the previous part of derivative peak searching method by taking the x coordinate as a reference to obtain the peak-valley point coordinates of the ordered base points of the signal data.
h) According to the actual data situation, given a threshold value threshold _ y in the y-axis direction, points higher than the threshold value are intersection points, and points lower than the threshold value are boundary points. If the threshold is too small, the peak with better separation may be judged as an overlapping peak, and if the threshold is too large, some overlapping peaks with large separation degree may be ignored, so as to give up the viewing requirement.
i) And sequentially scanning the ordered base points, recording the previous boundary point as the left boundary point i _ left of the overlapped peak when an intersection is encountered, continuing to scan, recording the next boundary point as the right boundary point i _ right of the overlapped peak when the next boundary point is encountered, and recording the number of the intersection points plus one as the number i _ nump of the sub-peaks in the overlapped peak. And continuing to store the left and right boundary points and the number of sub-peaks of all the overlapped peaks.
j) The fitting center and the fitting range window are calculated.
Figure BDA0002418450300000051
window=baseX(i_Tight)-baseX(i_left)
In the formula, baseX (i _ left) and baseX (i _ Right) are x coordinates of left and right base points of the overlapping peak, respectively.
k) The initial value start of the parameter to be fitted is determined. I.e. the peak position and width of each peak in the overlapping peaks, are calculated by the derivative peak-finding method of the previous part.
l) fitting for each overlapping peak:
inputting parameters such as chromatographic signal data y, a fitting center, a fitting window, the number numP of peaks to be fitted, an initial value start of a parameter to be fitted and the like into an N-M simplex type iterative fitting model for fitting, obtaining a fitting result according to an evaluation standard, and updating the peak height and the peak width.
Preferably, the Nelder-Mead simplex type algorithm is a direct search method for optimizing multidimensional unconstrained problems.
Basic idea thereofFirstly using a search starting point x in an m-dimensional parameter space0And constructing a polyhedron with m +1 linear independent vertexes, determining the next search direction by comparing objective function values of the vertexes, performing heuristic operations of reflection, expansion, contraction and compression side length on the polygon, and replacing the worst point with a better new vertex to form a new polyhedron. And continuously adjusting the parameter values in such a way, and finally approaching the optimal solution of the target function.
The algorithm comprises the following steps:
first, a function is constructed that calculates the mean square error of the model and the original signal, and if the fitting error is greater than the desired fitting accuracy, the program systematically changes the parameters and loops to the previous step and repeats until the desired fitting accuracy is achieved or the maximum number or iteration is achieved.
And I, constructing a parameter estimation function.
Constructing a mean square error function of a calculation model and an original signal:
Figure BDA0002418450300000061
the following description is given of ymodlThe calculation process of (2):
first of all a gaussian matrix a is determined,
Figure BDA0002418450300000062
where h is the number of peaks, n is the number of data, g (x λ)i) Is a gaussian function.
Figure BDA0002418450300000063
The relation between the peak height matrix H, the Gaussian matrix A and the signal data matrix Y is H ═ abs (A \ Y)T) And the result H is an approximation, then there is ymodel=A*H。
II setting of end conditions
Setting the termination error threshold of the parameter lambda to be fitted to be 0.0000001 if lambdaii-1≥0.0000001,Replacing the estimated value of the unknown parameter according to the III simplex iteration process, and carrying out the next iteration until lambdaii-1If the number of iterations is less than 0.0000001, the iteration is terminated, otherwise, the iteration stops 1000 times.
The error calculation formula is as follows:
Figure BDA0002418450300000071
referred to as mean fit error, MeanFitError, MFE, i.e., minimum fit error.
III iterative Process
Using a simplex consisting of n +1 points of the n-dimensional vector x, the algorithm first sets an initial estimate
x0For each part x0(i) Increase by 5% to the corresponding x0In, will divide x0The n-dimensional vector outside as the initial simplex if x0(i) Using 0.00025 as component i, the algorithm iterates through the simplex iteratively according to the following steps:
(7) let x (i) denote the current simplex point data list, i 1.
(8) Sorting the simple type vertexes from small to large according to function values, wherein the shapes are as follows: f (x (1)) <. < f (x (n +1)), at each step of the iteration the algorithm discards the current worst pastry x (n +1), receives another point as a simplex point or, in the following step 7, it changes all n points greater than f (x (1)).
(9) Generating a reflection point: r-2 m-x (n +1), wherein,
Figure BDA0002418450300000072
f (τ) is calculated.
(10) If f (x (1)) ≦ f (τ) < f (x (n)), accepting r, and terminating the iteration. Known as reflection
(11) If f (τ) < f (x (1)), the expansion point s is calculated, s ═ m +2(m-x (n +1)), and f(s):
c. if f(s) < f (τ), accept s, terminate the iteration, called extended
d. Otherwise, accept r, terminate iteration, reflect.
(12) If f (tau) ≧ f (x (n)), compression processing is performed between m and the better of x (n +1) and r:
c. if f (τ) < f (x (n +1)), (e.g., r is better than x (n +1)), c ═ m + (τ -m)/2 is calculated, and f (c) is calculated. If f (c) < f (τ),
accept c, terminate the iteration, called the extract outside. Otherwise, step 7 shrink is performed.
d. If f (τ) ≧ f (x (n +1)), cc ═ m + (x (n +1) -m)/2 is calculated, and f (cc) is calculated, if
f (cc)) < f (x (n +1)), accept cc, terminate the iteration, extract, otherwise, go to step 7.
Calculate these n points:
Figure BDA0002418450300000081
calculate f (v (i)), i ═ 2. The next iteration of simplex is x (1), x (2),.., v (n +1), called shrink.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention adopts a data acquisition, baseline noise reduction, calibration fitting and overlapping peak processing module; the method is characterized in that a core module is an overlapping peak processing module, the method comprises the steps of firstly determining the peak position by adopting a derivative method, then determining the overlapping peak by grouping base points, and fitting a peak shape curve by using a simplex method, thereby determining the information such as the peak height, the peak width, the protrusion and the like of each component of the overlapping peak, the method is high in accuracy of determining the peak position, high in accuracy of determining the information of each component of the overlapping peak, can automatically define the peak separating interval, and can automatically determine the number of sub-peaks in the overlapping peak, and has the advantages of accurately determining the peak position and the information of each component of the overlapping peak;
2. the method has the advantages that the concept is simple, differentiation is not needed, the method is not sensitive to the initial value, and each iteration only needs no more than 2 times of function evaluation, so the searching speed is high, and the like.
Drawings
FIG. 1 is a schematic diagram of the overall structure of a chromatographic overlapping peak processing device according to the present invention;
FIG. 2 is a flow chart of a chromatographic overlapping peak processing method of the present invention
FIG. 3 is a graph of the first overlapping peak fitting effect of a chromatographic overlapping peak processing method of the present invention, wherein the upper half is the original data curve and the lower half is the fitting curve;
FIG. 4 is a second overlapping peak fitting effect of the chromatographic overlapping peak processing method of the present invention, wherein the dotted curve is the original data curve, the short dashed line is the fitted single peak curve, and the solid line is the superposition effect of the fitting results;
FIG. 5 is a graph showing the fitting effect of the third overlapping peak of the chromatographic overlapping peak processing method of the present invention, wherein the dotted curve is the original data curve, the short dashed line is the fitted single peak curve, and the solid line is the superposition effect of the fitting results.
In the figure: 1. a data acquisition module; 2. a data analysis module; 3. a baseline noise reduction module; 4. calibrating a fitting module; 5. and an overlapping peak processing module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-5, the present invention provides a technical solution: a chromatogram overlapping peak processing device comprises a data acquisition module 1, a data analysis module 2, a baseline noise reduction module 3, a calibration fitting module 4 and an overlapping peak processing module 5, wherein the data acquisition module 1 is connected with the data analysis module 2, the data analysis module 2 is connected with the baseline noise reduction module 3, the baseline noise reduction module 3 is connected with the calibration fitting module 4, and the marking line fitting module is connected with the overlapping peak processing module 5.
A chromatographic overlapping peak processing method specifically comprises the following steps:
s1, finding out indexes of all peak values and inflection points by using a difference method: recording all adjacent non-repeated effective value non-zero values, and utilizing a first-order difference method to solve a derivative. Judging the sign of the derivative, wherein the point where the sign of the derivative changes from positive to negative is a peak point; the point where the sign of the derivative changes is the inflection point; the inflection point includes a peak point
S2, keeping only the index of the peak satisfying the minimum peak height: giving the minimum height MinPeakHeight of the peak to be preserved, and deleting the peak which does not meet the height;
s3, retaining only the index of the peak satisfying the threshold: giving a minimum Threshold Threshold of a peak to be reserved, wherein the Threshold is used for limiting the minimum height of a peak point from the left and right side points of the peak point, and deleting the peak with the minimum height smaller than the Threshold to avoid reserving a flat-top peak;
s4, determining the boundary index of each peak: calculating a left base point, a left saddle point, a right base point and a right saddle point, wherein the maximum y value of the left base point and the right base point is the base point of the peak, taking the left base point as an example, scanning all inflection points, and searching the left base point of the peak leftwards in sequence when the peak stops: marking the current peak-valley point as valley, the corresponding peak-valley point as peak, marking the left-scanned peak-valley point as vi, and the corresponding peak-valley point as pi;
1, if vi > valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
2, if vi is greater than valley and pi is less than or equal to peak, repeating the steps 1-4, and continuously searching a left base point forwards;
3 if vi < valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
4 if vi is less than valley and pi is less than or equal to peak, updating vi as the left base point of the current peak; and repeating the steps 1-4 to continuously search for smaller peak-valley points until a point with a peak value larger than peak is met.
Therefore, the left base points of all the peaks can be found, the left base points of the reserved peaks are stored, when the peak heights of the two peaks are the same, the saddle point is the nearest peak inflection point, and the base point is the minimum peak valley point; in other cases, the saddle point is the same as the base point, and the right base point is similar to the left base point, but the right base point is searched from the right to the left in the same way;
s5, retaining only the index satisfying the minimum protrusion: given the minimum protrusion minpeakpromience, peaks smaller than the minimum protrusion are deleted. And (3) protrusion: the difference between the y coordinate of the peak point and the y coordinate of the highest base point is determined by the inherent height of the peak and the protrusion degree determined by the position relative to other peaks;
s6, calculating the x coordinate of the half-height-width boundary of each peak: determine approximate height of full width at half maximum refHeight: half of the sum of the y coordinate of the peak point and the y coordinate of the base point. Find the left boundary of full width at half maximum: searching the peak point to the left base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point adjacent to the right side of the point and the refHeight point to obtain the x coordinate of a left boundary with half-height width; find the right boundary of full width at half maximum: searching the peak point to the right base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point close to the left side of the point and the refHeight point to obtain the x coordinate of a right boundary with the half-height width;
s7, retaining only peaks within a given width range: giving a minimum peak searching width minW and a maximum peak searching width maxW, if no peak exists or the minimum width is 0 and the maximum width is infinite, not screening, calculating the full width at half maximum by using an x coordinate of a full width at half maximum boundary, and deleting peak values which are not in a given width range;
s8, finding out the index of the maximum peak value in the specified distance: the peaks are sorted from large to small, starting with the larger peak to ensure that we do not accidentally retain a small peak and remove a large peak in its vicinity. If a certain peak value is not near to a larger peak value, finding a secondary peak value within a set distance range of the peak value for elimination, and performing the following loop operation for each effective peak:
marking the peak value within the set distance range of the effective peak value index minD as 1 to indicate that the peak value is deleted, otherwise marking 0 to indicate that the peak value is not deleted, circulating next time, searching around the marking 0 without scanning the peak value of the marking 1, and keeping the peak value index of the marking 0;
after all the effective peaks needing to be reserved are determined, the effective peaks correspond to the original peak value index sequence again;
s9, sorting the peak values, limiting the number of the peaks: inputting the number numP of peaks to be detected, sorting the detected peaks from large to small, and taking out the first numP peaks as the result;
s10, returning: and returning information such as peak height, peak position, peak width, protrusion, coordinates of left and right base points and the like corresponding to the index value.
In the invention, a data acquisition module 1 acquires a voltage signal of an FID gas chromatography detection system, the chromatographic signal is filtered and amplified by hardware, and a signal value is sent to a PC end by an analog-to-digital converter and a singlechip control system;
in the invention, the baseline noise reduction module 3 filters the acquired voltage signals by denoising the acquired chromatographic data, combining an SG filter and a morphological filter for real-time data filtering, and combining the SG filter with a filtering window of 19 and the morphological filters with structural elements of g1 and g 2;
in the invention, a calibration fitting module 4 calibrates the concentration of a chromatographic analyzer by a standard sample gas sample, stores the calibration result in a configuration file, performs linear conversion by a linear interpolation method according to a calibration table when the concentration of each component is specifically calculated, firstly introduces standard gas with known concentration components as benzene series, calibrates the components and the concentration of each peak by observing a spectrogram curve, calibrates the components and the concentration of each peak two to three times, and fits the relationship between the peak height and the concentration of each substance to obtain a fitting curve
Y ═ a × X + b, where Y is concentration and X is peak height. And substituting the peak height into a curve equation to obtain the measured real-time concentration.
Two species were measured separately for two experiments:
benzene: the nominal concentration 50, nominal height 56.593,
calibration concentration 40, calibration height 45.274
Calibration curve y-0 +0.8835 x
Toluene: the nominal concentration 60, nominal height 61.516,
nominal concentration 50, nominal height 51.261
The calibration relation y is 0+0.9754 x;
in the invention, the overlapped peak processing module 5 separates overlapped peaks of chromatographic data, the chromatographic method works according to the tiny difference of physical and chemical properties of substances, when the physical and chemical properties of two components are similar, the effluent chromatographic curves are very close on a time axis, so that the obtained chromatographic peaks are overlapped, the module is used for separating the chromatographic peaks, so that the substance types and the concentrations of the components are obtained, and accurate real signals are obtained after filtering, so that component analysis can be carried out. The component analysis is to analyze the components and concentrations of the substances to be detected based on the peak appearance of the spectrum.
The performance of the overlapping peak separation method is evaluated from three aspects of an intuitive separation effect graph, an evaluation standard of a fitting effect and separation errors under different separation degrees; and finally the performance of the whole device is verified through experiments.
First, a calculation method of each index is explained:
Figure BDA0002418450300000131
real area:
with the integration concept, the real area is the sum of all y values of the Gaussian peaks, and the x-axis has the unit of 1.
Figure BDA0002418450300000132
Area error:
area error is fit area-true area
Note that it is different from the separation error.
Figure BDA0002418450300000133
Signal-to-noise ratio:
Figure BDA0002418450300000134
wherein, N is the data number, x is the data vector of the signal with noise, and y is the data vector of the pure signal. Only the signal-to-noise ratio of the fitting segment data is calculated herein.
Figure BDA0002418450300000135
Relative Standard Deviation (RSD):
Figure BDA0002418450300000136
where S is the standard deviation (which may also be expressed as SD),
Figure BDA0002418450300000146
are the corresponding average values.
Figure BDA0002418450300000141
Square of correlation coefficient (R2):
a. and a variance (SSE) of square products to error, The statistical parameter being calculated as The sum of The squares of The errors of The corresponding points of The fitting data and The original data, The calculation formula being as follows:
Figure BDA0002418450300000142
the closer the SSE is to 0, the better the model selection and fitting, and the more successful the data prediction. The following MSE and RMSE, since SSE is the same,
Figure BDA0002418450300000143
the effect is the same.
SST: total sum of squares, the sum of the squares of the differences between the raw data and the mean, is given by the formula:
Figure BDA0002418450300000144
c. square of correlation coefficient
Figure BDA0002418450300000145
In essence, the "coefficient of certainty" characterizes how well a fit is by the change in data. From the above expression, it can be known that the normal value range of the "determination coefficient" is [0,1], and the closer to 1, the stronger the explanatory ability of the variable of the equation to y is, and the better the model fits the data.
The experimental procedure is described below:
1) evaluation of fitting Effect
The experiment simulates 10 different overlapped peaks, wherein the overlapped peaks consist of Gaussian peaks with the width of 10-50 and the peak position of 70-120, and Gaussian noise with the floating range of 0.003-0.05 and baseline noise with the slope of 0.001 are added. The results of the experiments are shown in the following table:
table 110 comparison table of fitting effect of overlapping peaks
Figure BDA0002418450300000151
Figure BDA0002418450300000161
Figure BDA0002418450300000171
As can be seen from the table above, for the signal data with the average signal-to-noise ratio of 72.64, the fitting error is less than 0.6%, and the correlation coefficient is as high as 0.999 or more, indicating that the fitting effect is good.
In order to prove the stability of the algorithm, the method also performs the following experiments: and 6 kinds of noises with different signal-to-noise ratios are added to the same analog data (formed by overlapping two Gaussian peaks with the widths of 10 and 20 respectively) to verify the fitting effect. The results of the experiments are shown in the following table:
TABLE 2 comparison table of fitting effect of overlapping peaks of the same sample at different times
Figure BDA0002418450300000172
Figure BDA0002418450300000181
Figure BDA0002418450300000191
As can be seen from the above table, in the process that the signal-to-noise ratio is reduced from 94.58 to 53.53, the fitting error of the signal data is less than 0.7% (mean square error is 0.18), and the correlation coefficient is as high as more than 0.999 (mean square error is 0.0001), which indicates that the fitting method is stable.
2) Separation error
The separation degree is used as an index of the separation efficiency of the chromatographic column, the separation condition of two adjacent components in the chromatographic column can be judged, and the larger the separation degree R is, the better the separation of the two adjacent components is. Generally, when the degree of separation R <1, the two peaks partially overlap.
The resolution calculation formula is as follows:
Figure BDA0002418450300000192
in the formula: the retention times of the 2 single peaks constituting the overlapping peak, respectively, and the peak bottom widths of the 2 single peaks, respectively.
Four kinds of simulation data with the separation degrees of 0.50, 0.67, 0.84 and 1.01 are designed in the experiment, and Gaussian noise and baseline noise with the same floating range are added. The 4 groups of data are tested, and the separation error is calculated by the following formula:
Figure BDA0002418450300000193
in the formula: sXIs the area of the X peak.
The results of the experiment are reported below:
TABLE 3 separation error under different degrees of separation
Figure BDA0002418450300000194
Figure BDA0002418450300000201
As can be seen from the above table, the separation error of the signal data with the average signal-to-noise ratio of 68.11 is within the range of + -1.6.
3) Graph of separation effect
The separation effect can be seen visually from the separation effect graph, three different overlapped peaks are selected for processing in the experiment, and the separation effect is shown in attached figures 3, 4 and 5.
4) Device performance
The device is mainly used for separating overlapped peaks and measuring components and concentrations of all components, so that the measurement performance is the relative standard deviation and repeatability of each measured component.
Because the properties of the benzene series are similar and overlapping peaks are easy to generate, the benzene series is taken as an experimental object in the experiment, and the measured gas is diluted benzene series standard gas. The following table shows the benzene analysis results in 9 experiments:
TABLE 4 test results of benzene series
Figure BDA0002418450300000211
Figure BDA0002418450300000221
Generally, the change of the continuous sampling retention time in the measurement of the sample gas with the same concentration needs to meet the requirement of less than +/-1%, and the change of the peak height and the peak area needs to meet the requirement of less than +/-3%. As can be seen from the above table, the relative standard deviations of retention time, peak height and peak area measured after 9 times of data measurement are only 0.24%, 1.24% and 1.72%, respectively, so that qualitative repeatability analysis using retention time as a standard meets the requirement of ± 1%, and meanwhile, quantitative repeatability analysis using peak height and peak area as standards meets the requirement of less than ± 3%;
in the present invention, fluctuations or drifts in the baseline may occur when the flow and temperature settings are changed in steps 1 through S10, assuming that the gas chromatograph is gradually brought close to a constant temperature, the components are in a steady state, and the baseline is relatively smooth, after a sufficiently long time has elapsed since the operating conditions were last changed;
in the present invention, the base point peak-valley point on the base line is the actual boundary of the peak, and the base point peak-valley point of the base line higher than a certain threshold value is the intersection of the overlapping peaks. The peak with the cross points is definitely an overlapped peak, and the number of the cross points plus 1 is the number of the sub-peaks in the overlapped peak. The nearest non-intersection points around the intersection point are the left and right boundary points of the overlapping peak. Thus, the overlapping peak range can be determined by using the left and right boundary points, and fitting is performed.
The algorithm comprises the following steps:
m) performing de-weight ascending operation on the base point obtained by the previous part of derivative peak searching method by taking the x coordinate as a reference to obtain the peak-valley point coordinate of the ordered base point of the signal data.
n) according to the actual data situation, a threshold value threshold _ y in the y-axis direction is given, points higher than the threshold value are intersection points, and points lower than the threshold value are boundary points. If the threshold is too small, the peak with better separation may be judged as an overlapping peak, and if the threshold is too large, some overlapping peaks with large separation degree may be ignored, so as to give up the viewing requirement.
o) sequentially scanning the ordered base points, recording the previous boundary point as the left boundary point i _ left of the overlapped peak when an intersection is encountered, continuing to scan, recording the next boundary point as the right boundary point i _ right of the overlapped peak when the next boundary point is encountered, and recording the number of the intersection points plus one as the number i _ nump of the sub-peaks in the overlapped peak. And continuing to store the left and right boundary points and the number of sub-peaks of all the overlapped peaks.
p) calculating the fitting center and the fitting range window.
Figure BDA0002418450300000231
window=baseX(i_Tight)-baseX(i_left)
In the formula, baseX (i _ left) and baseX (i _ Right) are x coordinates of left and right base points of the overlapping peak, respectively.
q) determining an initial value start of the parameter to be fitted. I.e. the peak position and width of each peak in the overlapping peaks, are calculated by the derivative peak-finding method of the previous part.
r) fitting to each overlapping peak:
inputting parameters such as chromatographic signal data y, a fitting center, a fitting window, the number numP of peaks to be fitted, an initial value start of a parameter to be fitted and the like into an N-M simplex type iterative fitting model for fitting, obtaining a fitting result according to an evaluation standard, and updating the peak height and the peak width;
in the invention, a Nelder-Mead simplex algorithm is a direct search method for optimizing a multidimensional unconstrained problem.
The basic idea is to use the search starting point x in the m-dimensional parameter space0And constructing a polyhedron with m +1 linear independent vertexes, determining the next search direction by comparing objective function values of the vertexes, performing heuristic operations of reflection, expansion, contraction and compression side length on the polygon, and replacing the worst point with a better new vertex to form a new polyhedron. And continuously adjusting the parameter values in such a way, and finally approaching the optimal solution of the target function.
The algorithm comprises the following steps:
first, a function is constructed that calculates the mean square error of the model and the original signal, and if the fitting error is greater than the desired fitting accuracy, the program systematically changes the parameters and loops to the previous step and repeats until the desired fitting accuracy is achieved or the maximum number or iteration is achieved.
And I, constructing a parameter estimation function.
Constructing a mean square error function of a calculation model and an original signal:
Figure BDA0002418450300000241
the following description is given of ymodelThe calculation process of (2):
first of all a gaussian matrix a is determined,
Figure BDA0002418450300000251
where h is the number of peaks, n is the number of data, g (x λ)i) Is a gaussian function.
Figure BDA0002418450300000252
The relation between the peak height matrix H, the Gaussian matrix A and the signal data matrix Y is H ═ abs (A \ Y)T) And the result H is an approximation, then there is ymodel=A*H。
II setting of end conditions
Setting the termination error threshold of the parameter lambda to be fitted to be 0.0000001 if lambdaii-1If the absolute value of the parameter is more than or equal to 0.0000001, replacing the estimated value of the unknown parameter according to the III simplex iteration process, and carrying out the next iteration until lambda is reachedii-1If the number of iterations is less than 0.0000001, the iteration is terminated, otherwise, the iteration stops 1000 times.
The error calculation formula is as follows:
Figure BDA0002418450300000253
referred to as mean fit error, MeanFitError, MFE, i.e., minimum fit error.
III iterative Process
Using a simplex consisting of n +1 points of an n-dimensional vector x, the algorithm first sets an initial estimate x0For each part x0(i) Increase by 5% to the corresponding x0In, will divide x0The n-dimensional vector outside as the initial simplex if x0(i) Using 0.00025 as component i, the algorithm iterates through the simplex iteratively according to the following steps:
(13) let x (i) denote the current simplex point data list, i 1.
(14) Sorting the simple type vertexes from small to large according to function values, wherein the shapes are as follows: f (x (1)) <. < f (x (n +1)), at each step of the iteration the algorithm discards the current worst pastry x (n +1), receives another point as a simplex point or, in the following step 7, it changes all n points greater than f (x (1)).
(15) Generating a reflection point: r-2 m-x (n +1), wherein,
Figure BDA0002418450300000261
f (τ) is calculated.
(16) If f (x (1)) ≦ f (τ) < f (x (n)), accepting r, and terminating the iteration. Known as reflection
(17) If f (τ) < f (x (1)), the expansion point s is calculated, s ═ m +2(m-x (n +1)), and f(s):
e. if f(s) < f (τ), accept s, terminate the iteration, called extended
f. Otherwise, accept r, terminate iteration, reflect.
(18) If f (tau) ≧ f (x (n)), compression processing is performed between m and the better of x (n +1) and r:
e. if f (τ) < f (x (n +1)), (e.g., r is better than x (n +1)), c ═ m + (τ -m)/2 is calculated, and f (c) is calculated. If f (c) < f (τ), c is accepted, terminating the iteration, called the extract output. Otherwise, step 7 shrink is performed.
f. If f (τ) ≧ f (x (n +1)), calculate cc ═ m + (x (n +1) -m)/2, and calculate f (cc), if f (cc) < f (x (n +1)), accept cc, terminate the iteration, contact inside, otherwise, go to step 7.
Calculate these n points:
Figure BDA0002418450300000271
calculate f (v (i)), i ═ 2. The next iteration of simplex is x (1), v (2),. v (n +1), called shrink.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. The utility model provides a chromatogram overlapping peak processing apparatus, includes data acquisition module (1), data analysis module (2), baseline noise reduction module (3), marks fitting module (4) and overlapping peak processing module (5), its characterized in that: the data acquisition module (1) is connected with the data analysis module (2), the data analysis module (2) is connected with the baseline noise reduction module (3), the baseline noise reduction module (3) is connected with the calibration fitting module (4), and the marking fitting module is connected with the overlapping peak processing module (5).
2. A chromatographic overlapping peak processing method is characterized by comprising the following steps:
s1, finding out indexes of all peak values and inflection points by using a difference method: recording all adjacent non-repeated effective values (non-zero values), solving the derivative by using a first-order difference method, judging the sign of the derivative, and obtaining a peak value point at the point where the sign of the derivative is changed from positive to negative; the point where the sign of the derivative changes is the inflection point; (inflection point includes peak point)
S2, keeping only the index of the peak satisfying the minimum peak height: giving the minimum height MinPeakHeight of the peak to be preserved, and deleting the peak which does not meet the height;
s3, retaining only the index of the peak satisfying the threshold: giving a minimum Threshold Threshold of a peak to be reserved, wherein the Threshold is used for limiting the minimum height of a peak point from the left and right side points of the peak point, and deleting the peak with the minimum height smaller than the Threshold to avoid reserving a flat-top peak;
s4, determining the boundary index of each peak: calculating a left base point, a left saddle point, a right base point and a right saddle point, wherein the maximum y value of the left base point and the right base point is the base point of the peak, taking the left base point as an example, scanning all inflection points, and searching the left base point of the peak leftwards in sequence when the peak stops: marking the current peak-valley point as valley, the corresponding peak-valley point as peak, marking the left-scanned peak-valley point as vi, and the corresponding peak-valley point as pi;
1) if vi > valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1) to 4) to continuously search a left base point of the next peak;
2) if vi is greater than valley and pi is less than or equal to peak, repeating the steps 1) -4), and continuing to search the left base point forwards;
3) if vi < valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1) to 4) to continuously search a left base point of the next peak;
4) if vi is less than valley and pi is less than or equal to peak, updating vi as the left base point of the current peak; repeating the steps 1) to 4) to continuously search for smaller peak-to-valley points until a point with a peak value larger than peak is encountered,
therefore, the left base points of all the peaks can be found, the left base points of the reserved peaks are stored, when the peak heights of the two peaks are the same, the saddle point is the nearest peak inflection point, and the base point is the minimum peak valley point; in other cases, the saddle point is the same as the base point, and the right base point is similar to the left base point, but the right base point is searched from the right to the left in the same way;
s5, retaining only the index satisfying the minimum protrusion: given the minimum protrusion minpeakpominence, the peaks smaller than the minimum protrusion are deleted, protrusion: the difference between the y coordinate of the peak point and the y coordinate of the highest base point is determined by the inherent height of the peak and the protrusion degree determined by the position relative to other peaks;
s6, calculating the x coordinate of the half-height-width boundary of each peak: determine approximate height of full width at half maximum refHeight: half of the sum of the y coordinate of the peak point and the y coordinate of the base point, the left boundary of the full width at half maximum is found: searching the peak point to the left base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point adjacent to the right side of the point and the refHeight point to obtain the x coordinate of a left boundary with half-height width; find the right boundary of full width at half maximum: searching the peak point to the right base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point close to the left side of the point and the refHeight point to obtain the x coordinate of a right boundary with the half-height width;
s7, retaining only peaks within a given width range: giving a minimum peak searching width minW and a maximum peak searching width maxW, if no peak exists or the minimum width is 0 and the maximum width is infinite, not screening, calculating the full width at half maximum by using an x coordinate of a full width at half maximum boundary, and deleting peak values which are not in a given width range;
s8, finding out the index of the maximum peak value in the specified distance: sorting the peaks from large to small, judging from the larger peak to ensure that we can not accidentally retain a small peak and remove a large peak nearby, if a certain peak is not nearby the larger peak, finding a secondary peak within a set distance range of the peak to eliminate, and performing the following loop operation for each effective peak:
the peak within the range of the valid peak index minD (set distance) is marked as 1 (indicating deleted), otherwise marked as 0 (indicating not deleted), the next cycle, the peak of marked 1 is not scanned again, the search is performed around marked 0, and the peak index of marked 0 is retained;
after all the effective peaks needing to be reserved are determined, the effective peaks correspond to the original peak value index sequence again;
s9, sorting the peak values, limiting the number of the peaks: inputting the number numP of peaks to be detected, sorting the detected peaks from large to small, and taking out the first numP peaks as the result;
s10, returning: and returning information such as peak height, peak position, peak width, protrusion, coordinates of left and right base points and the like corresponding to the index value.
3. A chromatographic overlapping peak processing device according to claim 1, characterized in that: the data acquisition module (1) is used for acquiring voltage signals of the FID gas chromatography detection system, the chromatographic signals are amplified through hardware filtering, and the single chip microcomputer control system sends signal values to the PC terminal.
4. A chromatographic overlapping peak processing device according to claim 1, characterized in that: the baseline noise reduction module (3) carries out noise reduction processing on the acquired chromatographic data, and adopts SG filters and morphological filters to carry out real-time data filtering.
5. A chromatographic overlapping peak processing device according to claim 1, characterized in that: the calibration fitting module (4) calibrates the concentration of the chromatographic analyzer by a standard sample gas sample, stores the calibrated result in a configuration file, and performs linear conversion by referring to the calibration table when the concentration of each component is specifically calculated.
6. A chromatographic overlapping peak processing device according to claim 1, characterized in that: the overlapped peak processing module (5) is used for separating the chromatographic data to obtain the substance types and concentrations of the components by separating the overlapped peaks, the chromatography works according to the slight difference of the physical and chemical properties of the substances, when the physical and chemical properties of the two components are similar, the effluent chromatographic curves are close to each other on a time axis, so that the obtained chromatographic peaks are overlapped.
7. A chromatographic overlapping peak processing method according to claim 2, characterized in that: the steps 1 through S10 may cause fluctuations or drifts in the baseline as the flow and temperature settings change, assuming that the gas chromatograph is approaching a constant temperature and the components are in a steady state with a relatively smooth baseline after a sufficient amount of time has elapsed since the last change in operating conditions.
8. A chromatographic overlapping peak processing method according to claim 7, characterized in that: the base point (peak-valley point) on the base line is the actual boundary of the peak, and the base point (peak-valley point) of the base line above a certain threshold is the intersection of the overlapping peaks. The peak with the cross points is definitely an overlapped peak, and the number of the cross points plus 1 is the number of the sub-peaks in the overlapped peak. The nearest non-intersection points around the intersection point are the left and right boundary points of the overlapping peak. Thus, the overlapping peak range can be determined by using the left and right boundary points, and fitting is performed.
The algorithm comprises the following steps:
a) and (4) performing de-weight ascending operation on the base point obtained by the previous part (derivative peak searching method) by taking the x coordinate as a reference to obtain the ordered base point (peak-valley point) coordinate of the signal data.
b) According to the actual data situation, given a threshold value threshold _ y in the y-axis direction, points higher than the threshold value are intersection points, and points lower than the threshold value are boundary points. If the threshold is too small, the peak with better separation may be judged as an overlapping peak, and if the threshold is too large, some overlapping peaks with large separation degree may be ignored, so as to give up the viewing requirement.
c) And sequentially scanning the ordered base points, recording the previous boundary point as the left boundary point i _ left of the overlapped peak when an intersection is encountered, continuing to scan, recording the next boundary point as the right boundary point i _ right of the overlapped peak when the next boundary point is encountered, and recording the number of the intersection points plus one as the number i _ nump of the sub-peaks in the overlapped peak. And continuing to store the left and right boundary points and the number of sub-peaks of all the overlapped peaks.
d) The fitting center and the fitting range window are calculated.
Figure FDA0002418450290000041
window=baseX(i_right)-baseX(i_left)
In the formula, baseX (i _ left) and baseX (i _ right) are x coordinates of left and right base points of the overlapping peak, respectively.
e) The initial value start of the parameter to be fitted is determined. I.e. the peak position and width of each peak in the overlapping peaks, are calculated from the previous part (derivative peak finding).
f) Fit to each overlapping peak:
inputting parameters such as chromatographic signal data y, a fitting center, a fitting window, the number numP of peaks to be fitted, an initial value start of a parameter to be fitted and the like into an N-M simplex type iterative fitting model for fitting, obtaining a fitting result according to an evaluation standard, and updating the peak height and the peak width.
9. A chromatographic overlapping peak processing method according to claim 8, characterized in that: the Nelder-Mead simplex type algorithm is a direct search method for optimizing multidimensional unconstrained problems. The basic idea is to use the search starting point x in the m-dimensional parameter space0And constructing a polyhedron with m +1 linear independent vertexes, determining the next search direction by comparing objective function values of the vertexes, performing heuristic operations of reflection, expansion, contraction and compression side length on the polygon, and replacing the worst point with a better new vertex to form a new polyhedron. And continuously adjusting the parameter values in such a way, and finally approaching the optimal solution of the target function.
The algorithm comprises the following steps:
first, a function is constructed that calculates the mean square error of the model and the original signal, and if the fitting error is greater than the desired fitting accuracy, the program systematically changes the parameters and loops to the previous step and repeats until the desired fitting accuracy is achieved or the maximum number or iteration is achieved.
I) constructing a parameter estimation function.
Constructing a mean square error function of a calculation model and an original signal:
Figure FDA0002418450290000051
the following description is given of ymodelThe calculation process of (2):
first of all a gaussian matrix a is determined,
Figure FDA0002418450290000052
where h is the number of peaks, n is the number of data, g (x λ)i) Is a gaussian function.
Figure FDA0002418450290000061
The relation between the peak height matrix H, the Gaussian matrix A and the signal data matrix Y is H ═ abs (A \ Y)T) And the result H is an approximation, then there is ymodel=A*H。
II) setting of the termination conditions
Setting the termination error threshold value of the parameter to be fitted to be 0.0000001 if lambdaii-1If the absolute value of the parameter is more than or equal to 0.0000001, replacing the estimation value of the unknown parameter according to the III) simplex iteration process, and carrying out the next iteration until lambda is reachedii-1< 0-0000001, terminate the iteration, otherwise the iteration stops 1000 times.
The error calculation formula is as follows:
Figure FDA0002418450290000062
referred to as Mean Fit Error (MFE), i.e., minimum fit error.
III) iterative procedure
Using a simplex consisting of n +1 points of the n-dimensional vector x, the algorithm first sets an initial estimate x0For each part x0(i) Increase by 5% to the corresponding x0In, will divide x0The n-dimensional vector outside as the initial simplex (if x)0(i) 0, then 0.00025 is used as component i), the algorithm iterates through the simplex iteratively according to the following steps:
(1) let x (i) denote the current simplex point data list, i 1.
(2) Sorting the simple type vertexes from small to large according to function values, wherein the shapes are as follows: f (x (1)) <. < f (x (n +1)), at each step of the iteration the algorithm discards the current worst pastry x (n +1), receives another point as a simplex point (or, in the following step 7, it changes all n points greater than f (x (1)).
(3) Generating a reflection point: r-2 m-x (n +1), wherein,
Figure FDA0002418450290000071
calculate f (r).
(4) If f (x (1)) ≦ f (r) < f (x (n)), accepting r, and terminating the iteration. Known as reflection
(5) If f (r) < f (x (1)), the expansion point s is calculated, s ═ m +2(m-x (n +1)), and f(s):
a. if f(s) < f (r), accept s, terminate iteration, called expand (extended)
b. Otherwise, accept r, terminate iteration, reflect.
(6) If f (r) ≧ f (x (n)), compression processing is performed between m and the better of x (n +1) and r:
a. if f (r) < f (x (n +1)), (e.g., r is better than x (n +1)), c ═ m + (r-m)/2 is calculated, and f (c) is calculated. If f (c) < f (r), accept c, terminate the iteration, called the extract outside. Otherwise, step (7) (shrink) is performed.
b. If f (r) ≧ f (x (n +1)), calculate cc ═ m + (x (n +1) -m)/2, and calculate f (cc), if f (cc) < f (x (n +1)), accept cc, terminate the iteration, contact inside, otherwise, go to step (7).
Calculate these n points:
Figure FDA0002418450290000072
calculate f (v (i)), i ═ 2. The next iteration of simplex is x (1), v (2), v (n +1), called shrink (shrink).
CN202010198383.3A 2020-03-19 2020-03-19 Overlapped peak processing method applied to chromatographic analysis Active CN111337606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010198383.3A CN111337606B (en) 2020-03-19 2020-03-19 Overlapped peak processing method applied to chromatographic analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010198383.3A CN111337606B (en) 2020-03-19 2020-03-19 Overlapped peak processing method applied to chromatographic analysis

Publications (2)

Publication Number Publication Date
CN111337606A true CN111337606A (en) 2020-06-26
CN111337606B CN111337606B (en) 2023-03-31

Family

ID=71184173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010198383.3A Active CN111337606B (en) 2020-03-19 2020-03-19 Overlapped peak processing method applied to chromatographic analysis

Country Status (1)

Country Link
CN (1) CN111337606B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112444589A (en) * 2020-12-04 2021-03-05 深圳普门科技股份有限公司 Chromatographic peak detection method, device, computer equipment and storage medium
CN112464794A (en) * 2020-11-25 2021-03-09 易方达基金管理有限公司 Image-based fluctuation trend identification method and device, computer equipment and medium
CN112557576A (en) * 2020-12-04 2021-03-26 陕西省石油化工研究设计院 Method for measuring content of calcium and magnesium ions in industrial circulating water
CN112731234A (en) * 2020-12-29 2021-04-30 厦门大学 Nuclear magnetic resonance spectrum baseline correction method based on arc tangent model
CN112820358A (en) * 2020-12-28 2021-05-18 上海交通大学 Molten salt electrolytic refining overlapping peak separation method and system based on genetic algorithm
CN112903883A (en) * 2021-04-02 2021-06-04 江苏乐尔环境科技股份有限公司 Spectral peak analysis method and device applied to gas chromatography
CN113419020A (en) * 2021-06-30 2021-09-21 成都师范学院 Glycated hemoglobin overlapping peak recognition method, apparatus, system, device, and medium
CN113567603A (en) * 2021-07-22 2021-10-29 华谱科仪(大连)科技有限公司 Detection and analysis method of chromatographic spectrogram and electronic equipment
CN113607867A (en) * 2021-07-23 2021-11-05 清华大学合肥公共安全研究院 Dual-fold-spectrum peak analysis method based on peak body mapping
US20220034921A1 (en) * 2020-07-29 2022-02-03 Samsung Electronics Co., Ltd. Method and system for searching for synthesis condition
CN115345208A (en) * 2022-10-19 2022-11-15 成都理工大学 Neutron-gamma pulse accumulation discrimination method based on top-hat conversion
CN116973563A (en) * 2023-09-22 2023-10-31 宁波奥丞生物科技有限公司 Immunofluorescence chromatography determination method and device based on quadrature phase-locked amplification
CN117785818A (en) * 2024-02-26 2024-03-29 山东惠分仪器有限公司 Gas chromatograph data optimized storage method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1712955A (en) * 2004-06-25 2005-12-28 中国科学院大连化学物理研究所 Precisive measurement for parameter of chromatography spike and area of overlapped peak
CN103076308A (en) * 2011-10-25 2013-05-01 中国科学院沈阳自动化研究所 Laser-induced breakdown spectroscopy overlapped peak resolution method
CN104246502A (en) * 2012-03-19 2014-12-24 米洛万·斯坦科夫 Device for determining at least one analyte capable of being contained in a liquid sample
US20170336370A1 (en) * 2014-09-03 2017-11-23 Shimadzu Corporation Chromatogram data processing method and chromatogram data processing apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1712955A (en) * 2004-06-25 2005-12-28 中国科学院大连化学物理研究所 Precisive measurement for parameter of chromatography spike and area of overlapped peak
CN103076308A (en) * 2011-10-25 2013-05-01 中国科学院沈阳自动化研究所 Laser-induced breakdown spectroscopy overlapped peak resolution method
CN104246502A (en) * 2012-03-19 2014-12-24 米洛万·斯坦科夫 Device for determining at least one analyte capable of being contained in a liquid sample
US20170336370A1 (en) * 2014-09-03 2017-11-23 Shimadzu Corporation Chromatogram data processing method and chromatogram data processing apparatus

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
REZUANA BAI J.: "ECG DE-NOISING TECHNIQUES FOR DETECTION OF ARRHYTHMIA", 《INTERNATIONAL RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY》 *
涂亚飞等: "Nelder-Mead单纯形法在NaI(Tl)γ能谱峰面积求解中的应用", 《核电子学与探测技术》 *
罗伟栋 等: "便携式气相色谱仪中嵌入式谱峰识别算法设计", 《色谱》 *
陈玉新 等: "DML快速色谱仪分析控制软件系统开发", 《录井工程》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220034921A1 (en) * 2020-07-29 2022-02-03 Samsung Electronics Co., Ltd. Method and system for searching for synthesis condition
CN112464794A (en) * 2020-11-25 2021-03-09 易方达基金管理有限公司 Image-based fluctuation trend identification method and device, computer equipment and medium
CN112464794B (en) * 2020-11-25 2024-09-06 易方达基金管理有限公司 Image-based fluctuation trend identification method, device, computer equipment and medium
CN112557576A (en) * 2020-12-04 2021-03-26 陕西省石油化工研究设计院 Method for measuring content of calcium and magnesium ions in industrial circulating water
CN112444589B (en) * 2020-12-04 2021-10-08 深圳普门科技股份有限公司 Chromatographic peak detection method, device, computer equipment and storage medium
CN112444589A (en) * 2020-12-04 2021-03-05 深圳普门科技股份有限公司 Chromatographic peak detection method, device, computer equipment and storage medium
CN112820358A (en) * 2020-12-28 2021-05-18 上海交通大学 Molten salt electrolytic refining overlapping peak separation method and system based on genetic algorithm
CN112820358B (en) * 2020-12-28 2022-04-26 上海交通大学 Molten salt electrolytic refining overlapping peak separation method and system based on genetic algorithm
CN112731234A (en) * 2020-12-29 2021-04-30 厦门大学 Nuclear magnetic resonance spectrum baseline correction method based on arc tangent model
CN112731234B (en) * 2020-12-29 2021-10-29 厦门大学 Nuclear magnetic resonance spectrum baseline correction method based on arc tangent model
CN112903883A (en) * 2021-04-02 2021-06-04 江苏乐尔环境科技股份有限公司 Spectral peak analysis method and device applied to gas chromatography
CN113419020A (en) * 2021-06-30 2021-09-21 成都师范学院 Glycated hemoglobin overlapping peak recognition method, apparatus, system, device, and medium
CN113567603A (en) * 2021-07-22 2021-10-29 华谱科仪(大连)科技有限公司 Detection and analysis method of chromatographic spectrogram and electronic equipment
CN113607867A (en) * 2021-07-23 2021-11-05 清华大学合肥公共安全研究院 Dual-fold-spectrum peak analysis method based on peak body mapping
CN113607867B (en) * 2021-07-23 2024-06-11 清华大学合肥公共安全研究院 Double-overlap spectrum peak analysis method based on peak body mapping
CN115345208A (en) * 2022-10-19 2022-11-15 成都理工大学 Neutron-gamma pulse accumulation discrimination method based on top-hat conversion
CN116973563A (en) * 2023-09-22 2023-10-31 宁波奥丞生物科技有限公司 Immunofluorescence chromatography determination method and device based on quadrature phase-locked amplification
CN116973563B (en) * 2023-09-22 2023-12-19 宁波奥丞生物科技有限公司 Immunofluorescence chromatography determination method and device based on quadrature phase-locked amplification
CN117785818A (en) * 2024-02-26 2024-03-29 山东惠分仪器有限公司 Gas chromatograph data optimized storage method and system
CN117785818B (en) * 2024-02-26 2024-05-10 山东惠分仪器有限公司 Gas chromatograph data optimized storage method and system

Also Published As

Publication number Publication date
CN111337606B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
CN111337606B (en) Overlapped peak processing method applied to chromatographic analysis
US5121443A (en) Neural net system for analyzing chromatographic peaks
EP0395481A2 (en) Method and apparatus for estimation of parameters describing chromatographic peaks
JP6610678B2 (en) Peak detection method and data processing apparatus
Balke Quantitative column liquid chromatography: a survey of chemometric methods
EP1015882B1 (en) Chromatographic pattern analysis system employing chromatographic variability characterization
CN108830253B (en) Screening model establishing method, spectrum screening device and method
CN110084212B (en) Spectral characteristic peak identification and positioning method based on improved sine and cosine algorithm
CN1040798C (en) Absorbance analysis system and data processing method for chromatograph
CN112461805A (en) Method for fluorescence intensity substrate calculation
CN107966420B (en) Method for predicting crude oil property by near infrared spectrum
CN111521577B (en) Infrared spectrum quantitative analysis method taking carbon dioxide peak area as reference
CN115112598B (en) Modeling method, model and detection method of automatic jade detection model
CN109219748A (en) Blob detection method and data processing equipment
CN117491422A (en) Method for detecting high heat-conducting property of aluminum alloy material
CN114707540B (en) Method for determining spectrogram base line
CN111426648B (en) Method and system for determining similarity of infrared spectrogram
CN111579526B (en) Method for representing difference and correction of near infrared instrument
CN113053475B (en) Signal processing and multi-attribute decision method based on micro-cantilever gas sensitive material analysis
CN115574939A (en) Optical fiber spectrometer measuring method
Toft Evolutionary rank analysis applied to multidetectional chromatographic structures
US6210465B1 (en) Method for identification of components within a known sample
CN118465161B (en) Method for detecting minoxidil related substances based on liquid chromatography
CN116660207B (en) Method for determining characteristic spectrum in oil product quick detection and octane content detection system
CN117828334A (en) Method, device, equipment and medium for extracting data characteristics of resistive array sensor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 226344 Lan Gang Qiao Village, Dongshe Town, Tongzhou District, Nantong City, Jiangsu Province

Patentee after: Jiangsu Le'er Environmental Technology Co.,Ltd.

Address before: 226344 yanggangju, Dongshe Town, Tongzhou District, Nantong City, Jiangsu Province

Patentee before: Le'er Environmental Technology (Jiangsu) Co.,Ltd.

Address after: 226344 yanggangju, Dongshe Town, Tongzhou District, Nantong City, Jiangsu Province

Patentee after: Le'er Environmental Technology (Jiangsu) Co.,Ltd.

Address before: 226000 Yanggang Residence, Dongshe Town, Tongzhou District, Nantong City, Jiangsu Province

Patentee before: NANTONG LEER ENVIRONMENTAL PROTECTION TECHNOLOGY Co.,Ltd.