Disclosure of Invention
The present invention is directed to a method and apparatus for processing overlapping peaks for chromatographic analysis, which solves the above problems in the prior art.
In order to achieve the purpose, the invention provides the following technical scheme: a chromatogram overlapping peak processing device comprises a data acquisition module, a data analysis module, a baseline noise reduction module, a calibration fitting module and an overlapping peak processing module, wherein the data acquisition module is connected with the data analysis module, the data analysis module is connected with the baseline noise reduction module, the baseline noise reduction module is connected with the calibration fitting module, and the marking fitting module is connected with the overlapping peak processing module.
A chromatographic overlapping peak processing method is characterized by comprising the following steps:
s1, finding out indexes of all peak values and inflection points by using a difference method: recording all adjacent non-repeated effective value non-zero values, and utilizing a first-order difference method to solve a derivative. Judging the sign of the derivative, wherein the point where the sign of the derivative changes from positive to negative is a peak point; the point where the sign of the derivative changes is the inflection point; the inflection point includes a peak point
S2, keeping only the index of the peak satisfying the minimum peak height: giving the minimum height MinPeakHeight of the peak to be preserved, and deleting the peak which does not meet the height;
s3, retaining only the index of the peak satisfying the threshold: giving a minimum Threshold Threshold of a peak to be reserved, wherein the Threshold is used for limiting the minimum height of a peak point from the left and right side points of the peak point, and deleting the peak with the minimum height smaller than the Threshold to avoid reserving a flat-top peak;
s4, determining the boundary index of each peak: calculating a left base point, a left saddle point, a right base point and a right saddle point, wherein the maximum y value of the left base point and the right base point is the base point of the peak, taking the left base point as an example, scanning all inflection points, and searching the left base point of the peak leftwards in sequence when the peak stops: marking the current peak-valley point as valley, the corresponding peak-valley point as peak, marking the left-scanned peak-valley point as vi, and the corresponding peak-valley point as pi;
1, if vi > valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
2, if vi is greater than valley and pi is less than or equal to peak, repeating the steps 1-4, and continuously searching a left base point forwards;
3 if vi < valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
4 if vi is less than valley and pi is less than or equal to peak, updating vi as the left base point of the current peak; and repeating the steps 1-4 to continuously search for smaller peak-valley points until a point with a peak value larger than peak is met.
Therefore, the left base points of all the peaks can be found, the left base points of the reserved peaks are stored, when the peak heights of the two peaks are the same, the saddle point is the nearest peak inflection point, and the base point is the minimum peak valley point; in other cases, the saddle point is the same as the base point, and the right base point is similar to the left base point, but the right base point is searched from the right to the left in the same way;
s5, retaining only the index satisfying the minimum protrusion: given the minimum protrusion minpeakpromience, peaks smaller than the minimum protrusion are deleted. And (3) protrusion: the difference between the y coordinate of the peak point and the y coordinate of the highest base point is determined by the inherent height of the peak and the protrusion degree determined by the position relative to other peaks;
s6, calculating the x coordinate of the half-height-width boundary of each peak: determine approximate height of full width at half maximum refHeight: half of the sum of the y coordinate of the peak point and the y coordinate of the base point. Find the left boundary of full width at half maximum: searching the peak point to the left base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point adjacent to the right side of the point and the refHeight point to obtain the x coordinate of a left boundary with half-height width; find the right boundary of full width at half maximum: searching the peak point to the right base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point close to the left side of the point and the refHeight point to obtain the x coordinate of a right boundary with the half-height width;
s7, retaining only peaks within a given width range: giving a minimum peak searching width minW and a maximum peak searching width maxW, if no peak exists or the minimum width is 0 and the maximum width is infinite, not screening, calculating the full width at half maximum by using an x coordinate of a full width at half maximum boundary, and deleting peak values which are not in a given width range;
s8, finding out the index of the maximum peak value in the specified distance: the peaks are sorted from large to small, starting with the larger peak to ensure that we do not accidentally retain a small peak and remove a large peak in its vicinity. If a certain peak value is not near to a larger peak value, finding a secondary peak value within a set distance range of the peak value for elimination, and performing the following loop operation for each effective peak:
marking the peak value within the set distance range of the effective peak value index minD as 1 to indicate that the peak value is deleted, otherwise marking 0 to indicate that the peak value is not deleted, circulating next time, searching around the marking 0 without scanning the peak value of the marking 1, and keeping the peak value index of the marking 0;
after all the effective peaks needing to be reserved are determined, the effective peaks correspond to the original peak value index sequence again;
s9, sorting the peak values, limiting the number of the peaks: inputting the number numP of peaks to be detected, sorting the detected peaks from large to small, and taking out the first numP peaks as the result;
s10, returning: and returning information such as peak height, peak position, peak width, protrusion, coordinates of left and right base points and the like corresponding to the index value.
Preferably, the data acquisition module 1 acquires a voltage signal of the FID gas chromatography detection system, and the chromatographic signal is amplified by hardware filtering, and the signal value is sent to the PC end by the analog-to-digital converter and the singlechip control system.
Preferably, the baseline noise reduction module 3 performs noise reduction processing on the acquired chromatographic data, and performs real-time data filtering by combining an SG filter and a morphological filter.
Preferably, the calibration fitting module 4 calibrates the concentration of the chromatographic analyzer by using a standard sample gas sample, stores the calibration result in a configuration file, and performs linear transformation by referring to the calibration table in the specific calculation of the concentration of each component by using a linear interpolation method.
Preferably, the overlapped peak processing module 5 is used for separating the chromatographic data by separating overlapped peaks, the chromatography works according to the slight difference of the physical and chemical properties of substances, when the physical and chemical properties of two components are similar, the effluent chromatographic curves are close to each other on a time axis, so that the obtained chromatographic peaks are overlapped, and the module is used for separating the chromatographic peaks so as to obtain the substance types and concentrations of the components.
Preferably, the steps 1 through S10 may cause fluctuations or drifts in the baseline as the flow and temperature settings change, assuming that the gas chromatograph is approaching a constant temperature and the components are in a steady state with a relatively smooth baseline after a sufficient amount of time has elapsed since the last change in operating conditions.
Preferably, the base point peak-valley point on the baseline is the actual boundary of the peak, and the base point peak-valley point of the baseline above a certain threshold is the intersection of overlapping peaks. The peak with the cross points is definitely an overlapped peak, and the number of the cross points plus 1 is the number of the sub-peaks in the overlapped peak. The nearest non-intersection points around the intersection point are the left and right boundary points of the overlapping peak. Thus, the overlapping peak range can be determined by using the left and right boundary points, and fitting is performed.
The algorithm comprises the following steps:
g) and performing de-duplication ascending operation on the base points obtained by the previous part of derivative peak searching method by taking the x coordinate as a reference to obtain the peak-valley point coordinates of the ordered base points of the signal data.
h) According to the actual data situation, given a threshold value threshold _ y in the y-axis direction, points higher than the threshold value are intersection points, and points lower than the threshold value are boundary points. If the threshold is too small, the peak with better separation may be judged as an overlapping peak, and if the threshold is too large, some overlapping peaks with large separation degree may be ignored, so as to give up the viewing requirement.
i) And sequentially scanning the ordered base points, recording the previous boundary point as the left boundary point i _ left of the overlapped peak when an intersection is encountered, continuing to scan, recording the next boundary point as the right boundary point i _ right of the overlapped peak when the next boundary point is encountered, and recording the number of the intersection points plus one as the number i _ nump of the sub-peaks in the overlapped peak. And continuing to store the left and right boundary points and the number of sub-peaks of all the overlapped peaks.
j) The fitting center and the fitting range window are calculated.
window=baseX(i_Tight)-baseX(i_left)
In the formula, baseX (i _ left) and baseX (i _ Right) are x coordinates of left and right base points of the overlapping peak, respectively.
k) The initial value start of the parameter to be fitted is determined. I.e. the peak position and width of each peak in the overlapping peaks, are calculated by the derivative peak-finding method of the previous part.
l) fitting for each overlapping peak:
inputting parameters such as chromatographic signal data y, a fitting center, a fitting window, the number numP of peaks to be fitted, an initial value start of a parameter to be fitted and the like into an N-M simplex type iterative fitting model for fitting, obtaining a fitting result according to an evaluation standard, and updating the peak height and the peak width.
Preferably, the Nelder-Mead simplex type algorithm is a direct search method for optimizing multidimensional unconstrained problems.
Basic idea thereofFirstly using a search starting point x in an m-dimensional parameter space0And constructing a polyhedron with m +1 linear independent vertexes, determining the next search direction by comparing objective function values of the vertexes, performing heuristic operations of reflection, expansion, contraction and compression side length on the polygon, and replacing the worst point with a better new vertex to form a new polyhedron. And continuously adjusting the parameter values in such a way, and finally approaching the optimal solution of the target function.
The algorithm comprises the following steps:
first, a function is constructed that calculates the mean square error of the model and the original signal, and if the fitting error is greater than the desired fitting accuracy, the program systematically changes the parameters and loops to the previous step and repeats until the desired fitting accuracy is achieved or the maximum number or iteration is achieved.
And I, constructing a parameter estimation function.
Constructing a mean square error function of a calculation model and an original signal:
the following description is given of ymodlThe calculation process of (2):
first of all a gaussian matrix a is determined,
where h is the number of peaks, n is the number of data, g (x λ)i) Is a gaussian function.
The relation between the peak height matrix H, the Gaussian matrix A and the signal data matrix Y is H ═ abs (A \ Y)T) And the result H is an approximation, then there is ymodel=A*H。
II setting of end conditions
Setting the termination error threshold of the parameter lambda to be fitted to be 0.0000001 if lambdai-λi-1≥0.0000001,Replacing the estimated value of the unknown parameter according to the III simplex iteration process, and carrying out the next iteration until lambdai-λi-1If the number of iterations is less than 0.0000001, the iteration is terminated, otherwise, the iteration stops 1000 times.
The error calculation formula is as follows:
referred to as mean fit error, MeanFitError, MFE, i.e., minimum fit error.
III iterative Process
Using a simplex consisting of n +1 points of the n-dimensional vector x, the algorithm first sets an initial estimate
x0For each part x0(i) Increase by 5% to the corresponding x0In, will divide x0The n-dimensional vector outside as the initial simplex if x0(i) Using 0.00025 as component i, the algorithm iterates through the simplex iteratively according to the following steps:
(7) let x (i) denote the current simplex point data list, i 1.
(8) Sorting the simple type vertexes from small to large according to function values, wherein the shapes are as follows: f (x (1)) <. < f (x (n +1)), at each step of the iteration the algorithm discards the current worst pastry x (n +1), receives another point as a simplex point or, in the following step 7, it changes all n points greater than f (x (1)).
(9) Generating a reflection point: r-2 m-x (n +1), wherein,
f (τ) is calculated.
(10) If f (x (1)) ≦ f (τ) < f (x (n)), accepting r, and terminating the iteration. Known as reflection
(11) If f (τ) < f (x (1)), the expansion point s is calculated, s ═ m +2(m-x (n +1)), and f(s):
c. if f(s) < f (τ), accept s, terminate the iteration, called extended
d. Otherwise, accept r, terminate iteration, reflect.
(12) If f (tau) ≧ f (x (n)), compression processing is performed between m and the better of x (n +1) and r:
c. if f (τ) < f (x (n +1)), (e.g., r is better than x (n +1)), c ═ m + (τ -m)/2 is calculated, and f (c) is calculated. If f (c) < f (τ),
accept c, terminate the iteration, called the extract outside. Otherwise, step 7 shrink is performed.
d. If f (τ) ≧ f (x (n +1)), cc ═ m + (x (n +1) -m)/2 is calculated, and f (cc) is calculated, if
f (cc)) < f (x (n +1)), accept cc, terminate the iteration, extract, otherwise, go to step 7.
Calculate these n points:
calculate f (v (i)), i ═ 2. The next iteration of simplex is x (1), x (2),.., v (n +1), called shrink.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention adopts a data acquisition, baseline noise reduction, calibration fitting and overlapping peak processing module; the method is characterized in that a core module is an overlapping peak processing module, the method comprises the steps of firstly determining the peak position by adopting a derivative method, then determining the overlapping peak by grouping base points, and fitting a peak shape curve by using a simplex method, thereby determining the information such as the peak height, the peak width, the protrusion and the like of each component of the overlapping peak, the method is high in accuracy of determining the peak position, high in accuracy of determining the information of each component of the overlapping peak, can automatically define the peak separating interval, and can automatically determine the number of sub-peaks in the overlapping peak, and has the advantages of accurately determining the peak position and the information of each component of the overlapping peak;
2. the method has the advantages that the concept is simple, differentiation is not needed, the method is not sensitive to the initial value, and each iteration only needs no more than 2 times of function evaluation, so the searching speed is high, and the like.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-5, the present invention provides a technical solution: a chromatogram overlapping peak processing device comprises a data acquisition module 1, a data analysis module 2, a baseline noise reduction module 3, a calibration fitting module 4 and an overlapping peak processing module 5, wherein the data acquisition module 1 is connected with the data analysis module 2, the data analysis module 2 is connected with the baseline noise reduction module 3, the baseline noise reduction module 3 is connected with the calibration fitting module 4, and the marking line fitting module is connected with the overlapping peak processing module 5.
A chromatographic overlapping peak processing method specifically comprises the following steps:
s1, finding out indexes of all peak values and inflection points by using a difference method: recording all adjacent non-repeated effective value non-zero values, and utilizing a first-order difference method to solve a derivative. Judging the sign of the derivative, wherein the point where the sign of the derivative changes from positive to negative is a peak point; the point where the sign of the derivative changes is the inflection point; the inflection point includes a peak point
S2, keeping only the index of the peak satisfying the minimum peak height: giving the minimum height MinPeakHeight of the peak to be preserved, and deleting the peak which does not meet the height;
s3, retaining only the index of the peak satisfying the threshold: giving a minimum Threshold Threshold of a peak to be reserved, wherein the Threshold is used for limiting the minimum height of a peak point from the left and right side points of the peak point, and deleting the peak with the minimum height smaller than the Threshold to avoid reserving a flat-top peak;
s4, determining the boundary index of each peak: calculating a left base point, a left saddle point, a right base point and a right saddle point, wherein the maximum y value of the left base point and the right base point is the base point of the peak, taking the left base point as an example, scanning all inflection points, and searching the left base point of the peak leftwards in sequence when the peak stops: marking the current peak-valley point as valley, the corresponding peak-valley point as peak, marking the left-scanned peak-valley point as vi, and the corresponding peak-valley point as pi;
1, if vi > valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
2, if vi is greater than valley and pi is less than or equal to peak, repeating the steps 1-4, and continuously searching a left base point forwards;
3 if vi < valley and pi > peak, recording valley as a left base point of the current peak; repeating the steps 1-4 to continuously search a left base point of the next peak;
4 if vi is less than valley and pi is less than or equal to peak, updating vi as the left base point of the current peak; and repeating the steps 1-4 to continuously search for smaller peak-valley points until a point with a peak value larger than peak is met.
Therefore, the left base points of all the peaks can be found, the left base points of the reserved peaks are stored, when the peak heights of the two peaks are the same, the saddle point is the nearest peak inflection point, and the base point is the minimum peak valley point; in other cases, the saddle point is the same as the base point, and the right base point is similar to the left base point, but the right base point is searched from the right to the left in the same way;
s5, retaining only the index satisfying the minimum protrusion: given the minimum protrusion minpeakpromience, peaks smaller than the minimum protrusion are deleted. And (3) protrusion: the difference between the y coordinate of the peak point and the y coordinate of the highest base point is determined by the inherent height of the peak and the protrusion degree determined by the position relative to other peaks;
s6, calculating the x coordinate of the half-height-width boundary of each peak: determine approximate height of full width at half maximum refHeight: half of the sum of the y coordinate of the peak point and the y coordinate of the base point. Find the left boundary of full width at half maximum: searching the peak point to the left base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point adjacent to the right side of the point and the refHeight point to obtain the x coordinate of a left boundary with half-height width; find the right boundary of full width at half maximum: searching the peak point to the right base point of the peak along the curve until finding a point with the y value closest to and greater than refHeight, recording the x coordinate of the point, and interpolating by utilizing the point and a point close to the left side of the point and the refHeight point to obtain the x coordinate of a right boundary with the half-height width;
s7, retaining only peaks within a given width range: giving a minimum peak searching width minW and a maximum peak searching width maxW, if no peak exists or the minimum width is 0 and the maximum width is infinite, not screening, calculating the full width at half maximum by using an x coordinate of a full width at half maximum boundary, and deleting peak values which are not in a given width range;
s8, finding out the index of the maximum peak value in the specified distance: the peaks are sorted from large to small, starting with the larger peak to ensure that we do not accidentally retain a small peak and remove a large peak in its vicinity. If a certain peak value is not near to a larger peak value, finding a secondary peak value within a set distance range of the peak value for elimination, and performing the following loop operation for each effective peak:
marking the peak value within the set distance range of the effective peak value index minD as 1 to indicate that the peak value is deleted, otherwise marking 0 to indicate that the peak value is not deleted, circulating next time, searching around the marking 0 without scanning the peak value of the marking 1, and keeping the peak value index of the marking 0;
after all the effective peaks needing to be reserved are determined, the effective peaks correspond to the original peak value index sequence again;
s9, sorting the peak values, limiting the number of the peaks: inputting the number numP of peaks to be detected, sorting the detected peaks from large to small, and taking out the first numP peaks as the result;
s10, returning: and returning information such as peak height, peak position, peak width, protrusion, coordinates of left and right base points and the like corresponding to the index value.
In the invention, a data acquisition module 1 acquires a voltage signal of an FID gas chromatography detection system, the chromatographic signal is filtered and amplified by hardware, and a signal value is sent to a PC end by an analog-to-digital converter and a singlechip control system;
in the invention, the baseline noise reduction module 3 filters the acquired voltage signals by denoising the acquired chromatographic data, combining an SG filter and a morphological filter for real-time data filtering, and combining the SG filter with a filtering window of 19 and the morphological filters with structural elements of g1 and g 2;
in the invention, a calibration fitting module 4 calibrates the concentration of a chromatographic analyzer by a standard sample gas sample, stores the calibration result in a configuration file, performs linear conversion by a linear interpolation method according to a calibration table when the concentration of each component is specifically calculated, firstly introduces standard gas with known concentration components as benzene series, calibrates the components and the concentration of each peak by observing a spectrogram curve, calibrates the components and the concentration of each peak two to three times, and fits the relationship between the peak height and the concentration of each substance to obtain a fitting curve
Y ═ a × X + b, where Y is concentration and X is peak height. And substituting the peak height into a curve equation to obtain the measured real-time concentration.
Two species were measured separately for two experiments:
benzene: the nominal concentration 50, nominal height 56.593,
calibration concentration 40, calibration height 45.274
Calibration curve y-0 +0.8835 x
Toluene: the nominal concentration 60, nominal height 61.516,
nominal concentration 50, nominal height 51.261
The calibration relation y is 0+0.9754 x;
in the invention, the overlapped peak processing module 5 separates overlapped peaks of chromatographic data, the chromatographic method works according to the tiny difference of physical and chemical properties of substances, when the physical and chemical properties of two components are similar, the effluent chromatographic curves are very close on a time axis, so that the obtained chromatographic peaks are overlapped, the module is used for separating the chromatographic peaks, so that the substance types and the concentrations of the components are obtained, and accurate real signals are obtained after filtering, so that component analysis can be carried out. The component analysis is to analyze the components and concentrations of the substances to be detected based on the peak appearance of the spectrum.
The performance of the overlapping peak separation method is evaluated from three aspects of an intuitive separation effect graph, an evaluation standard of a fitting effect and separation errors under different separation degrees; and finally the performance of the whole device is verified through experiments.
First, a calculation method of each index is explained:
with the integration concept, the real area is the sum of all y values of the Gaussian peaks, and the x-axis has the unit of 1.
area error is fit area-true area
Note that it is different from the separation error.
wherein, N is the data number, x is the data vector of the signal with noise, and y is the data vector of the pure signal. Only the signal-to-noise ratio of the fitting segment data is calculated herein.
Relative Standard Deviation (RSD):
where S is the standard deviation (which may also be expressed as SD),
are the corresponding average values.
Square of correlation coefficient (R2):
a. and a variance (SSE) of square products to error, The statistical parameter being calculated as The sum of The squares of The errors of The corresponding points of The fitting data and The original data, The calculation formula being as follows:
the closer the SSE is to 0, the better the model selection and fitting, and the more successful the data prediction. The following MSE and RMSE, since SSE is the same,
the effect is the same.
SST: total sum of squares, the sum of the squares of the differences between the raw data and the mean, is given by the formula:
c. square of correlation coefficient
In essence, the "coefficient of certainty" characterizes how well a fit is by the change in data. From the above expression, it can be known that the normal value range of the "determination coefficient" is [0,1], and the closer to 1, the stronger the explanatory ability of the variable of the equation to y is, and the better the model fits the data.
The experimental procedure is described below:
1) evaluation of fitting Effect
The experiment simulates 10 different overlapped peaks, wherein the overlapped peaks consist of Gaussian peaks with the width of 10-50 and the peak position of 70-120, and Gaussian noise with the floating range of 0.003-0.05 and baseline noise with the slope of 0.001 are added. The results of the experiments are shown in the following table:
table 110 comparison table of fitting effect of overlapping peaks
As can be seen from the table above, for the signal data with the average signal-to-noise ratio of 72.64, the fitting error is less than 0.6%, and the correlation coefficient is as high as 0.999 or more, indicating that the fitting effect is good.
In order to prove the stability of the algorithm, the method also performs the following experiments: and 6 kinds of noises with different signal-to-noise ratios are added to the same analog data (formed by overlapping two Gaussian peaks with the widths of 10 and 20 respectively) to verify the fitting effect. The results of the experiments are shown in the following table:
TABLE 2 comparison table of fitting effect of overlapping peaks of the same sample at different times
As can be seen from the above table, in the process that the signal-to-noise ratio is reduced from 94.58 to 53.53, the fitting error of the signal data is less than 0.7% (mean square error is 0.18), and the correlation coefficient is as high as more than 0.999 (mean square error is 0.0001), which indicates that the fitting method is stable.
2) Separation error
The separation degree is used as an index of the separation efficiency of the chromatographic column, the separation condition of two adjacent components in the chromatographic column can be judged, and the larger the separation degree R is, the better the separation of the two adjacent components is. Generally, when the degree of separation R <1, the two peaks partially overlap.
The resolution calculation formula is as follows:
in the formula: the retention times of the 2 single peaks constituting the overlapping peak, respectively, and the peak bottom widths of the 2 single peaks, respectively.
Four kinds of simulation data with the separation degrees of 0.50, 0.67, 0.84 and 1.01 are designed in the experiment, and Gaussian noise and baseline noise with the same floating range are added. The 4 groups of data are tested, and the separation error is calculated by the following formula:
in the formula: sXIs the area of the X peak.
The results of the experiment are reported below:
TABLE 3 separation error under different degrees of separation
As can be seen from the above table, the separation error of the signal data with the average signal-to-noise ratio of 68.11 is within the range of + -1.6.
3) Graph of separation effect
The separation effect can be seen visually from the separation effect graph, three different overlapped peaks are selected for processing in the experiment, and the separation effect is shown in attached figures 3, 4 and 5.
4) Device performance
The device is mainly used for separating overlapped peaks and measuring components and concentrations of all components, so that the measurement performance is the relative standard deviation and repeatability of each measured component.
Because the properties of the benzene series are similar and overlapping peaks are easy to generate, the benzene series is taken as an experimental object in the experiment, and the measured gas is diluted benzene series standard gas. The following table shows the benzene analysis results in 9 experiments:
TABLE 4 test results of benzene series
Generally, the change of the continuous sampling retention time in the measurement of the sample gas with the same concentration needs to meet the requirement of less than +/-1%, and the change of the peak height and the peak area needs to meet the requirement of less than +/-3%. As can be seen from the above table, the relative standard deviations of retention time, peak height and peak area measured after 9 times of data measurement are only 0.24%, 1.24% and 1.72%, respectively, so that qualitative repeatability analysis using retention time as a standard meets the requirement of ± 1%, and meanwhile, quantitative repeatability analysis using peak height and peak area as standards meets the requirement of less than ± 3%;
in the present invention, fluctuations or drifts in the baseline may occur when the flow and temperature settings are changed in steps 1 through S10, assuming that the gas chromatograph is gradually brought close to a constant temperature, the components are in a steady state, and the baseline is relatively smooth, after a sufficiently long time has elapsed since the operating conditions were last changed;
in the present invention, the base point peak-valley point on the base line is the actual boundary of the peak, and the base point peak-valley point of the base line higher than a certain threshold value is the intersection of the overlapping peaks. The peak with the cross points is definitely an overlapped peak, and the number of the cross points plus 1 is the number of the sub-peaks in the overlapped peak. The nearest non-intersection points around the intersection point are the left and right boundary points of the overlapping peak. Thus, the overlapping peak range can be determined by using the left and right boundary points, and fitting is performed.
The algorithm comprises the following steps:
m) performing de-weight ascending operation on the base point obtained by the previous part of derivative peak searching method by taking the x coordinate as a reference to obtain the peak-valley point coordinate of the ordered base point of the signal data.
n) according to the actual data situation, a threshold value threshold _ y in the y-axis direction is given, points higher than the threshold value are intersection points, and points lower than the threshold value are boundary points. If the threshold is too small, the peak with better separation may be judged as an overlapping peak, and if the threshold is too large, some overlapping peaks with large separation degree may be ignored, so as to give up the viewing requirement.
o) sequentially scanning the ordered base points, recording the previous boundary point as the left boundary point i _ left of the overlapped peak when an intersection is encountered, continuing to scan, recording the next boundary point as the right boundary point i _ right of the overlapped peak when the next boundary point is encountered, and recording the number of the intersection points plus one as the number i _ nump of the sub-peaks in the overlapped peak. And continuing to store the left and right boundary points and the number of sub-peaks of all the overlapped peaks.
p) calculating the fitting center and the fitting range window.
window=baseX(i_Tight)-baseX(i_left)
In the formula, baseX (i _ left) and baseX (i _ Right) are x coordinates of left and right base points of the overlapping peak, respectively.
q) determining an initial value start of the parameter to be fitted. I.e. the peak position and width of each peak in the overlapping peaks, are calculated by the derivative peak-finding method of the previous part.
r) fitting to each overlapping peak:
inputting parameters such as chromatographic signal data y, a fitting center, a fitting window, the number numP of peaks to be fitted, an initial value start of a parameter to be fitted and the like into an N-M simplex type iterative fitting model for fitting, obtaining a fitting result according to an evaluation standard, and updating the peak height and the peak width;
in the invention, a Nelder-Mead simplex algorithm is a direct search method for optimizing a multidimensional unconstrained problem.
The basic idea is to use the search starting point x in the m-dimensional parameter space0And constructing a polyhedron with m +1 linear independent vertexes, determining the next search direction by comparing objective function values of the vertexes, performing heuristic operations of reflection, expansion, contraction and compression side length on the polygon, and replacing the worst point with a better new vertex to form a new polyhedron. And continuously adjusting the parameter values in such a way, and finally approaching the optimal solution of the target function.
The algorithm comprises the following steps:
first, a function is constructed that calculates the mean square error of the model and the original signal, and if the fitting error is greater than the desired fitting accuracy, the program systematically changes the parameters and loops to the previous step and repeats until the desired fitting accuracy is achieved or the maximum number or iteration is achieved.
And I, constructing a parameter estimation function.
Constructing a mean square error function of a calculation model and an original signal:
the following description is given of ymodelThe calculation process of (2):
first of all a gaussian matrix a is determined,
where h is the number of peaks, n is the number of data, g (x λ)i) Is a gaussian function.
The relation between the peak height matrix H, the Gaussian matrix A and the signal data matrix Y is H ═ abs (A \ Y)T) And the result H is an approximation, then there is ymodel=A*H。
II setting of end conditions
Setting the termination error threshold of the parameter lambda to be fitted to be 0.0000001 if lambdai-λi-1If the absolute value of the parameter is more than or equal to 0.0000001, replacing the estimated value of the unknown parameter according to the III simplex iteration process, and carrying out the next iteration until lambda is reachedi-λi-1If the number of iterations is less than 0.0000001, the iteration is terminated, otherwise, the iteration stops 1000 times.
The error calculation formula is as follows:
referred to as mean fit error, MeanFitError, MFE, i.e., minimum fit error.
III iterative Process
Using a simplex consisting of n +1 points of an n-dimensional vector x, the algorithm first sets an initial estimate x0For each part x0(i) Increase by 5% to the corresponding x0In, will divide x0The n-dimensional vector outside as the initial simplex if x0(i) Using 0.00025 as component i, the algorithm iterates through the simplex iteratively according to the following steps:
(13) let x (i) denote the current simplex point data list, i 1.
(14) Sorting the simple type vertexes from small to large according to function values, wherein the shapes are as follows: f (x (1)) <. < f (x (n +1)), at each step of the iteration the algorithm discards the current worst pastry x (n +1), receives another point as a simplex point or, in the following step 7, it changes all n points greater than f (x (1)).
(15) Generating a reflection point: r-2 m-x (n +1), wherein,
f (τ) is calculated.
(16) If f (x (1)) ≦ f (τ) < f (x (n)), accepting r, and terminating the iteration. Known as reflection
(17) If f (τ) < f (x (1)), the expansion point s is calculated, s ═ m +2(m-x (n +1)), and f(s):
e. if f(s) < f (τ), accept s, terminate the iteration, called extended
f. Otherwise, accept r, terminate iteration, reflect.
(18) If f (tau) ≧ f (x (n)), compression processing is performed between m and the better of x (n +1) and r:
e. if f (τ) < f (x (n +1)), (e.g., r is better than x (n +1)), c ═ m + (τ -m)/2 is calculated, and f (c) is calculated. If f (c) < f (τ), c is accepted, terminating the iteration, called the extract output. Otherwise, step 7 shrink is performed.
f. If f (τ) ≧ f (x (n +1)), calculate cc ═ m + (x (n +1) -m)/2, and calculate f (cc), if f (cc) < f (x (n +1)), accept cc, terminate the iteration, contact inside, otherwise, go to step 7.
Calculate these n points:
calculate f (v (i)), i ═ 2. The next iteration of simplex is x (1), v (2),. v (n +1), called shrink.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.