CN114676792A

CN114676792A - Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm

Info

Publication number: CN114676792A
Application number: CN202210385752.9A
Authority: CN
Inventors: 杜文莉; 赵云蒙; 何仁初; 钟伟民; 杨明磊
Original assignee: East China University of Science and Technology
Current assignee: East China University of Science and Technology
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2022-06-28

Abstract

The invention relates to the technical field of near-infrared modeling data processing, in particular to a near-infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm. The method comprises the following steps: step S1, acquiring near-infrared spectra x_val of samples and the corresponding physicochemical property values y_val as a sample set; step S2, dividing the sample set into a correction set and a verification set, and calculating an average spectrum x_avg; step S3, preprocessing the near-infrared spectra x_val and the average spectrum x_avg respectively to obtain a spectral matrix X_val and an average spectral matrix X_avg; step S4, performing random dimensionality-reduction projection on the spectral matrix X_val based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectral matrix X_valRed; step S5, establishing an artificial neural network prediction model; step S6, checking the model using the verification set; and step S7, performing quantitative analysis on an input near-infrared spectrum and outputting the corresponding predicted physicochemical property value. The invention does not require wavelength selection, which reduces the modeling difficulty and shortens the modeling time.

Description

Near infrared spectrum quantitative analysis dimensionality reduction method and system based on stochastic projection algorithm

Technical Field

The invention relates to the technical field of near-infrared modeling data processing, in particular to a near-infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm.

Background

Near-infrared analysis exploits the absorption characteristics of the chemical components of a sample in the near-infrared spectral region, and analyzes the sample qualitatively and quantitatively by means of chemometric multivariate calibration methods and the slight differences in spectral information among samples. Owing to unstable factors such as instrument noise and changes in the external environment, some bands of the near-infrared spectrum have a low signal-to-noise ratio and poor spectral quality, and these bands can make the model unstable. In addition, the spectral wavelengths of a sample are multicollinear and the spectral information is redundant, which makes the computation of a near-infrared analysis model complex.

Therefore, wavelength selection is often required when building near-infrared analysis models. At present, the methods for selecting the wavelength include a correlation coefficient method, a genetic algorithm, a simulated annealing algorithm, an interval partial least square method and the like.

However, wavelength selection is very cumbersome, and it is the most time-consuming and laborious step before modeling.

At present, the common methods for near infrared modeling mainly include multiple linear regression, partial least square method, artificial neural network, support vector machine and the like.

The partial least square method is one of the most common modeling methods for near infrared modeling, can effectively reduce the dimension, extract the effective information of an independent variable matrix, reflect the linear relation between the near infrared spectrum wave number and the oil attribute to be analyzed, and is reliable and accurate in modeling. However, the partial least squares method cannot effectively reflect the nonlinear relationship between the near infrared spectrum and the properties of the oil to be analyzed.

Therefore, there is a need for improvement to solve the above-mentioned deficiencies of the existing near infrared spectroscopy quantitative analysis technology.

Disclosure of Invention

The invention aims to provide a near infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm, and solves the problem that in the prior art, wavelength selection is needed for near infrared analysis, so that time and labor are wasted.

In order to achieve the aim, the invention provides a near infrared spectrum quantitative analysis dimension reduction method based on a stochastic projection algorithm, which comprises the following steps:

Step S1, acquiring near-infrared spectra x_val of samples and the corresponding physicochemical property values y_val as a sample set;

Step S2, dividing the sample set into a correction set and a verification set, and calculating an average spectrum x_avg from the near-infrared spectra x_val of the correction set;

Step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectral matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectral matrix X_avg;

Step S4, performing random dimensionality-reduction projection on the spectral matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectral matrix X_valRed;

Step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectral matrix X_valRed;

Step S6, checking the artificial neural network prediction model established in step S5 using the verification set;

Step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.

In one embodiment, in step S2, the average spectrum x_avg is given by:

$$x_{avg} = \frac{1}{n}\sum_{i=1}^{n} x_{val_i}$$

where n is the number of near-infrared spectra and x_vali is the i-th spectrum.

In an embodiment, in step S2, the dividing the sample set into a correction set and a verification set further includes:

selecting m spectra from the sample set as the correction set using the Euclidean-distance-based K-S algorithm or the property-variable-based SPXY algorithm, and using the remaining samples as the verification set.

In an embodiment, the preprocessing manner in step S3 includes: first derivative, second derivative and max-min normalization.

In one embodiment, the step S3 further includes:

performing multiple types of preprocessing simultaneously on the near-infrared spectra x_val to obtain the spectral matrix X_val;

performing multiple types of preprocessing simultaneously on the average spectrum x_avg to obtain the average spectral matrix X_avg.

In an embodiment, the step S4, further includes:

step S41, obtaining a Gaussian random projection transition matrix P according to the average spectral matrix X_avg;

step S42, performing random dimensionality-reduction projection on the spectral matrix X_val based on the Gaussian random projection transition matrix P to obtain the dimensionality-reduced spectral matrix X_valRed.

In an embodiment, the step S41, further includes:

performing random dimensionality-reduction projection on the average spectral matrix X_avg with p wavelength points to obtain a dimensionality-reduced average spectral matrix X_avgRed with q wavelength points;

solving for the Gaussian random projection transition matrix P according to the expression X_avgRed = P*X_avg.

In one embodiment, the average spectral matrix X_avg and the dimensionality-reduced average spectral matrix X_avgRed satisfy the following inequality:

(1-eps)||X_avg-X_avgRed||²<||X_avg-X_avgRed||²<(1+eps)||X_avg-X_avgRed||²；

the p wavelength points and the q wavelength points after dimensionality reduction satisfy the following inequality:

wherein eps is the dimension reduction error.

In one embodiment, the artificial neural network prediction model is a two-dimensional convolution prediction model;

the step S5 further includes:

step S51, introducing the spectrum matrix after dimensionality reduction into an input layer of two-dimensional convolution, calculating through two convolution layers, a weight and activation function and a pooling layer, and transmitting the spectrum matrix to an output layer after calculating through the convolution layer and the pooling layer for multiple times;

and step S52, comparing the predicted value obtained by the output layer with the sample expected value, and if an error exists between the predicted value and the sample expected value, returning to step S51 to adjust the weight until the difference between the predicted value and the sample expected value reaches a first threshold value.

In an embodiment, the step S6, further includes:

checking the artificial neural network prediction model established in step S5 using the verification set, and calculating the prediction standard deviation Rmsep, the corresponding expression being:

where m is the number of spectra in the verification set, y_i,actual1 is the measured value of the i-th spectrum in the verification set, and y_i,predicted1 is the predicted value of the i-th spectrum in the verification set.

In an embodiment, the step S6, further includes:

cross-validating the artificial neural network prediction model established in step S5 using the correction set, and calculating the cross-validation standard deviation Rmsecv, the corresponding expression being:

where n is the number of spectra in the correction set, y_i,actual2 is the measured value of the i-th spectrum in the correction set, and y_i,predicted2 is the predicted value of the i-th spectrum in the correction set.

In order to achieve the above object, the present invention provides a near infrared spectrum quantitative analysis dimension reduction system based on a stochastic projection algorithm, comprising:

a memory for storing instructions executable by the processor;

a processor for executing the instructions to implement the method of any one of the above.

To achieve the above object, the present invention provides a computer readable medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, perform the method as described in any one of the above.

According to the near infrared spectrum quantitative analysis dimensionality reduction method and system based on the stochastic projection algorithm, Gaussian stochastic projection is used for dimensionality reduction, spectral wavelength selection is not needed, modeling difficulty is reduced, modeling time is shortened, and simple and rapid modeling can be performed for near infrared analysis.

Drawings

The above and other features, properties and advantages of the present invention will become more apparent from the following description of the embodiments with reference to the accompanying drawings in which like reference numerals denote like features throughout the several views, wherein:

FIG. 1 is a flow chart of a near-infrared spectrum quantitative analysis dimensionality reduction method based on a random projection algorithm according to an embodiment of the invention;

FIG. 2 shows the original sample spectra according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a near-infrared spectrum quantitative analysis dimensionality reduction system based on a random projection algorithm according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the defects of the existing near-infrared technology, the invention provides a near-infrared spectrum quantitative analysis dimension reduction method and system based on a random projection algorithm, and the method and system can be widely applied to the industries of petrochemical industry, agriculture, food and the like.

FIG. 1 is a flow chart of the near-infrared spectrum quantitative analysis dimensionality reduction method based on a random projection algorithm according to an embodiment of the present invention. As shown in FIG. 1, the method specifically includes the following steps:

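Step S1, acquiring near-infrared spectra x_val of samples and the corresponding physicochemical property values y_val as a sample set;

Step S2, dividing the sample set into a correction set and a verification set, and calculating an average spectrum x_avg from the near-infrared spectra x_val of the correction set;

Step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectral matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectral matrix X_avg;

Step S4, performing random dimensionality-reduction projection on the spectral matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectral matrix X_valRed;

Step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectral matrix X_valRed;

Step S6, checking the artificial neural network prediction model established in step S5 using the verification set;

Step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.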

These steps will be described in detail below. It is understood that within the scope of the present invention, the above-mentioned technical features of the present invention and the technical features described in detail below (e.g., the embodiments) can be combined with each other and associated with each other to constitute a preferred technical solution.

Step S1, acquiring near-infrared spectra x_val of samples and the corresponding physicochemical property values y_val as a sample set.

The near-infrared spectra of a batch of samples and their corresponding physicochemical property values are acquired for modeling.

The physicochemical properties include physical properties and chemical properties.

Optionally, the physical properties include, but are not limited to, density, freezing point, viscosity, distillation range, and the like;

optionally, the chemical properties include component composition, elemental content, and the like.

Optionally, the physicochemical property values may be measured in a laboratory and used as the measured values.

In this embodiment, a batch of near-infrared spectra x_val and physicochemical property values y_val are acquired for modeling;

wherein the near-infrared spectra x_val comprise n spectra x_vali, where i = 1 to n and x_vali denotes the i-th spectrum;

the i-th spectrum x_vali is labeled with the corresponding physicochemical property value y_vali;

each near-infrared spectrum has p wavelength points.

Step S2, dividing the sample set into a correction set and a verification set, and calculating the average spectrum x_avg from the near-infrared spectra x_val of the correction set.

Further, the average spectrum x_avg of the n near-infrared spectra is calculated by formula (1):

$$x_{avg} = \frac{1}{n}\sum_{i=1}^{n} x_{val_i} \qquad (1)$$

where n is the number of near-infrared spectra and x_vali is the i-th spectrum.

Furthermore, m strongly representative spectra are selected from the sample set as the correction set using the Euclidean-distance-based K-S algorithm or the property-variable-based SPXY algorithm, and the remaining samples are used as the verification set.

The principle of the K-S (Kennard-Stone) algorithm is to regard all samples as candidates for the training set and to move samples from the candidates into the training set one at a time. First, the two samples with the largest Euclidean distance between them are selected into the training set. Then, for each remaining sample, the Euclidean distance to every sample already in the training set is calculated and the smallest of these distances is recorded; the candidate whose smallest distance is the largest is selected into the training set. This step is repeated until the required number of samples has been selected.
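The selection rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the embodiment; the function name `kennard_stone` and the toy data are assumptions made for the example.

```python
import numpy as np

def kennard_stone(X, m):
    """Split sample indices into (selected, remaining) with the K-S rule.

    X is an (n_samples, n_features) array; m is the size of the correction set.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < m and remaining:
        # Distance from each remaining sample to its nearest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        # Take the candidate whose nearest-selected distance is largest.
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy one-dimensional "spectra" (hypothetical values, for illustration only).
toy = np.array([[0.0], [1.0], [2.0], [10.0]])
cal_idx, val_idx = kennard_stone(toy, 3)
print(cal_idx, val_idx)  # [0, 3, 2] [1]
```

With 52 spectra and m = 36, the same call would return 36 correction-set indices and 16 verification-set indices.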

The SPXY (sample set partitioning on joint x-y distance) algorithm was developed based on the K-S algorithm, which takes into account both the x-variable and the y-variable in the calculation of the distance between samples.
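As a hedged illustration of the joint x-y distance, the sketch below follows the commonly published SPXY definition, in which the spectral distance and the property distance are each divided by their maximum over all sample pairs and then summed. The function name `spxy_distance` and the toy values are assumptions; the resulting matrix can take the place of the plain Euclidean distance matrix in a K-S style selection.

```python
import numpy as np

def spxy_distance(X, y):
    """Joint x-y distance matrix: d_x / max(d_x) + d_y / max(d_y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    # Pairwise Euclidean distances in spectral space and in property space.
    dx = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    dy = np.sqrt(((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=2))
    # Normalise each by its maximum so both contribute on the same scale.
    return dx / dx.max() + dy / dy.max()

# Hypothetical toy data: three one-point spectra with one property value each.
d = spxy_distance([[0.0], [1.0], [3.0]], [0.0, 2.0, 2.0])
print(d)
```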

Step S3, preprocessing the near-infrared spectra x_val of the correction set to obtain a spectral matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectral matrix X_avg.

The near infrared spectrum is susceptible to interference of some environmental factors during measurement, noise is generated, and the spectrum contains some wavelength points which cannot be utilized, so that the spectrum is preprocessed in the step.

The spectrum preprocessing can amplify effective information of the spectrum and filter noise information in the spectrum, so that the modeling complexity is reduced, and the robustness of the model is improved.

The manner of preprocessing includes, but is not limited to, first derivative, second derivative, and maximum-minimum normalization.

Further, the step S3 further includes:

In this embodiment, the three preprocessing methods are applied simultaneously, so that each sample spectrum x_val and the average spectrum x_avg are converted into a spectral matrix X_val and an average spectral matrix X_avg, respectively, each containing the results of the three preprocessing methods.

That is, multiple types of preprocessing are applied to a sample spectrum, and the resulting preprocessed data are combined into one matrix.
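A minimal sketch of this step is given below, assuming plain finite differences (`numpy.gradient`) for the derivatives; the text does not say which derivative estimator is used, and a smoothed derivative such as Savitzky-Golay would be an equally plausible choice. The function name `preprocess` and the random stand-in spectra are hypothetical.

```python
import numpy as np

def preprocess(spectrum):
    """Stack three preprocessed versions of one spectrum into a 3-row matrix."""
    x = np.asarray(spectrum, dtype=float)
    span = x.max() - x.min()
    # Max-min normalisation to the range [0, 1].
    minmax = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # First and second derivatives by finite differences (an assumption).
    d1 = np.gradient(x)
    d2 = np.gradient(d1)
    return np.vstack([minmax, d1, d2])

# Random stand-in for 36 correction-set spectra with p = 2074 points.
cal = np.random.default_rng(0).random((36, 2074))
x_avg = cal.mean(axis=0)                         # average spectrum
X_avg = preprocess(x_avg)                        # shape (3, 2074)
X_val = np.stack([preprocess(s) for s in cal])   # shape (36, 3, 2074)
print(X_avg.shape, X_val.shape)
```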

Step S4, performing random dimensionality-reduction projection on the spectral matrix X_val of the correction set based on a Gaussian random projection algorithm to obtain a dimensionality-reduced spectral matrix X_valRed.

Each sample matrix is projected to a low-dimensional matrix by Gaussian random projection.

Further, the step S4 further includes:

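step S41, obtaining a Gaussian random projection transition matrix P according to the average spectral matrix X_avg;

step S42, performing random dimensionality-reduction projection on the spectral matrix X_val based on the Gaussian random projection transition matrix P to obtain the dimensionality-reduced spectral matrix X_valRed.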

More specifically, the gaussian random projection transition matrix P of step S41 is obtained by:

random dimensionality-reduction projection is performed on the average spectral matrix X_avg with p wavelength points to obtain a dimensionality-reduced average spectral matrix X_avgRed with q wavelength points, where X_avgRed satisfies the following formula (2):

X_avgRed＝P*X_avg (2)

where P is the Gaussian random projection transition matrix (its entries follow a Gaussian distribution) obtained when the average spectral matrix is reduced in dimension; the transition matrix P can be solved from formula (2).

The average spectral matrix X_avg and the dimensionality-reduced average spectral matrix X_avgRed satisfy the following inequality (3):

(1-eps)||X_avg-X_avgRed||²<||X_avg-X_avgRed||²<(1+eps)||X_avg-X_avgRed||² (3)

p wavelength points and q wavelength points after dimensionality reduction satisfy the following inequality (4):

eps is the dimensionality reduction error.

In the present embodiment, eps uses a default value of 0.1.
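The bound relating q to eps is not spelled out in this text. One widely used Johnson-Lindenstrauss bound, the one implemented by scikit-learn's `johnson_lindenstrauss_min_dim`, is q ≥ 4·ln(n) / (eps²/2 − eps³/3), where n is the number of points being projected. The sketch below assumes that bound (an assumption, not something the text confirms) and evaluates it with the standard library only; the helper name `jl_min_dim` is hypothetical.

```python
import math

def jl_min_dim(n_samples, eps):
    """Minimum target dimension under the assumed Johnson-Lindenstrauss bound."""
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)

# Three rows (one per preprocessing method) and the default eps = 0.1.
q = jl_min_dim(3, 0.1)
print(q)  # 941
```

Under this assumption, a 3-row matrix with eps = 0.1 gives q = 941, which coincides with the 941 wavelength points reported in the aviation kerosene example, although the text does not state how q was chosen.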

For each spectrum, a random dimensionality reduction is performed, which is calculated as represented by equation (5):

X_valRedi＝P*X_vali (5)

where X_valRedi is the i-th element of the dimensionality-reduced spectral matrix X_valRed, and X_vali is the i-th element of the spectral matrix X_val to be reduced.

Formula (5) reduces the dimensionality from p wavelength points to q wavelength points while largely preserving the characteristics of the data.
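The projection in formula (5) can be illustrated with plain NumPy. The sketch assumes the usual Gaussian random projection construction, in which P has q rows and p columns with independent entries drawn from N(0, 1/q), and it stores each spectral matrix with wavelength points along the columns, so the product in formula (5) is written as `X_i @ P.T`. The variable names and the random stand-in data are hypothetical.

```python
import numpy as np

p, q = 2074, 941
rng = np.random.default_rng(0)

# Transition matrix P: entries ~ N(0, 1/q), drawn once and reused for all spectra.
P = rng.normal(0.0, 1.0 / np.sqrt(q), size=(q, p))

def project(X_i, P):
    """Reduce a (rows x p) spectral matrix to (rows x q)."""
    return np.asarray(X_i, dtype=float) @ P.T

X_i = rng.random((3, p))     # stand-in for one preprocessed 3-row sample matrix
X_red = project(X_i, P)      # shape (3, 941)

# Squared distance between two rows before and after projection.
before = np.sum((X_i[0] - X_i[1]) ** 2)
after = np.sum((X_red[0] - X_red[1]) ** 2)
print(X_red.shape, after / before)
```

The printed ratio should be close to 1, which is the sense in which the projection approximately preserves pairwise distances.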

Step S5, establishing an artificial neural network prediction model based on the dimensionality-reduced spectral matrix X_valRed.

The artificial neural network prediction model is established using the correction set. The artificial neural network model includes, but is not limited to, a multilayer perceptron prediction model, a back propagation neural network prediction model, a convolutional neural network prediction model, and the like.

A Multilayer Perceptron (MLP) is a feedforward artificial neural network model that maps multiple input data sets onto a single output data set.

The convolutional neural network has strong capability of extracting features, and any nonlinear mapping from input to output can be realized by using the convolutional neural network, so that the problem that the nonlinear relation cannot be reflected by a partial least square method is solved.

In this embodiment, a two-dimensional convolutional neural network is used to build an analysis model, so as to obtain a predicted value of quantitative analysis.

Thus, the step S5 further includes:

step S51, importing the dimensionality-reduced spectral matrix X_valRedi into the input layer of the two-dimensional convolution, performing calculation through two convolution layers, weights and activation functions, and a pooling layer, and passing the result to the output layer after multiple rounds of convolution-layer and pooling-layer calculation;

step S52, comparing the predicted value obtained at the output layer with the expected value of the sample; if there is an error between them, returning to step S51 to adjust the weights, and continuing to adjust the weights until the difference between the predicted value and the expected value reaches a first threshold.

The first threshold is a preset very small value or minimum value.
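The stopping rule of step S52 can be illustrated independently of the network architecture. The sketch below deliberately swaps the two-dimensional convolutional network for a single linear layer trained by gradient descent, so that it stays short and needs only NumPy; the function name, learning rate, threshold value, and synthetic data are all assumptions made for the illustration.

```python
import numpy as np

def train_until_threshold(X, y, threshold=1e-4, lr=0.05, max_iter=10_000):
    """Adjust weights until the mean squared error reaches the threshold."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for it in range(max_iter):
        pred = X @ w + b                  # forward pass (role of step S51)
        err = pred - y
        mse = float(np.mean(err ** 2))    # compare with expected values (S52)
        if mse <= threshold:
            break                         # first threshold reached, stop
        # Otherwise adjust the weights along the negative gradient and repeat.
        w -= lr * 2 * X.T @ err / len(y)
        b -= lr * 2 * err.mean()
    return w, b, mse, it

# Synthetic, noise-free data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(36, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5
w, b, mse, n_iter = train_until_threshold(X, y)
print(n_iter, mse)
```

In a real convolutional model the forward pass and weight update would be handled by a deep-learning framework, but the loop structure and the threshold test would look the same.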

Step S6, checking the artificial neural network prediction model established in step S5 using the verification set.

The artificial neural network prediction model established in step S5 is checked using the verification set, and the prediction standard deviation Rmsep is calculated according to formula (6):

where m is the number of spectra in the verification set, y_i,actual1 is the measured value of the i-th spectrum in the verification set, and y_i,predicted1 is the predicted value of the i-th spectrum in the verification set.
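As a hedged illustration, the sketch below computes a root mean square error with the number of spectra m as the divisor, which is the most common definition of RMSEP. Whether formula (6) divides by m or by m − 1 is not recoverable from this text, so the divisor is an assumption, as are the function name `rmse` and the sample values.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumed form: sqrt( sum((actual - predicted)^2) / m ).
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical measured and predicted values, for illustration only.
y_actual = [1.0, 2.0, 3.0]
y_predicted = [1.0, 2.0, 5.0]
print(rmse(y_actual, y_predicted))
```

The same helper could be applied to cross-validation predictions on the correction set to obtain a cross-validation error.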

The artificial neural network prediction model established in step S5 is cross-validated using the correction set, and the cross-validation standard deviation Rmsecv is calculated according to formula (7):

The model is checked against the measured values of the verification-set spectra to determine whether the prediction accuracy requirement is met; if not, the process returns to step S1 to restart modeling; if so, the process proceeds to step S7 to apply the model.

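Step S7, performing quantitative analysis on an input near-infrared spectrum based on the artificial neural network prediction model checked in step S6, and outputting the corresponding predicted physicochemical property value.

The model that passes the check in step S6 meets the prediction accuracy requirement and can be used for quantitative analysis of actual near-infrared spectra. The near-infrared spectral data to be analyzed are imported into the model, which outputs the corresponding predicted physicochemical property value.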

While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood by one skilled in the art.

The near-infrared spectrum quantitative analysis dimensionality reduction method based on the random projection algorithm reduces dimensionality by Gaussian random projection without wavelength selection. It thereby solves the problems of the prior art, in which wavelength selection requires extensive intervention based on manual experience and consumes a large amount of time, and turns a complicated procedure into a simple one.

Meanwhile, the difficulty that the traditional method cannot fit nonlinearity is overcome by means of the excellent nonlinearity fitting capability of the neural network model and the strong feature extraction capability of the two-dimensional convolution neural network.

Therefore, the near infrared spectrum quantitative analysis dimensionality reduction method based on the stochastic projection algorithm can greatly reduce modeling time and establish a rapid and accurate near infrared quantitative analysis model while ensuring modeling precision.

The near-infrared spectrum quantitative analysis dimensionality reduction method based on the random projection algorithm provided by the invention is described in detail below, using near-infrared spectra of aviation kerosene and the corresponding laboratory analysis data as the experimental object.

Step S1, acquiring a batch of near-infrared spectra and their corresponding physicochemical property values.

The specific process is as follows:

The spectra of 52 samples are obtained with a near-infrared spectrometer; the number of wavelength points p of each spectrum is 2074, and each sample spectrum corresponds to the physicochemical property value given in the laboratory analysis report. The original sample spectra are shown in FIG. 2.

Step S2, selecting 36 strongly representative spectra from the sample set as the correction set using the Euclidean-distance-based K-S algorithm, and calculating the average spectrum x_avg.

Step S3, performing three types of preprocessing simultaneously on the near-infrared spectra to obtain the spectral matrix.

The specific process is as follows:

each near-infrared spectrum is converted into a row matrix, and max-min normalization, first-derivative, and second-derivative preprocessing are applied simultaneously to form a 3-row spectral matrix.

Step S4, performing Gaussian random projection on the spectral matrix X_val to obtain the dimensionality-reduced spectral matrix X_valRed of each sample;

The specific process is as follows:

the average spectral matrix X_avg is calculated from the spectral matrix X_val, the Gaussian-distributed transition matrix P is obtained by Gaussian random projection, and the dimensionality-reduced spectral matrix X_valRed is obtained, i.e., a spectral matrix with 3 rows and 941 wavelength points after dimensionality reduction.

Step S5, establishing a two-dimensional convolution prediction model;

the specific process is as follows:

the dimensionality-reduced spectral matrix samples X_valRedi are imported into the input layer of the two-dimensional convolution, and the physicochemical property (density) is fitted through the convolution, pooling, and output layers to generate a two-dimensional convolution prediction model, referred to as the "method model".

In particular, for comparison, the spectral matrix X_val without dimensionality reduction is imported into the input layer of the two-dimensional convolution, and the density is fitted through the convolution, pooling, and output layers to generate a two-dimensional convolution prediction model for comparison, referred to as the "non-dimensionality-reduced model".

Step S6, verifying the established model;

the specific process is as follows:

the 16 verification samples of the verification set are imported into the "method model" and the "non-dimensionality-reduced model" respectively, and the density of each of the 16 samples is predicted.

The measured values and predicted values of the verification set for the "method model" and the "non-dimensionality-reduced model" are shown in Table 1, and the training time required to generate each model and the model evaluation results are shown in Table 2.

Table 1. Measured and predicted density values of the verification-set samples for the two models (unit: kg/m³)

Table 2. Comparison of model evaluation data for the two models

Method	Training time (s)	Rmsep (kg/m³)	Rmsecv (kg/m³)
Method model	65	2.90	2.38
Non-dimensionality-reduced model	446	2.80	2.22

As can be seen from Tables 1 and 2, the method model saves 85% of the training time compared with the non-dimensionality-reduced model, while increasing the prediction deviation by no more than 10%.
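The percentages quoted above can be checked directly from the Table 2 figures. The short calculation below uses only those six numbers; the helper name `relative_change` is introduced for the illustration.

```python
def relative_change(new, old):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old

# Values taken from Table 2.
time_saved = -relative_change(65, 446)          # training time, s
rmsep_increase = relative_change(2.90, 2.80)    # Rmsep, kg/m^3
rmsecv_increase = relative_change(2.38, 2.22)   # Rmsecv, kg/m^3

print(f"{time_saved:.1%} {rmsep_increase:.1%} {rmsecv_increase:.1%}")
# 85.4% 3.6% 7.2%
```

The training time falls by about 85.4%, while Rmsep and Rmsecv rise by about 3.6% and 7.2% respectively, both below the 10% stated in the text.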

FIG. 3 is a block diagram of a near infrared spectroscopy quantitative analysis dimensionality reduction system based on a stochastic projection algorithm according to an embodiment of the invention. The near infrared spectroscopy quantitative analysis dimension reduction system based on the stochastic projection algorithm may include an internal communication bus 301, a processor (processor)302, a Read Only Memory (ROM)303, a Random Access Memory (RAM)304, a communication port 305, and a hard disk 307. The internal communication bus 301 can realize data communication among the components of the near infrared spectrum quantitative analysis dimensionality reduction system based on a stochastic projection algorithm. Processor 302 may make the determination and issue a prompt. In some embodiments, processor 302 may be comprised of one or more processors.

The communication port 305 can realize data transmission and communication between the near infrared spectrum quantitative analysis dimensionality reduction system based on the stochastic projection algorithm and an external input/output device. In some embodiments, a stochastic projection algorithm based near infrared spectroscopy dimension reduction system may send and receive information and data from the network through the communication port 305. In some embodiments, the stochastic projection algorithm based near infrared spectroscopy quantitative analysis dimension reduction system may transmit and communicate data between the external input/output device and the input/output terminal 306 in a wired manner.

The stochastic projection algorithm based near infrared spectroscopy dimension reduction system may also include various forms of program storage units and data storage units, such as a hard disk 307, Read Only Memory (ROM)303 and Random Access Memory (RAM)304, capable of storing various data files for computer processing and/or communication use, and possibly program instructions for execution by the processor 302. The processor 302 executes these instructions to implement the main parts of the method. The results of the processing by the processor 302 are communicated to an external output device via the communication port 305 for display on a user interface of the output device.

For example, the implementation process file of the above-mentioned near infrared spectroscopy quantitative analysis dimension reduction method based on the stochastic projection algorithm may be a computer program, stored in the hard disk 307, and recorded in the processor 302 for execution, so as to implement the method of the present application.

When the implementation process file of the near infrared spectrum quantitative analysis dimensionality reduction method based on the stochastic projection algorithm is a computer program, the implementation process file can also be stored in a computer readable storage medium to be used as a product. For example, computer-readable storage media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD)), smart cards, and flash memory devices (e.g., electrically Erasable Programmable Read Only Memory (EPROM), card, stick, key drive). In addition, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.

Compared with the prior art, the invention provides a near infrared spectrum quantitative analysis dimensionality reduction method and system based on a random projection algorithm, and the method and system have the following beneficial effects:

1) complex wavelength selection is not needed, the reliability and accuracy of the model are ensured, the requirements on technical personnel and the modeling complexity are reduced, and the improvement of near-infrared quantitative modeling methods is promoted;

2) the dimension reduction is realized by using a Gaussian random projection method, so that the spatial dimension is reduced while sufficient spectral information is retained, limiting the loss of information and reducing the volume of data processed during modeling;

3) meanwhile, a plurality of typical preprocessing methods are used, so that the spectral noise is effectively reduced, and a solid foundation is laid for subsequent modeling.
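As an illustration of the Gaussian random projection mentioned in item 2), the following is a minimal sketch using only the Python standard library. The function names, the synthetic data, the dimensions (400 points reduced to 200) and the N(0, 1/k) scaling of the projection matrix are assumptions made for the example and are not taken from the embodiments.

```python
import math
import random


def gaussian_random_projection(spectra, n_components, seed=0):
    """Project each row of `spectra` from d dimensions down to n_components."""
    rng = random.Random(seed)
    d = len(spectra[0])
    # Entries drawn from N(0, 1/n_components), so squared lengths are preserved on average.
    scale = 1.0 / math.sqrt(n_components)
    matrix = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n_components)]
    return [[sum(w * x for w, x in zip(row, spectrum)) for row in matrix]
            for spectrum in spectra]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


# Hypothetical data: 4 synthetic "spectra" with 400 wavelength points each.
data_rng = random.Random(1)
spectra = [[data_rng.random() for _ in range(400)] for _ in range(4)]
reduced = gaussian_random_projection(spectra, n_components=200)

# Ratio of projected to original squared distance for each pair of spectra;
# values close to 1 indicate that pairwise distances survived the projection.
ratios = [squared_distance(reduced[i], reduced[j]) / squared_distance(spectra[i], spectra[j])
          for i in range(4) for j in range(i + 1, 4)]
```

No wavelength is selected or discarded in this sketch: every original point contributes to every reduced feature, which is why no separate wavelength selection step is required.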

As used in this application and the appended claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and the method or apparatus may also comprise further steps or elements.

Those of skill in the art would understand that information, signals, and data may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The embodiments described above are provided to enable persons skilled in the art to make or use the invention. Persons skilled in the art can make modifications or variations to the embodiments described above without departing from the inventive concept of the present invention, so the scope of protection of the present invention is not limited by the embodiments described above but should be accorded the widest scope consistent with the innovative features set forth in the claims.

Claims

1. A near infrared spectrum quantitative analysis dimensionality reduction method based on a stochastic projection algorithm is characterized by comprising the following steps:

step S1, acquiring near infrared spectra x_val and corresponding physicochemical property values y_val as a sample set;

step S3, preprocessing the near infrared spectra x_val of the correction set to obtain a spectrum matrix X_val, and preprocessing the average spectrum x_avg of the correction set to obtain an average spectrum matrix X_avg;

step S5, establishing an artificial neural network prediction model based on the dimension-reduced spectrum matrix X_valRed;

2. The method for quantitative analysis and dimension reduction of near infrared spectrum based on stochastic projection algorithm as claimed in claim 1, wherein in step S2, the expression for the average spectrum x_avg is:

$$x_{avg} = \frac{1}{n}\sum_{i=1}^{n} x_{val_i}$$

wherein n is the number of near infrared spectra and x_{val_i} is the i-th spectrum.
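The average spectrum of n spectra is their element-wise mean over a shared wavelength grid. A minimal sketch, assuming each spectrum is a plain list of absorbance values (the function name and the sample values are hypothetical):

```python
def average_spectrum(spectra):
    """Element-wise mean of n spectra that share the same wavelength grid."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]


# Hypothetical absorbance values at three wavelength points for two samples.
x_val = [[0.10, 0.20, 0.30],
         [0.30, 0.40, 0.50]]
x_avg = average_spectrum(x_val)
```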

3. The method for quantitative analysis and dimension reduction of near infrared spectroscopy based on stochastic projection algorithm of claim 1, wherein the step S2 is to divide the sample set into a correction set and a verification set, further comprising:

4. The method for quantitative analysis and dimension reduction of near infrared spectrum based on stochastic projection algorithm according to claim 1, wherein the preprocessing in step S3 comprises: first derivative, second derivative and maximum-minimum normalization.
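The three preprocessing operations named in this claim can be sketched as follows. This is only an illustration: it assumes simple finite differences for the derivatives (smoothed derivatives such as Savitzky-Golay are also common in near infrared work), and the function names and sample values are hypothetical.

```python
def first_derivative(spectrum):
    """Forward difference between neighbouring wavelength points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def second_derivative(spectrum):
    """Difference of differences; the result is two points shorter than the input."""
    return first_derivative(first_derivative(spectrum))


def min_max_normalize(spectrum):
    """Rescale values linearly into [0, 1]; a flat spectrum maps to all zeros."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


spectrum = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical absorbance values
d1 = first_derivative(spectrum)
d2 = second_derivative(spectrum)
scaled = min_max_normalize(spectrum)
```

Derivatives remove baseline offsets and slopes, while maximum-minimum normalization puts spectra on a common scale before projection.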

5. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S3 further comprises:

6. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S4 further comprises:

7. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 6, wherein the step S41 further comprises:

8. The stochastic projection algorithm-based near infrared spectroscopic quantitative analysis dimension reduction method of claim 7, wherein the average spectrum matrix X_avg and the dimension-reduced average spectrum matrix X_avgRed satisfy the following inequality:

(1-eps)||X_avg-X_avgRed||² < ||X_avg-X_avgRed||² < (1+eps)||X_avg-X_avgRed||²;

wherein eps is the dimension reduction error.
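For context, the Johnson-Lindenstrauss lemma, on which Gaussian random projection rests, is usually stated for any pair of points u and v and their projections p(u) and p(v): (1-eps)||u-v||² < ||p(u)-p(v)||² < (1+eps)||u-v||². The sketch below assumes that standard pairwise form and the commonly used bound k >= 4 ln(n) / (eps²/2 - eps³/3) on the target dimension; the function names are illustrative and the result is truncated to an integer.

```python
import math


def jl_min_dim(n_samples, eps):
    """Target dimension suggested by the Johnson-Lindenstrauss bound (truncated to int)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)


def within_jl_bounds(original_sq_dist, projected_sq_dist, eps):
    """Check (1 - eps) * d < d' < (1 + eps) * d for one pair of points."""
    return (1 - eps) * original_sq_dist < projected_sq_dist < (1 + eps) * original_sq_dist
```

A smaller eps tightens the distance guarantee but raises the required dimension, so eps trades modeling accuracy against the amount of data the network has to process.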

9. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the artificial neural network prediction model of step S5 further comprises:

a multi-layer perceptron prediction model, a back propagation neural network prediction model, and a convolutional neural network prediction model.

10. The stochastic projection algorithm-based near infrared spectroscopy quantitative analysis dimension reduction method according to claim 1, wherein the artificial neural network prediction model is a two-dimensional convolution prediction model;

the step S5 further includes:

step S51, feeding the dimension-reduced spectrum matrix into the input layer of the two-dimensional convolution, passing it through two convolution layers, weight and activation functions and a pooling layer, and, after multiple rounds of convolution and pooling computation, transmitting the result to an output layer;
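The forward pass described in step S51 (two convolution layers with an activation function, followed by pooling) can be sketched in plain Python. Everything specific here is an assumption for illustration: the 6x6 input obtained by reshaping 36 reduced features, the fixed averaging kernel in place of trained weights, the ReLU activation and the 2x2 max pooling.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Element-wise activation: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; odd trailing rows or columns are dropped."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


# Hypothetical 6x6 input, e.g. 36 reduced features reshaped into a square.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
kernel = [[0.25, 0.25], [0.25, 0.25]]  # illustrative fixed weights, not trained

layer1 = relu(conv2d_valid(image, kernel))   # first convolution layer, 5x5
layer2 = relu(conv2d_valid(layer1, kernel))  # second convolution layer, 4x4
pooled = max_pool_2x2(layer2)                # pooling layer, 2x2
```

In a trained model the kernel values are learned, and the pooled feature maps would be flattened and passed to an output layer that produces the predicted property value.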

11. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S6 further comprises:

where m is the number of spectra in the verification set, y_{i,actual1} is the measured value of the i-th spectrum in the verification set, and y_{i,predicted1} is the predicted value of the i-th spectrum in the verification set.

12. The method for quantitative analysis and dimension reduction in near infrared spectroscopy based on stochastic projection algorithm according to claim 1, wherein the step S6 further comprises:

where n is the number of spectra in the correction set, y_{i,actual2} is the measured value of the i-th spectrum in the correction set, and y_{i,predicted2} is the predicted value of the i-th spectrum in the correction set.
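One error measure commonly computed from paired measured and predicted values such as these is the root mean square error, evaluated separately on the verification set and on the correction set. The sketch below assumes that metric, which this text does not itself name; the function name and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted property values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical measured and predicted values for four spectra.
y_actual = [1.0, 2.0, 3.0, 4.0]
y_predicted = [1.0, 2.0, 3.0, 6.0]
error = rmse(y_actual, y_predicted)
```

Comparing the value obtained on the correction set with the value on the verification set gives a quick indication of whether the model overfits.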

13. A near infrared spectrum quantitative analysis dimensionality reduction system based on a stochastic projection algorithm comprises:

a memory for storing instructions executable by the processor;

a processor for executing the instructions to implement the method of any one of claims 1-12.

14. A computer readable medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the method of any of claims 1-12.