CN115270611A - Model building method and device for nonlinear index of sample in petrochemical industry - Google Patents

Publication number
CN115270611A
Authority
CN
China
Prior art keywords
model
data
kpls
processed
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210824506.9A
Other languages
Chinese (zh)
Inventor
刘阳
詹辉
何恺源
邓晓旭
颜廷江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinfu Technology Co ltd
Original Assignee
Guangdong Xinfu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinfu Technology Co ltd
Priority to CN202210824506.9A
Publication of CN115270611A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Geometry (AREA)
  • Tourism & Hospitality (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

The invention discloses a method and a device for building a model of a nonlinear index of a sample in the petrochemical industry. The method comprises: obtaining a plurality of sample data to be processed for the index to be modeled; classifying the sample data and applying spectrogram processing and data processing; screening out, by means of a KPLS modeling method, a data set for constructing a PLS model, an ANN model, a KPLS model and a GA-KPLS model, wherein the GA-KPLS model is obtained by optimizing the KPLS model with a GA algorithm; and then screening the constructed models and selecting the one best suited to the nonlinear index of the sample as the spectrogram prediction model, thereby improving detection accuracy.

Description

Model building method and device for nonlinear index of sample in petrochemical industry
Technical Field
The invention relates to the field of petrochemical industry, in particular to a method and a device for establishing a model of a sample nonlinear index in petrochemical industry.
Background
Against the background of China's sustained and rapid social and economic development, new requirements are continually being raised in different fields. In the petrochemical industry, the standard of judgment is no longer only the demand for energy; oil products of higher quality are also required. For product testing in the petrochemical field, most traditional quality-testing techniques rely on laboratory analysts performing sampling analysis directly, so analysis is slow and costly, and it is difficult to meet either the requirements of modern testing or the corresponding daily production needs of petrochemical enterprises.
Throughout the whole oil production process, including crude oil samples entering the plant, each process unit and its process pipelines, and the oil storage and transportation units, rapid analysis and detection of the oil over the whole cycle would save costs for petrochemical enterprises and guide their daily production more efficiently. National policy in China advocates energy conservation and environmental protection in petrochemical production, so the analysis and detection of feedstock oil, middle distillates and final product oil have become increasingly important. To meet the practical requirements of petrochemical production, a large number of rapid analysis and detection instruments have been developed, including various infrared analyzers, Raman analyzers, chromatographic analyzers, nuclear magnetic resonance analyzers and the like.
In chemometrics, most quantitative analysis of petrochemical samples uses Partial Least Squares (PLS), a multivariate calibration algorithm that is currently the mainstream method in the field. Its drawback is evident: PLS is a linear method, and it is difficult to build an accurate model with it for nonlinear indexes such as viscosity. Some developers have instead tried to build nonlinear index models with an Artificial Neural Network (ANN). Although this alleviates the inaccuracy of nonlinear index modeling to some extent, ANN modeling often requires a large amount of data to train the network, the resulting models are prone to serious overfitting, and the prediction deviation for samples outside the training data fluctuates sharply.
Disclosure of Invention
The invention provides a model building method and a device for nonlinear indexes of samples in petrochemical industry, and aims to solve the technical problem that the existing nonlinear index model in petrochemical materials is low in accuracy.
In order to solve the technical problem, the invention provides a model establishing method for a nonlinear index of a sample in petrochemical industry, which comprises the following steps:
acquiring data of a plurality of samples to be processed of indexes to be modeled; the to-be-processed sample data comprises a plurality of spectral data graphs and assay analysis data corresponding to each spectral data graph;
processing the sample data to be processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets; wherein the data sets comprise a correction set and a verification set, and each data set comprises a plurality of processed sample data to be processed;
according to a preset KPLS modeling method, screening a first data set with the minimum average deviation value from the plurality of correction sets;
respectively constructing a PLS model, an ANN model, a KPLS model and a GA-KPLS model according to the first data set and the preset initial model; wherein the initial models comprise a PLS initial model, an ANN initial model and a KPLS initial model; the GA-KPLS model is obtained by optimizing a constructed KPLS model through a GA algorithm;
and according to the verification set, respectively carrying out prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model, and determining a map prediction model of an index to be modeled from the verified models according to a verification result.
In the model building method of the invention, a plurality of sample data to be processed are first obtained for the index to be modeled, comprising a plurality of spectral data graphs and the assay analysis data corresponding to each spectral data graph; modeling on a plurality of samples improves detection accuracy. The data to be processed are then processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets, which comprise correction sets and verification sets. Because many spectrogram processing methods of varied types are preset, a suitable one is more likely to be found, which improves detection accuracy. Next, models are constructed by the preset KPLS method and the first data set with the minimum average deviation value is screened out from the correction sets. This step identifies the spectrogram processing method with the best processing effect and reduces the data to be processed in the subsequent model construction. A PLS model, an ANN model, a KPLS model and a GA-KPLS model are then constructed, the GA-KPLS model being obtained by optimizing the parameters of the constructed KPLS model with a GA algorithm, which further improves model accuracy. Finally, the verification set is input into the PLS model, the ANN model, the KPLS model and the GA-KPLS model for prediction verification, and the model with the minimum average deviation value is screened out as the spectrogram prediction model of the index to be modeled.
Compared with the prior art, in which a single fixed model construction method is used for a given nonlinear index, the method screens the map prediction model suited to the index to be modeled from a plurality of constructed models and can therefore adapt to the characteristics of the nonlinear index, improving the detection accuracy for that index.
As a preferred example, the acquiring of the data of the plurality of samples to be processed of the index to be modeled specifically includes:
determining the type of an analyzer according to the type of the samples to be collected, and scanning a plurality of samples to be collected through the analyzer with the determined type to obtain a spectral data graph corresponding to each sample to be collected;
each sample to be collected corresponds to the assay analysis data of the index to be modeled, and the types of the samples to be collected are the same.
By analyzing the type of the samples to be collected and selecting a suitable analyzer for scanning, the invention obtains the spectral data of the samples both quickly and, because the analyzer matches the characteristics of the samples, more accurately. In addition, the samples to be collected are analyzed by a standard method with a conventional analyzer to obtain their assay analysis data set, which provides the reference for subsequent evaluation of the models.
As a preferred example, the processing the data to be processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets specifically includes:
and classifying a plurality of data to be processed according to a Kennard-Stone algorithm into a correction set to be processed and a verification set to be processed.
The Kennard-Stone algorithm divides the obtained data to be processed into a correction set and a verification set. This makes subsequent sample lookup easier and saves time, and it provides a correction set for subsequent model construction and a verification set for model evaluation.
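The Kennard-Stone division described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes Euclidean distance between spectra, and the function name `kennard_stone`, the parameter `n_select` and the toy data are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices of X using Kennard-Stone.

    X is an (n_samples, n_features) array of spectra; n_select >= 2 rows
    are placed in the correction set, the rest in the verification set.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between all spectra (assumed metric).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_select:
        nearest = D[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Toy data: four one-dimensional "spectra" (illustrative values only).
X = np.array([[0.0], [1.0], [2.0], [10.0]])
correction_idx, verification_idx = kennard_stone(X, n_select=3)
print(correction_idx, verification_idx)
```

Because the selection spreads the correction samples over the whole spectral space, the verification samples fall inside the range covered by the correction set, which is the usual reason for preferring this split over a random one.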
As a preferred example, the processing the data to be processed through a plurality of preset spectrogram processing methods to obtain a plurality of data sets specifically includes:
respectively and independently processing the to-be-processed correction set and the to-be-processed verification set by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets so as to obtain N data sets, wherein N is a positive integer greater than or equal to 2;
the spectrogram preprocessing methods comprise: no processing, mean centering, mean-variance scaling (standardization), vector normalization, standard normal variate transformation, multiplicative scatter correction, Savitzky-Golay convolution smoothing, and the first derivative method.
According to the invention, the correction set to be processed and the verification set to be processed are independently processed by a plurality of preset spectrogram preprocessing methods, and a proper spectrogram processing method can be selected according to the model effect of each processing method in the modeling process, so that the accuracy of nonlinear index detection is enhanced.
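Several of the listed preprocessing methods can be sketched with NumPy as follows. The patent does not give formulas or parameters, so the window length, the polynomial order, the use of the mean spectrum as the scatter-correction reference, and all function names are assumptions made for illustration.

```python
import numpy as np

def mean_center(X):
    """Subtract the mean spectrum (column means) from every spectrum."""
    return X - X.mean(axis=0)

def snv(X):
    """Standard normal variate: scale each spectrum (row) to zero mean, unit std."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = X.mean(axis=0) if reference is None else reference
    out = np.empty_like(X, dtype=float)
    for i, row in enumerate(X):
        # Fit row ~ intercept + slope * ref, then remove both effects.
        slope, intercept = np.polyfit(ref, row, 1)
        out[i] = (row - intercept) / slope
    return out

def savgol_smooth(X, window=5, order=2):
    """Savitzky-Golay smoothing of each row; returns only the fully covered interior."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)
    # First row of the pseudo-inverse gives the fitted value at the window centre.
    coeffs = np.linalg.pinv(A)[0]
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in X])

def first_derivative(X):
    """First derivative along the wavelength axis by finite differences."""
    return np.diff(X, axis=1)

# Synthetic spectra: three scaled and offset copies of one quadratic baseline.
wl = np.linspace(0.0, 1.0, 50)
base = 1.0 + wl**2
X = np.vstack([0.1 + 0.9 * base, -0.2 + 1.3 * base, base])

methods = {
    "none": lambda S: S,
    "mean centering": mean_center,
    "SNV": snv,
    "MSC": msc,
    "Savitzky-Golay": savgol_smooth,
    "first derivative": first_derivative,
}
data_sets = {name: f(X) for name, f in methods.items()}
print({name: d.shape for name, d in data_sets.items()})
```

Keeping the methods in a dictionary mirrors the text: each entry yields one candidate data set, and the candidates can then be compared by the model effect they produce.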
As a preferred example, the screening of a first data set with the minimum average deviation value from the plurality of correction sets according to a preset KPLS modeling method specifically includes:
and respectively modeling the plurality of correction sets according to a preset KPLS modeling method to obtain a plurality of first models, calculating the minimum average deviation value corresponding to each first model, and selecting the correction set with the minimum average deviation value as a first data set.
The invention constructs the models with the KPLS algorithm, which needs few parameters, so the running time of the algorithm is reduced and efficiency is improved. In addition, by comparing the minimum average deviation values obtained after the plurality of correction sets are input into the KPLS model, the correction set with the minimum average deviation value is selected as the first data set; that is, the spectrogram preprocessing method with the best processing effect is screened out, which improves detection accuracy.
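The patent does not state the KPLS equations. The sketch below follows the commonly published NIPALS-style kernel PLS formulation for a single response, with a Gaussian kernel; the kernel choice, `gamma`, the number of components, the function names and the synthetic data are all assumptions, and the average deviation is taken here to be the mean absolute deviation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center_kernel(K_new, K_train):
    """Centre K_new (rows: new samples, columns: training samples) in feature space."""
    n = K_train.shape[0]
    col_avg = np.ones((K_new.shape[0], n)) / n
    return (K_new - col_avg @ K_train) @ (np.eye(n) - np.ones((n, n)) / n)

def kpls_fit(K_train, y, n_components):
    """Fit single-response KPLS; returns dual coefficients and the mean of y."""
    n = K_train.shape[0]
    y_mean = y.mean()
    yc = (y - y_mean).reshape(-1, 1)
    Kc = center_kernel(K_train, K_train)
    Kd, yd = Kc.copy(), yc.copy()
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = yd / np.linalg.norm(yd)      # with one response, u is the deflated y
        t = Kd @ u
        t /= np.linalg.norm(t)           # unit-norm score vector
        T[:, [a]], U[:, [a]] = t, u
        P = np.eye(n) - t @ t.T          # deflate the kernel and the response
        Kd = P @ Kd @ P
        yd = yd - t @ (t.T @ yd)
    B = U @ np.linalg.pinv(T.T @ Kc @ U) @ (T.T @ yc)
    return B.ravel(), y_mean

def kpls_predict(K_new, K_train, B, y_mean):
    """Predict from the raw kernel between new samples and training samples."""
    return center_kernel(K_new, K_train) @ B + y_mean

def average_deviation(y_true, y_pred):
    """Mean absolute deviation between reference and predicted values (assumed metric)."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic nonlinear problem standing in for spectra and an assay value.
rng = np.random.default_rng(0)
X_cal = np.linspace(-3.0, 3.0, 40).reshape(-1, 1)
y_cal = np.sin(X_cal).ravel()
X_val = rng.uniform(-3.0, 3.0, size=(10, 1))
y_val = np.sin(X_val).ravel()

K_cal = rbf_kernel(X_cal, X_cal, gamma=1.0)
B, y_mean = kpls_fit(K_cal, y_cal, n_components=5)
pred = kpls_predict(rbf_kernel(X_val, X_cal, gamma=1.0), K_cal, B, y_mean)
print("verification average deviation:", average_deviation(y_val, pred))
```

To screen preprocessing methods in the way the text describes, the same fit and prediction would be repeated once per preprocessed data set and the set with the smallest average deviation retained.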
As a preferred example, before the selecting, according to a preset KPLS modeling method, a first data set with a minimum average deviation value from the plurality of correction sets, the method includes:
performing correlation analysis on the data matrixes and the physical property indexes in the data sets to obtain correlation coefficients between data with different dimensions and physical properties, sequencing the data according to the correlation coefficients from large to small, selecting different data points in a fixed step length, building a simple model of the data points and the physical property indexes according to the different data points, and selecting the optimal number of the data points and a specific area according to the effect of the simple model;
and updating each data set according to the optimal data point number and the specific area.
The invention carries out dimension reduction processing on the data in the data sets and selects the optimal data point number and data area, thereby reducing the interference of useless information on one hand, reducing the data needing to be processed on the other hand and improving the modeling efficiency.
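The correlation-based selection of data points can be sketched as follows. The sketch assumes that ranking uses the absolute Pearson correlation and that the "simple model" is an ordinary least-squares fit scored by mean absolute deviation on held-out samples; the function names and the toy data are hypothetical.

```python
import numpy as np

def rank_by_correlation(X, y):
    """Return column indices of X sorted by |Pearson r| with y (largest first), and r."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
    # Columns with zero variance get r = 0 instead of a division by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r), kind="stable"), r

def candidate_subsets(order, start, stop, step):
    """Yield the top-k ranked columns for k = start, start + step, ..., up to stop."""
    for k in range(start, min(stop, len(order)) + 1, step):
        yield order[:k]

def select_points(X_cal, y_cal, X_val, y_val, start, stop, step):
    """Pick the subset size whose simple model has the lowest held-out deviation."""
    order, _ = rank_by_correlation(X_cal, y_cal)
    best_cols, best_err = None, np.inf
    for cols in candidate_subsets(order, start, stop, step):
        A_cal = np.column_stack([np.ones(len(X_cal)), X_cal[:, cols]])
        coef, *_ = np.linalg.lstsq(A_cal, y_cal, rcond=None)
        A_val = np.column_stack([np.ones(len(X_val)), X_val[:, cols]])
        err = float(np.mean(np.abs(A_val @ coef - y_val)))
        if err < best_err:
            best_cols, best_err = cols, err
    return np.sort(best_cols), best_err

# Toy data: column 0 drives y, column 1 is constant, column 2 alternates.
X_demo = np.array([[1, 1, 1], [2, 1, -1], [3, 1, 1],
                   [4, 1, -1], [5, 1, 1], [6, 1, -1]], dtype=float)
y_demo = 2.0 * X_demo[:, 0] + 1.0
order, r = rank_by_correlation(X_demo, y_demo)
cols, err = select_points(X_demo[:4], y_demo[:4], X_demo[4:], y_demo[4:],
                          start=1, stop=3, step=1)
print(order, cols, err)
```

The selected columns would then be applied to every correction and verification set, which corresponds to the update of each data set described above.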
As a preferred example, the constructing of a PLS model, an ANN model, a KPLS model and a GA-KPLS model according to the first data set and the preset initial models specifically includes:
constructing a PLS model from the first data set and the PLS initial model;
constructing an ANN model according to the first data set and the ANN initial model;
constructing a KPLS model according to the first data set and the KPLS initial model;
according to the constructed KPLS model, optimizing parameters of a kernel function introduced into the KPLS model by adopting a GA algorithm to obtain a GA-KPLS model;
and respectively constructing a PF-KPLS model, a RBF-KPLS model and an MIX-KPLS model according to the constructed KPLS model and by combining a preset kernel function.
According to the method, a PLS model, an ANN model, a KPLS model and a GA-KPLS model are constructed from the preset initial models and the first data set. Constructing several models overcomes the prior-art problem of relying on a single model when modeling nonlinear indexes. In addition, the GA-KPLS model uses a GA algorithm to optimize the parameters of the kernel function introduced into the KPLS model and thereby finds the optimal operating parameters of the model, which further improves model accuracy.
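The patent does not specify the encoding or the operators of the GA. The following is a generic real-coded genetic search, given only as a sketch of how a kernel parameter might be tuned: tournament selection, arithmetic crossover, Gaussian mutation and elitism are assumptions, and `placeholder_error` is a hypothetical stand-in for the average deviation of a KPLS model evaluated at a given kernel width.

```python
import random

def genetic_search(fitness, bounds, pop_size=30, generations=40,
                   mutation_sigma=0.3, elite=2, seed=0):
    """Minimise fitness(params) over box bounds with a simple real-coded GA."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness)
        # Elitism: the best individuals survive unchanged.
        next_pop = [ind[:] for ind in scored[:elite]]
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(scored, 3), key=fitness)
            p2 = min(rng.sample(scored, 3), key=fitness)
            # Arithmetic crossover followed by Gaussian mutation.
            w = rng.random()
            child = [
                clip(w * a + (1.0 - w) * b + rng.gauss(0.0, mutation_sigma), lo, hi)
                for a, b, (lo, hi) in zip(p1, p2, bounds)
            ]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)

def placeholder_error(params):
    """Hypothetical objective with its minimum at log10(gamma) = -1."""
    log10_gamma = params[0]
    return (log10_gamma + 1.0) ** 2

best = genetic_search(placeholder_error, bounds=[(-4.0, 2.0)])
print("best log10(gamma):", best[0])
```

In an actual GA-KPLS setup the objective would refit the KPLS model for each candidate parameter and return its deviation on held-out data; searching in log space is a common choice for kernel widths but is likewise an assumption here.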
As a preferred example, the performing, according to the verification set, prediction verification on the PLS model, the ANN model, the KPLS model, and the GA-KPLS model, respectively, and determining, according to a verification result, a map prediction model of an index to be modeled from the verified models specifically includes:
respectively inputting the spectral data graphs in the verification set into each constructed second model, and obtaining predicted values of each second model under different spectrograms; wherein the second model comprises: a PLS model, an ANN model, a KPLS model, a GA-KPLS model, a PF-KPLS model, a RBF-KPLS model and a MIX-KPLS model;
comparing each predicted value with the assay analysis data in the verification set, and calculating the minimum average deviation value of each second model;
and selecting the second model with the minimum average deviation value as the map prediction model of the index to be modeled.
According to the method, the spectral data graph in the verification set is input into each constructed second model, the predicted values of each second model under different spectrograms are obtained, then each predicted value is compared with the test analysis data in the verification set, and the model with the minimum average deviation value is screened out to serve as the spectrogram prediction model of the index to be modeled, so that the self characteristics of the nonlinear index can be adapted, and the detection accuracy of the index to be constructed is improved.
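The final selection step can be sketched as follows, assuming the average deviation is the mean absolute deviation between predicted values and assay values. The numbers are placeholders chosen for illustration and are not results from the patent.

```python
def average_deviation(y_true, y_pred):
    """Mean absolute deviation between assay values and predictions (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def select_model(predictions, y_true):
    """predictions maps a model name to its predicted values on the verification set."""
    scores = {name: average_deviation(y_true, pred)
              for name, pred in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Placeholder assay values and predictions (not data from the patent).
assay = [10.0, 20.0, 30.0]
predictions = {
    "PLS": [12.0, 23.0, 27.0],
    "KPLS": [11.0, 21.0, 29.0],
    "GA-KPLS": [10.5, 20.5, 29.5],
}
best_name, scores = select_model(predictions, assay)
print(best_name, scores)
```

Any of the second models named in the text (PLS, ANN, KPLS, GA-KPLS, PF-KPLS, RBF-KPLS, MIX-KPLS) can be added to the dictionary in the same way.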
On the other hand, the embodiment of the invention provides a model building device for nonlinear indexes of samples in petrochemical industry, which comprises the following components: the system comprises a data acquisition module, a data processing module, a screening module, a model generation module and a model verification module;
the data acquisition module is used for selecting different analyzers according to the types of the samples to acquire a plurality of to-be-processed sample data of the to-be-modeled indexes; the to-be-processed sample data comprises a plurality of spectral data graphs and assay analysis data corresponding to each spectral data graph;
the data processing module is used for processing the sample data to be processed according to a plurality of preset spectrogram processing methods to obtain a plurality of data sets; wherein the data sets comprise a correction set and a verification set, and each data set comprises a plurality of processed sample data to be processed;
the screening module is used for screening out a first data set with the minimum average deviation value from the correction sets according to a preset KPLS modeling method;
the model generation module is used for generating a PLS model, an ANN model, a KPLS model and a GA-KPLS model according to a preset initial model and the first data set; wherein the initial models comprise a PLS initial model, an ANN initial model and a KPLS initial model; the GA-KPLS model is obtained by optimizing a constructed KPLS model through a GA algorithm;
and the model verification module is used for respectively performing prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model according to the verification set, and determining a map prediction model of an index to be modeled from the verified models according to a verification result.
In the device of the invention, the data acquisition module first analyzes the type of the sample to be modeled and selects a suitable analyzer to scan it, obtaining the data to be processed and improving detection accuracy. The data processing module then processes the data to be processed with its plurality of preset spectrogram processing methods to obtain a plurality of data sets; because the preset methods are numerous and varied, a suitable spectrogram processing method is more likely to be found, which improves detection accuracy. The screening module constructs models by the KPLS method to find the spectrogram processing method with the best processing effect, which reduces the data to be processed in subsequent modeling and improves modeling efficiency. The model generation module then constructs a PLS model, an ANN model, a KPLS model and a GA-KPLS model, the GA-KPLS model being obtained by optimizing the parameters of the constructed KPLS model with a GA algorithm, which further improves model accuracy. Finally, the model verification module inputs the verification set into the PLS model, the ANN model, the KPLS model and the GA-KPLS model for prediction verification and selects the model with the minimum average deviation value as the spectrogram prediction model, so that the selected model suits the nonlinear index to be modeled and the detection accuracy for that index is improved.
As a preferred example, the data processing module includes: a spectrogram processing unit and a classification unit;
the spectrogram processing unit is used for processing the acquired data of the sample to be processed according to a plurality of preset spectrogram processing methods to obtain a plurality of data sets;
the classification unit is used for classifying the data according to a Kennard-Stone algorithm and dividing the data into a to-be-processed correction set and a to-be-processed verification set.
According to the invention, the acquired data of the sample to be processed is processed by the spectrogram processing unit according to a plurality of preset spectrogram processing methods, so that the spectrogram processing method with better processing effect can be more accurately selected, and meanwhile, the classification unit classifies the data into the correction set to be processed and the verification set to be processed by adopting a Kennard-Stone algorithm, so that the subsequent model generation and model evaluation are facilitated.
Drawings
FIG. 1 is a schematic flow chart of a method for modeling a nonlinear index of a sample in petrochemical engineering according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another method for modeling a nonlinear index of a sample in petrochemical engineering according to the second embodiment of the present invention;
FIG. 3 is a scatter plot of the viscosity value distribution of crude oil samples provided by the first embodiment of the present invention;
FIG. 4 is a graph showing a comparison of the average deviation values of the crude oil viscosity according to the different model methods provided in the first embodiment of the present invention;
FIG. 5 is a vapor pressure value distribution scattergram of a gasoline sample provided in the second embodiment of the present invention;
FIG. 6 is a comparison graph of the average deviation of gasoline vapor pressure according to different model methods provided in the second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Example one
Referring to FIG. 1, which is a flow chart of a method for modeling a nonlinear index of a sample in the petrochemical industry according to an embodiment of the present invention, the method mainly includes steps 101 to 106, as follows:
Step 101: acquiring a plurality of to-be-processed sample data of the index to be modeled.
In this embodiment, step 101 specifically includes: determining the type of an analyzer according to the type of the samples to be collected, and scanning a plurality of samples to be collected through the analyzer with the determined type to obtain a spectral data graph corresponding to each sample to be collected; each sample to be collected corresponds to the assay analysis data of the index to be modeled, and the types of the samples to be collected are the same.
In this embodiment, the samples to be processed selected before modeling are crude oils, and the nonlinear index is viscosity. The sample data to be obtained comprise the spectral data graphs and the assay analysis data set of the crude oil samples. Because the samples are crude oil, a mid-infrared analyzer is used: the crude oil samples are scanned to obtain their spectral data graphs, and the corresponding assay analysis data set is then obtained by a standard method with a conventional analyzer. FIG. 3 is a scatter plot of the viscosity value distribution of the crude oil samples.
Step 102: processing the sample data to be processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets.
In this embodiment, step 102 specifically includes: classifying a plurality of data to be processed according to a Kennard-Stone algorithm, and dividing the data to be processed into a correction set to be processed and a verification set to be processed; respectively and independently processing the to-be-processed correction set and the to-be-processed verification set by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets so as to obtain N data sets, wherein N is a positive integer greater than or equal to 2; the spectrogram preprocessing methods comprise: no processing, mean centering, mean-variance scaling (standardization), vector normalization, standard normal variate transformation, multiplicative scatter correction, Savitzky-Golay convolution smoothing, and the first derivative method.
In this embodiment, the data to be processed, that is, the spectral data in the spectral data graphs of the crude oil samples, are first classified by the Kennard-Stone algorithm into a correction set and a verification set. The correction set and the verification set are then processed independently by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets. The preset spectrogram processing methods are: no processing, mean centering, mean-variance scaling (standardization), vector normalization, standard normal variate transformation, multiplicative scatter correction, Savitzky-Golay convolution smoothing, and the first derivative method. The processed correction set spectra are used to construct the models and the processed verification set spectra are used to evaluate them. This embodiment also makes it quick and convenient to look up the serial number of each sample, which speeds up detection.
Step 103: performing interval selection on the data sets.
In this embodiment, step 103 specifically includes: performing correlation analysis on the data matrixes and the physical property indexes in the data sets to obtain correlation coefficients between data with different dimensions and physical properties, sequencing the data according to the correlation coefficients from large to small, selecting different data points in a fixed step length, building a simple model of the data points and the physical property indexes according to the different data points, and selecting the optimal number of the data points and a specific area according to the effect of the simple model; and updating each data set according to the optimal data point number and the specific area.
In this embodiment, after interval selection is performed on data in the correction set and the verification set which are subjected to spectrogram processing, and correlation between the data in the correction set and the data in the verification set and nonlinear index viscosity is analyzed, the number of selected data points is adjusted to 2525 from 20 according to a step length of 5, then a PLS model is built according to each selection, and then the correction set and the verification set are updated according to the effect of the built PLS model, so that data which need to be processed in later model building is reduced.
Step 104: screening out a first data set with the minimum average deviation value from the plurality of correction sets according to a preset KPLS modeling method.
In this embodiment, step 104 specifically includes: modeling each of the plurality of correction sets according to the preset KPLS modeling method to obtain a plurality of first models, calculating the minimum average deviation value corresponding to each first model, and selecting the correction set with the smallest minimum average deviation value as the first data set.
In this embodiment, the KPLS algorithm is used to construct a model from each correction set obtained by the Kennard-Stone split, yielding a plurality of first models. The verification set obtained with each spectrogram processing method is then input into the corresponding model to obtain its minimum average deviation value, and the data set with the smallest value is selected as the data set for the next modeling step.
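The screening loop can be sketched as below. Two assumptions are made: the "average deviation" is taken to be the mean absolute difference between predicted and assay values, and the KPLS model is abstracted as a caller-supplied `fit_predict` function. The least-squares stand-in shown here is only a placeholder so that the sketch runs on its own; it is not KPLS.

```python
import numpy as np

def mean_deviation(y_true, y_pred):
    """Mean absolute deviation between assay values and predictions (assumed metric)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def screen_data_sets(data_sets, y_cal, y_val, fit_predict):
    """data_sets maps a preprocessing name to (correction set, verification set).
    Returns the name with the smallest deviation, plus all scores."""
    scores = {
        name: mean_deviation(y_val, fit_predict(X_cal, y_cal, X_val))
        for name, (X_cal, X_val) in data_sets.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

def least_squares_fit_predict(X_cal, y_cal, X_val):
    """Placeholder model (ordinary least squares), standing in for KPLS."""
    coef, *_ = np.linalg.lstsq(X_cal, y_cal, rcond=None)
    return X_val @ coef
```

In practice `fit_predict` would wrap the KPLS fit and prediction, and the winning name identifies the first data set.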
In this embodiment, the minimum average deviation tables obtained by different spectrogram processing methods are as follows:
Figure BDA0003745906960000101
Figure BDA0003745906960000111
In this embodiment, as can be seen from the data in the table, the minimum average deviation value is smallest when the 2nd preprocessing scheme (Savitzky-Golay convolution smoothing) is used to process the spectrograms. This indicates that applying Savitzky-Golay convolution smoothing to the spectra obtained by scanning the crude oil samples with the mid-infrared analyzer reduces the influence of factors such as noise on the subsequent modeling with the various KPLS methods and improves model accuracy. The data set processed by Savitzky-Golay convolution smoothing is therefore selected as the first data set and used for the next modeling step.
Step 105: constructing a PLS model, an ANN model, a KPLS model and a GA-KPLS model respectively according to the first data set and the preset initial models.
In this embodiment, step 105 specifically includes: constructing a PLS model, an ANN model and a KPLS model from the first data set using a preset PLS initial model, ANN initial model and KPLS initial model; and, based on the constructed KPLS model, optimizing the parameters of the kernel function introduced into the KPLS model with a GA algorithm to obtain the GA-KPLS model.
As an example of this embodiment, after these models are built, a PF-KPLS model, an RBF-KPLS model and a MIX-KPLS model may additionally be built from the constructed KPLS model in combination with preset kernel functions. By using different kernel functions, more types of KPLS models are constructed, which solves the problem that only a single KPLS model is available when modeling a nonlinear index.
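To make the role of the kernel function concrete, the following is a minimal sketch of single-response kernel PLS with interchangeable kernels. The text does not define the kernels, so several things here are assumptions: "PF" is read as a polynomial kernel, "MIX" as a weighted sum of the RBF and polynomial kernels, and all parameter names and default values are hypothetical. The fitting routine follows the usual kernel PLS recipe (center the kernel matrix, extract scores, deflate) and is not taken from the patent.

```python
import numpy as np

def poly_kernel(A, B, degree=2, coef0=1.0):
    return (A @ B.T + coef0) ** degree

def rbf_kernel(A, B, gamma=0.1):
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mix_kernel(A, B, weight=0.5, gamma=0.1, degree=2, coef0=1.0):
    """Assumed form of a mixed kernel: weighted sum of RBF and polynomial kernels."""
    return weight * rbf_kernel(A, B, gamma) + (1 - weight) * poly_kernel(A, B, degree, coef0)

def center_kernel(K_train, K_new=None):
    """Center a kernel matrix in feature space using training-set statistics."""
    col_mean = K_train.mean(axis=0)
    total = K_train.mean()
    if K_new is None:
        return K_train - col_mean[None, :] - col_mean[:, None] + total
    return K_new - K_new.mean(axis=1, keepdims=True) - col_mean[None, :] + total

def kpls_fit(K, y, n_components):
    """Kernel PLS for one response; returns dual coefficients and the y mean."""
    n = len(y)
    Kc = center_kernel(K)
    y_mean = y.mean()
    y0 = y - y_mean
    Kd, yd = Kc.copy(), y0.copy()
    T, U = [], []
    for _ in range(n_components):
        t = Kd @ yd
        t = t / np.linalg.norm(t)            # normalized score vector
        T.append(t)
        U.append(yd.copy())
        deflate = np.eye(n) - np.outer(t, t)
        Kd = deflate @ Kd @ deflate          # deflate the kernel matrix
        yd = yd - t * (t @ yd)               # deflate the response
    T, U = np.column_stack(T), np.column_stack(U)
    alpha = U @ np.linalg.solve(T.T @ Kc @ U, T.T @ y0)
    return alpha, y_mean

def kpls_predict(K_new_centered, alpha, y_mean):
    return K_new_centered @ alpha + y_mean
```

A prediction for new spectra `X_new` would then be `kpls_predict(center_kernel(K, kernel(X_new, X_train)), alpha, y_mean)`, where `K = kernel(X_train, X_train)` and `kernel` is whichever of the functions above is being tested.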
Step 106: performing prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model respectively according to the verification set, and determining the map prediction model of the index to be modeled from the verified models according to the verification results.
In this embodiment, step 106 specifically includes: inputting the spectral data graphs in the verification set into each constructed second model to obtain the predicted values of each second model for the different spectrograms, wherein the second models comprise a PLS model, an ANN model, a KPLS model, a GA-KPLS model, a PF-KPLS model, an RBF-KPLS model and a MIX-KPLS model; comparing each predicted value with the assay analysis data in the verification set and calculating the minimum average deviation value of each second model; and selecting the second model with the smallest minimum average deviation value as the map prediction model of the index to be modeled.
In this embodiment, after the second models, namely the PLS model, the ANN model, the KPLS model, the GA-KPLS model, the PF-KPLS model, the RBF-KPLS model and the MIX-KPLS model, are constructed, the verification set spectrograms that have undergone the same processing are input into each constructed model. The predicted values output by each model are compared with the data in the assay analysis data set to obtain the minimum mean deviation of each model. The minimum mean deviation of each model is shown in the table below, and FIG. 4 compares the mean deviations of the different models for crude oil viscosity.
Scheme number    Modeling method    Minimum mean deviation (mm²/s)
Scheme 1         PLS                55.721
Scheme 2         ANN                47.251
Scheme 3         PF-KPLS            33.165
Scheme 4         RBF-KPLS           23.574
Scheme 5         MIX-KPLS           22.962
Scheme 6         GA-KPLS            20.356
In this embodiment, as can be seen from the above table and FIG. 4, the minimum mean deviation value is smallest when the verification set is input into the constructed GA-KPLS model, which shows that the GA-KPLS model performs best. The GA-KPLS model is therefore selected as the map prediction model for the nonlinear index (viscosity) of the crude oil samples selected in this embodiment.
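The final selection is a simple minimum over the per-model deviations. The sketch below uses the values reported in the table for crude oil viscosity; the function names are illustrative, and the deviation is assumed to be the mean absolute difference between predicted and assay values.

```python
def mean_deviation(predicted, measured):
    """Mean absolute deviation between predictions and assay values (assumed metric)."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)

# Minimum mean deviation (mm²/s) of each model, as reported in the table.
min_mean_deviation = {
    "PLS": 55.721,
    "ANN": 47.251,
    "PF-KPLS": 33.165,
    "RBF-KPLS": 23.574,
    "MIX-KPLS": 22.962,
    "GA-KPLS": 20.356,
}

def select_model(deviations):
    """Return the name of the model with the smallest deviation."""
    return min(deviations, key=deviations.get)

print(select_model(min_mean_deviation))  # GA-KPLS
```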
In this embodiment, the selected crude oil samples are scanned with a mid-infrared analyzer, and different spectrogram preprocessing methods are then applied. The preprocessing method best suited to the nonlinear index (viscosity) of the crude oil samples is selected according to the minimum average deviation obtained; Savitzky-Golay convolution smoothing is finally selected, which improves detection accuracy. In addition, multiple models are constructed and the model with the smallest average deviation value is finally screened out, which solves the technical problem in the prior art that only a single model is available when modeling a nonlinear index of a sample.
Referring to FIG. 2, which is another flowchart of the model building method for a nonlinear index of a sample in petrochemical industry according to the second embodiment of the present invention, the method mainly includes steps 201 to 206, as follows:
Step 201: scanning the collected samples with a near-infrared analyzer to obtain spectral data graphs of the samples, and collecting an assay analysis data set of the original samples.
In this embodiment, step 201 specifically includes: determining the type of an analyzer according to the type of the samples to be collected, and scanning a plurality of samples to be collected through the analyzer with the determined type to obtain a spectral data graph corresponding to each sample to be collected; each sample to be collected corresponds to the assay analysis data of the index to be modeled, and the types of the samples to be collected are the same.
In this embodiment, the samples to be processed that are selected before modeling are gasoline, and the nonlinear index is vapor pressure. The sample data to be acquired comprise the spectral data graphs and the assay analysis data set of the gasoline samples. Because the samples selected in this embodiment are gasoline, a near-infrared analyzer is first used to scan the gasoline samples to obtain their spectral data graphs, and a standard method and a conventional analyzer are then used to obtain the assay analysis data set corresponding to the gasoline samples. The vapor pressure distribution of the gasoline samples is shown in FIG. 5, which is a scatter diagram of the vapor pressure values of the gasoline samples.
Step 202: processing the sample data to be processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets.
In this embodiment, step 202 specifically includes: dividing the plurality of data to be processed into a correction set to be processed and a verification set to be processed according to the Kennard-Stone algorithm; and processing the correction set to be processed and the verification set to be processed independently by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets, and thus N data sets, wherein N is a positive integer greater than or equal to 2. The spectrogram preprocessing methods are: no processing, mean centering, mean-variance scaling, vector normalization, standard normal variate (SNV) transformation, multivariate scattering correction, Savitzky-Golay convolution smoothing, and the first derivative method.
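For readers unfamiliar with the Kennard-Stone split, a minimal NumPy sketch follows. It starts from the two most distant spectra and then repeatedly adds the sample whose nearest already-selected neighbor is farthest away. The function name, the Euclidean distance, and the way the split size is passed are assumptions; the full pairwise distance computation is only practical for modest sample counts.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) sample indices; n_select must be at least 2.
    The selected samples would form the correction set."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # For each candidate, distance to its closest selected sample.
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

With four one-dimensional samples at 0, 1, 2 and 10 and `n_select=3`, the samples at 0 and 10 are chosen first, then the one at 2, leaving the sample at 1 for the verification set.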
In this embodiment, the plurality of data to be processed, that is, the spectral data in the spectral data graphs obtained for the gasoline samples, is first divided into a correction set and a verification set according to the Kennard-Stone algorithm. The correction set and the verification set are then each processed independently by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets. The preset spectrogram processing methods are: no processing, mean centering, mean-variance scaling, vector normalization, standard normal variate (SNV) transformation, multivariate scattering correction, Savitzky-Golay convolution smoothing, and the first derivative method. The processed correction set spectrograms are used to construct the models, and the processed verification set spectrograms are used to evaluate them. This arrangement makes it convenient to quickly find the serial number of each sample and speeds up detection.
Step 203: performing interval selection on the plurality of data sets.
In this embodiment, step 203 specifically includes: performing correlation analysis between the data matrix in each data set and the physical property index to obtain the correlation coefficient between the data in each dimension and the physical property; sorting the data by correlation coefficient from large to small; selecting different numbers of data points at a fixed step length; building, for each selection, a simple model relating the selected data points to the physical property index; selecting the optimal number of data points and the specific region according to the performance of the simple models; and updating each data set according to the optimal number of data points and the specific region.
In this embodiment, interval selection is performed on the data in the correction sets and verification sets that have undergone spectrogram processing. The correlation between these data and the nonlinear index (vapor pressure) is analyzed, the number of selected data points is varied from 20 to 2525 in steps of 5, and a PLS model is built for each selection. The correction sets and verification sets are then updated according to the performance of the PLS models, which reduces the amount of data to be processed in subsequent model building.
Step 204: screening out a first data set with the minimum average deviation value from the plurality of correction sets according to a preset KPLS modeling method.
In this embodiment, step 204 specifically includes: modeling each of the plurality of correction sets according to the preset KPLS (kernel partial least squares) modeling method to obtain a plurality of first models, calculating the minimum average deviation value corresponding to each first model, and selecting the correction set with the smallest minimum average deviation value as the first data set.
In this embodiment, the KPLS algorithm is used to construct a model from each correction set obtained by the Kennard-Stone split, yielding a plurality of first models. The verification set obtained with each spectrogram processing method is then input into the corresponding model to obtain its minimum average deviation value, and the data set with the smallest value is selected as the data set for the next modeling step.
In this embodiment, the minimum mean deviation table obtained by different spectrogram processing methods is as follows:
Figure BDA0003745906960000151
In this embodiment, as can be seen from the data in the table, the minimum average deviation value is smallest when the 8th preprocessing scheme (multivariate scattering correction) is used to process the spectrograms. This indicates that applying multivariate scattering correction to the spectra obtained by scanning the gasoline samples with the near-infrared analyzer reduces the influence of factors such as noise on the subsequent modeling with the various KPLS methods and improves model accuracy. The data set processed by multivariate scattering correction is therefore selected as the first data set and used for the next modeling step.
Step 205: constructing a PLS model, an ANN model, a KPLS model and a GA-KPLS model respectively according to the first data set and the preset initial models.
In this embodiment, step 205 specifically includes: constructing a PLS model, an ANN model and a KPLS model from the first data set using a preset PLS initial model, ANN initial model and KPLS initial model; and, based on the constructed KPLS model, optimizing the parameters of the kernel function introduced into the KPLS model with a GA algorithm to obtain the GA-KPLS model.
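The text does not specify the genetic algorithm's operators, so the following is only one plausible sketch of how kernel parameters might be tuned: a small real-coded GA with elitism, tournament selection, blend crossover and Gaussian mutation. The objective is passed in as a function; in the setting described here it would return the verification-set deviation of a KPLS model built with the candidate kernel parameters. All names and default values are hypothetical.

```python
import random

def genetic_minimize(objective, bounds, pop_size=30, generations=40,
                     mutation_scale=0.05, seed=0):
    """Minimize objective(params) over box bounds [(low, high), ...]."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(population):
        # Best of three randomly drawn individuals.
        return min(rng.sample(population, 3), key=objective)

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=objective)
        children = ranked[:2]  # elitism: carry the two best over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                mix = rng.random()
                gene = mix * g1 + (1 - mix) * g2                 # blend crossover
                gene += rng.gauss(0.0, mutation_scale * (hi - lo))  # mutation
                child.append(min(hi, max(lo, gene)))             # keep inside bounds
            children.append(child)
        population = children
    return min(population, key=objective)
```

For example, `genetic_minimize(lambda p: kpls_deviation(gamma=p[0]), [(1e-4, 10.0)])` would search for an RBF width, where `kpls_deviation` is a hypothetical function that fits KPLS and returns the verification-set deviation.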
As an example of this embodiment, after these models are built, a PF-KPLS model, an RBF-KPLS model and a MIX-KPLS model may additionally be built from the constructed KPLS model in combination with preset kernel functions. By using different kernel functions, more types of KPLS models are constructed, which solves the problem that only a single KPLS model is available when modeling a nonlinear index.
Step 206: performing prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model respectively according to the verification set, and determining the map prediction model of the index to be modeled from the verified models according to the verification results.
In this embodiment, step 206 specifically includes: inputting the spectral data graphs in the verification set into each constructed second model to obtain the predicted values of each second model for the different spectrograms, wherein the second models comprise a PLS model, an ANN model, a KPLS model, a GA-KPLS model, a PF-KPLS model, an RBF-KPLS model and a MIX-KPLS model; comparing each predicted value with the assay analysis data in the verification set and calculating the minimum average deviation value of each second model; and selecting the second model with the smallest minimum average deviation value as the map prediction model of the index to be modeled.
In this embodiment, after the second models, namely the PLS model, the ANN model, the KPLS model, the GA-KPLS model, the PF-KPLS model, the RBF-KPLS model and the MIX-KPLS model, are constructed, the verification set spectrograms that have undergone the same processing are input into each constructed model. The predicted values output by each model are compared with the data in the assay analysis data set to obtain the minimum mean deviation of each model. The minimum mean deviation of each model is shown in the following table, and FIG. 6 compares the mean deviations of the different models for gasoline vapor pressure.
Figure BDA0003745906960000161
Figure BDA0003745906960000171
In this embodiment, as can be seen from the above table and FIG. 6, the minimum mean deviation value is smallest when the verification set is input into the constructed GA-KPLS model, which shows that the GA-KPLS model performs best. The GA-KPLS model is therefore selected as the map prediction model for the nonlinear index (vapor pressure) of the gasoline samples selected in this embodiment.
In this embodiment, the selected gasoline samples are scanned with a near-infrared analyzer, and different spectrogram preprocessing methods are then applied. The preprocessing method best suited to the nonlinear index (vapor pressure) of the gasoline samples is selected according to the minimum average deviation obtained; multivariate scattering correction is finally selected, which improves detection accuracy. In addition, multiple models are constructed and the model with the smallest average deviation value is finally screened out, which solves the technical problem in the prior art that only a single model is available when modeling a nonlinear index of a sample.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, and it should be understood that the above-mentioned embodiments are only examples of the present invention and are not intended to limit the scope of the present invention. It should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A model building method for nonlinear indexes of samples in petrochemical industry is characterized by comprising the following steps:
acquiring data of a plurality of samples to be processed of indexes to be modeled; the to-be-processed sample data comprises a plurality of spectral data graphs and assay analysis data corresponding to each spectral data graph;
processing the sample data to be processed by a plurality of preset spectrogram processing methods to obtain a plurality of data sets; wherein the data sets comprise a correction set and a verification set, and each data set comprises a plurality of processed sample data to be processed;
screening a first data set with the minimum average deviation value from the correction sets according to a preset KPLS modeling method;
respectively constructing a PLS model, an ANN model, a KPLS model and a GA-KPLS model according to the first data set and the preset initial model; wherein the initial models comprise a PLS initial model, an ANN initial model and a KPLS initial model; the GA-KPLS model is obtained by optimizing a constructed KPLS model through a GA algorithm;
and according to the verification set, respectively carrying out prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model, and determining a map prediction model of an index to be modeled from the verified models according to a verification result.
2. The method for establishing a model of a nonlinear index of a sample in petrochemical industry according to claim 1, wherein the obtaining of a plurality of to-be-processed sample data of the index to be modeled specifically comprises:
determining the type of an analyzer according to the type of the samples to be collected, and scanning a plurality of samples to be collected through the analyzer with the determined type to obtain a spectral data graph corresponding to each sample to be collected;
each sample to be collected corresponds to the assay analysis data of the index to be modeled, and the types of the samples to be collected are the same.
3. The method for establishing a model of a nonlinear index of a sample in petrochemical industry according to claim 1, wherein the processing of the data to be processed through a plurality of preset spectrogram processing methods to obtain a plurality of data sets specifically comprises:
dividing the plurality of data to be processed into a correction set to be processed and a verification set to be processed according to a Kennard-Stone algorithm.
4. The method for modeling nonlinear indicators in petrochemical industry according to claim 3, wherein the processing the data to be processed through a plurality of preset spectrogram processing methods to obtain a plurality of data sets specifically comprises:
respectively and independently processing the to-be-processed correction set and the to-be-processed verification set by N different spectrogram preprocessing methods to obtain N correction sets and N verification sets so as to obtain N data sets, wherein N is a positive integer greater than or equal to 2;
the spectrogram preprocessing methods comprise: no processing, mean centering, mean-variance scaling, vector normalization, standard normal variate (SNV) transformation, multivariate scattering correction, Savitzky-Golay convolution smoothing, and the first derivative method.
5. The method as claimed in claim 1, wherein the screening out of the first data set having the minimum average deviation value from the plurality of correction sets according to a preset KPLS modeling method specifically comprises:
modeling each of the plurality of correction sets according to the preset KPLS modeling method to obtain a plurality of first models, calculating the minimum average deviation value corresponding to each first model, and selecting the correction set with the smallest minimum average deviation value as the first data set.
6. The method as claimed in claim 5, wherein before the screening out of the first data set having the minimum average deviation value from the plurality of correction sets according to the preset KPLS modeling method, the method further comprises:
performing correlation analysis on the data matrixes and the physical property indexes in the data sets to obtain correlation coefficients between data with different dimensions and physical properties, sequencing the data according to the correlation coefficients from large to small, selecting different data points in a fixed step length, building a simple model of the data points and the physical property indexes according to the different data points, and selecting the optimal number of the data points and a specific area according to the effect of the simple model;
and updating each data set according to the optimal data point number and the specific area.
7. The method as claimed in claim 1, wherein the constructing the PLS model, the ANN model, the KPLS model and the GA-KPLS model respectively based on the first data set and the predetermined initial model comprises:
constructing a PLS model from the first data set and the PLS initial model;
constructing an ANN model according to the first data set and the ANN initial model;
constructing a KPLS model according to the first data set and the KPLS initial model;
according to the constructed KPLS model, optimizing parameters of a kernel function introduced into the KPLS model by adopting a GA algorithm to obtain a GA-KPLS model;
and respectively constructing a PF-KPLS model, a RBF-KPLS model and an MIX-KPLS model according to the constructed KPLS model and by combining a preset kernel function.
8. The method according to claim 7, wherein the performing of prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model respectively according to the verification set, and the determining of the map prediction model of the index to be modeled from the verified models according to the verification result, specifically comprise:
respectively inputting the spectral data graphs in the verification set into each constructed second model, and obtaining predicted values of each second model under different spectrograms; wherein the second model comprises: a PLS model, an ANN model, a KPLS model, a GA-KPLS model, a PF-KPLS model, a RBF-KPLS model, and a MIX-KPLS model;
comparing each predicted value with the assay analysis data in the verification set, and calculating the minimum average deviation value of each second model;
and selecting the second model with the minimum average deviation value as the map prediction model of the index to be modeled.
9. A model building device for nonlinear indexes of samples in petrochemical industry is characterized by comprising the following components: the system comprises a data acquisition module, a data processing module, a screening module, a model generation module and a model verification module;
the data acquisition module is used for selecting different analyzers according to the types of the samples to acquire a plurality of pieces of to-be-processed sample data of the to-be-modeled indexes; the to-be-processed sample data comprises a plurality of spectral data graphs and assay analysis data corresponding to each spectral data graph;
the data processing module is used for processing the sample data to be processed according to a plurality of preset spectrogram processing methods to obtain a plurality of data sets; wherein the data sets comprise a correction set and a verification set, and each data set comprises a plurality of processed sample data to be processed;
the screening module is used for screening out a first data set with the minimum average deviation value from the plurality of correction sets according to a preset KPLS modeling method;
the model generation module is used for generating a PLS model, an ANN model, a KPLS model and a GA-KPLS model according to a preset initial model and the first data set; wherein the initial models comprise a PLS initial model, an ANN initial model and a KPLS initial model; the GA-KPLS model is obtained by optimizing a constructed KPLS model through a GA algorithm;
and the model verification module is used for respectively performing prediction verification on the PLS model, the ANN model, the KPLS model and the GA-KPLS model according to the verification set, and determining a map prediction model of an index to be modeled from the verified models according to a verification result.
10. The apparatus for modeling nonlinear index of sample in petrochemical industry as claimed in claim 9, wherein said data processing module comprises: a spectrogram processing unit and a classification unit;
the spectrogram processing unit is used for processing the acquired sample data to be processed according to a plurality of preset spectrogram processing methods to obtain a plurality of data sets;
the classification unit is used for classifying the data into a correction set to be processed and a verification set to be processed according to a Kennard-Stone algorithm.
CN202210824506.9A 2022-07-14 2022-07-14 Model building method and device for nonlinear index of sample in petrochemical industry Pending CN115270611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210824506.9A CN115270611A (en) 2022-07-14 2022-07-14 Model building method and device for nonlinear index of sample in petrochemical industry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210824506.9A CN115270611A (en) 2022-07-14 2022-07-14 Model building method and device for nonlinear index of sample in petrochemical industry

Publications (1)

Publication Number Publication Date
CN115270611A true CN115270611A (en) 2022-11-01

Family

ID=83765637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210824506.9A Pending CN115270611A (en) 2022-07-14 2022-07-14 Model building method and device for nonlinear index of sample in petrochemical industry

Country Status (1)

Country Link
CN (1) CN115270611A (en)

Similar Documents

Publication Publication Date Title
CN109597968B (en) SMT big data-based solder paste printing performance influence factor analysis method
CN109299501B (en) Vibration spectrum analysis model optimization method based on workflow
CN109324013B (en) Near-infrared rapid analysis method for constructing crude oil property by using Gaussian process regression model
CN109324014B (en) Self-adaptive near-infrared rapid prediction method for crude oil properties
CN107703097B (en) Method for constructing model for rapidly predicting crude oil property by using near-infrared spectrometer
CN114611582B (en) Method and system for analyzing substance concentration based on near infrared spectrum technology
CN115420707A (en) Sewage near infrared spectrum chemical oxygen demand assessment method and system
CN109283153B (en) Method for establishing quantitative analysis model of soy sauce
CN111259929A (en) Random forest based food-borne pathogenic bacteria classification model training method
CN108663334B (en) Method for searching spectral characteristic wavelength of soil nutrient based on multi-classifier fusion
CN112651173B (en) Agricultural product quality nondestructive testing method based on cross-domain spectral information and generalizable system
CN105954228A (en) Method for measuring content of sodium metal in oil sand based on near infrared spectrum
CN115270611A (en) Model building method and device for nonlinear index of sample in petrochemical industry
CN113295674B (en) Laser-induced breakdown spectroscopy characteristic nonlinear processing method based on S transformation
CN113793652A (en) Spectrogram chemometrics analysis method based on segmented intelligent optimization
CN113567417A (en) Method for identifying peanut oil production place based on Raman spectrum fingerprint analysis technology
CN113702328A (en) Method, device, equipment and storage medium for analyzing properties of product oil
WO2024011687A1 (en) Method and apparatus for establishing oil product physical property fast evaluation model
CN116148212B (en) Method for rapidly determining types and contents of clay minerals in ores based on near infrared spectrum analysis
CN117929356B (en) LIBS quantitative analysis method based on Gaussian process regression
CN117871459A (en) Mutton crude fat content determination method and system
CN118294407A (en) Near infrared spectrum modeling sample screening method
CN114414524A (en) Method for rapidly detecting properties of aviation kerosene
CN104502305B (en) Near infrared spectrum useful information distinguishing method based on wavelet transform
CN112326594A (en) Method for establishing quantitative model for rapidly detecting sulfur content in C5 oil product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination