CN106934416B - Big data-based model matching method - Google Patents


Info

Publication number
CN106934416B
Authority
CN
China
Legal status
Active
Application number
CN201710102144.1A
Other languages
Chinese (zh)
Other versions
CN106934416A (en)
Inventor
刘彤
潘涛
曾永平
肖青青
沈鸿平
凌亚东
Current Assignee
Guangdong Zhongtaxun Technology Co.,Ltd.
Original Assignee
Guangzhou Sondon Network Technology Co ltd
Application filed by Guangzhou Sondon Network Technology Co ltd
Priority to CN201710102144.1A
Publication of CN106934416A
Application granted
Publication of CN106934416B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention discloses a big data-based model matching method, which comprises the following steps: acquiring the instrument parameters of new equipment, a spectrum to be measured and/or the average spectrum of a standard substance; and, according to the acquired data, matching the most accurate applicable analysis model from a base model library and personalized models in order to predict material information. The base model library is built by clustering the instrument population. The method addresses the inter-instrument differences among large numbers of instruments and allows models to be transferred across large batches of equipment. It thereby removes the limit that the number of instruments places on near-infrared spectroscopy, suits the mass production of near-infrared spectrometers, and facilitates the popularization and application of near-infrared spectroscopy technology. The method can be widely applied in the field of model matching.

Description

Big data-based model matching method
Technical Field
The invention relates to model matching technology, and in particular to a model matching method based on big data mining.
Background
Explanation of technical terms:
Near-infrared light: near-infrared (NIR) light is electromagnetic radiation lying between ultraviolet-visible (UV-Vis) light and mid-infrared (MIR) light, with a wavelength range of 700-2500 nm. The near-infrared spectrum reflects the overtone and combination-band absorptions of the vibrations of hydrogen-containing groups X-H (such as C-H, N-H and O-H). Different groups (such as methyl, methylene and benzene rings), or the same group in different chemical environments, show clearly different near-infrared absorption wavelengths and intensities, so near-infrared spectroscopy is well suited to measuring the physical and chemical parameters of hydrogen-containing organic substances.
Inter-instrument difference: because of the manufacturing process (slight differences in manufacturing between instruments of the same batch or of different batches), the environment (the current conditions, such as temperature and humidity, affect the instrument so that the same sample gives different results) and instrument wear (differences caused by service life and usage), measurements of the same sample from the same product batch differ between instruments. As a result, an analysis model established on one instrument cannot be used directly on other instruments.
Model transfer: the instrument on which the analysis model was established serves as the master, and the instrument that needs to use the analysis model serves as the slave. The master and the slave each scan the same samples or standard substances, and a correction model is established from these scans. The slave can then either correct its measured spectra with the correction model and apply the master's analysis model, or apply the master's analysis model directly and then correct the prediction results.
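For the second option (correcting the prediction results), a widely used technique is slope/bias correction: on the samples scanned by both instruments, fit a straight line that maps the slave's predictions onto the master's. The sketch below assumes that technique purely for illustration; the patent does not name a specific correction model, and the function names and example numbers are hypothetical.

```python
def fit_slope_bias(slave_preds, master_preds):
    """Fit master ~ slope * slave + bias by ordinary least squares on shared transfer samples."""
    n = len(slave_preds)
    if n != len(master_preds) or n < 2:
        raise ValueError("need at least two paired predictions")
    mean_s = sum(slave_preds) / n
    mean_m = sum(master_preds) / n
    sxx = sum((s - mean_s) ** 2 for s in slave_preds)
    if sxx == 0:
        raise ValueError("slave predictions must not all be equal")
    sxy = sum((s - mean_s) * (m - mean_m) for s, m in zip(slave_preds, master_preds))
    slope = sxy / sxx
    bias = mean_m - slope * mean_s
    return slope, bias


def correct_prediction(raw_pred, slope, bias):
    """Apply the fitted line to a prediction made on the slave with the master's model."""
    return slope * raw_pred + bias


# Hypothetical transfer samples: the slave's predictions run 10 % above the master's.
slave = [11.0, 13.2, 15.4, 17.6]
master = [10.0, 12.0, 14.0, 16.0]
slope, bias = fit_slope_bias(slave, master)
print(round(correct_prediction(19.8, slope, bias), 6))  # prints 18.0
```

Slope/bias correction needs only a handful of shared samples, but it can only remove errors that are linear in the prediction; spectrum-level methods are needed when the instruments differ in a wavelength-dependent way.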
Based on modern chemometric methods, near-infrared spectroscopy can be used for both quantitative and qualitative analysis. The chemometric methods used in quantitative and qualitative analysis mainly cover: 1. spectral preprocessing and variable selection; 2. establishing analysis models for predicting the properties or composition of unknown samples; 3. pattern recognition and model outlier detection; 4. model transfer. At present, most organizations or individuals using the technology build analysis models on only one or a few near-infrared spectrometers, so the inter-instrument difference problem is usually solved with the model transfer methods of modern chemometrics.
At present, model transfer methods for the inter-instrument difference problem fall into two types: the first improves the robustness of the calibration model, and the second enhances its adaptability. The first type mainly uses preprocessing methods such as variable screening, derivatives and orthogonal signal correction, extends the calibration model to cover different measurement environments, filters noise from the spectra by means such as robust regression, and fuses multiple local models; this improves the noise immunity of the model and makes the calibration model more reliable and robust. The second type uses mathematical methods to establish a functional relationship between the slave and the master in terms of measured spectra, model parameters or prediction results, thereby achieving model transfer. Because the first type relies only on general data-processing methods and cannot achieve high accuracy, the second type, such as the classic Shenk's algorithm, is mainly used to solve the inter-instrument difference problem. However, commonly used model transfer methods still have shortcomings: 1. the correction computation is too heavy to transfer models on a large scale; 2. a large number of calibration samples is required to support model transfer; 3. once an instrument has been corrected, the model is fixed and cannot change dynamically, yet as the instrument ages the model becomes less accurate and needs dynamic updating; 4. user participation is low, and the relationship between user and merchant is limited to a trading relationship.
Therefore, when the number of instruments increases sharply, model transfer methods become difficult to implement: the computation is too heavy to be carried, too many calibration samples are required, and a large amount of spectral data must be collected for every instrument to be modeled, which leads to heavy workloads and poor dynamic performance. The inter-instrument difference among large numbers of instruments essentially cannot be solved in this way, which prevents near-infrared spectroscopy from being popularized on a large scale.
Disclosure of Invention
To solve these technical problems, the invention aims to provide a big data-based model matching method that removes the limit that the number of instruments places on near-infrared spectroscopy and suits the mass production of near-infrared spectrometers, thereby facilitating the popularization and application of near-infrared spectroscopy technology.
The first technical scheme adopted by the invention is as follows: a big data-based model matching method comprises the following steps:
acquiring the instrument parameters of new equipment;
comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new equipment;
obtaining, for the new equipment, the base model corresponding to the found instrument class from the base model library, and using the obtained base model to predict the material information;
wherein the base model library is built by clustering the instrument population.
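The comparison step amounts to a nearest-centroid lookup. The patent does not fix a distance measure, so this sketch assumes Euclidean distance after z-scoring each parameter with population means and standard deviations, since the parameters (wavelength, drive current, reflectivity and so on) are in different units. The function names, the parameter order and the example numbers are hypothetical.

```python
import math


def standardize(params, means, stds):
    """Z-score each instrument parameter so that no single unit dominates the distance."""
    return [(p - m) / s for p, m, s in zip(params, means, stds)]


def nearest_class(params, class_features, means, stds):
    """Return the instrument class whose first-class features (cluster center) are closest."""
    z = standardize(params, means, stds)
    return min(
        class_features,
        key=lambda cls: math.dist(z, standardize(class_features[cls], means, stds)),
    )


def select_base_model(params, class_features, base_library, means, stds):
    """First technical solution: find the nearest instrument class, then take its base model."""
    cls = nearest_class(params, class_features, means, stds)
    return cls, base_library[cls]


# Hypothetical parameters: [light source wavelength (nm), drive current (mA)]
class_features = {"A": [1550.0, 100.0], "B": [1552.0, 120.0]}
base_library = {"A": "base model A", "B": "base model B"}
print(select_base_model([1551.8, 118.0], class_features, base_library,
                        means=[1551.0, 110.0], stds=[1.0, 10.0]))  # prints ('B', 'base model B')
```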
As a preferred embodiment of the first technical solution, the base model library is established by the following steps:
randomly sampling devices from all factory-shipped equipment of the same instrument type, and collecting the instrument parameters of the sampled devices;
performing cluster analysis on the collected instrument parameters to obtain several data clusters, where each data cluster represents an instrument class and its cluster center represents the first-class features of that instrument class;
establishing a base model for each instrument class, all the established base models together forming the base model library.
As a preferred embodiment of the first technical solution, establishing the base model for an instrument class specifically includes:
using devices belonging to the same instrument class to collect spectra of samples whose substance information is known, and then building the analysis model for that instrument class from the collected spectral data with a chemometric modeling algorithm.
The second technical scheme adopted by the invention is as follows: a big data-based model matching method comprises the following steps:
acquiring the instrument parameters of new equipment;
comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new equipment;
according to the found instrument class, judging whether the personalized model library contains personalized models corresponding to that instrument class; if so, retrieving those personalized models, matching the acquired instrument parameters against the instrument features stored in each of them to find the personalized model that best matches the new equipment, and using that personalized model to predict the material information; otherwise, obtaining the base model corresponding to the found instrument class from the base model library and using it to predict the material information;
wherein the base model library is built by clustering the instrument population.
As a preferred embodiment of the second technical solution, each personalized model stores model rules and instrument features.
The third technical scheme adopted by the invention is as follows: a big data-based model matching method comprises the following steps:
acquiring the instrument parameters of new equipment and a spectrum to be measured;
comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new equipment;
according to the found instrument class, finding the analysis models corresponding to that instrument class in the base model library and the personalized models, matching the spectrum to be measured against the spectral features of each found analysis model, and finding the spectral features most similar to the spectrum to be measured, thereby obtaining the best-matching analysis model;
using the best-matching analysis model to predict the material information;
wherein the base model library is built by clustering the instrument population.
As a preferred embodiment of the third technical solution, the step of obtaining the best-matching analysis model specifically includes:
according to the found instrument class, finding the analysis models corresponding to that instrument class in the base model library and the personalized models; matching the spectrum to be measured against the spectral features of each found analysis model to find the most similar spectral features and obtain the corresponding analysis models; and then matching the acquired instrument parameters against the instrument features of the obtained analysis models to find the instrument features that best match the instrument parameters, thereby obtaining the best-matching analysis model.
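A sketch of this two-stage matching, under assumptions the patent does not state: each candidate model carries a spectral feature on the same wavelength grid as the spectrum to be measured (here a mean spectrum), spectral similarity is Euclidean distance, and the `top_k` spectrally closest models form a shortlist that is re-ranked by distance between instrument features. For brevity the instrument distance is unscaled; in practice the parameters would be standardized first. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateModel:
    model_id: str
    spectral_feature: list[float]    # assumed: mean spectrum of the model's modeling set
    instrument_feature: list[float]  # instrument features stored with the model


def match_model(spectrum, instrument_params, candidates, top_k=3):
    """Two-stage match: shortlist by spectral similarity, then pick by instrument features."""
    # Stage 1: rank the class's base and personalized models by spectral distance.
    by_spectrum = sorted(candidates, key=lambda c: math.dist(spectrum, c.spectral_feature))
    shortlist = by_spectrum[:top_k]
    # Stage 2: among the shortlist, choose the model with the closest instrument features.
    return min(shortlist, key=lambda c: math.dist(instrument_params, c.instrument_feature))


candidates = [
    CandidateModel("m1", [0.50, 0.60, 0.70], [1550.0, 100.0]),
    CandidateModel("m2", [0.51, 0.61, 0.69], [1552.0, 120.0]),
    CandidateModel("m3", [0.90, 0.95, 0.99], [1552.0, 120.0]),
]
best = match_model([0.505, 0.605, 0.695], [1552.1, 119.0], candidates, top_k=2)
print(best.model_id)  # prints m2
```

Model m3 has the same instrument features as m2 but is excluded in the first stage because its spectral feature is far from the measured spectrum, which is the point of matching on spectra before instruments.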
As a preferred embodiment of the third technical solution, the spectral features corresponding to an analysis model are extracted as follows:
analyzing the spectral data in the modeling set, i.e., the data set used to build the analysis model, to extract its spectral features.
The fourth technical scheme adopted by the invention is as follows: a big data-based model matching method comprises the following steps:
acquiring the instrument parameters of new equipment, a spectrum to be measured, and the average spectrum of a standard substance;
comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new equipment;
analyzing the acquired average spectrum of the standard substance together with the second-class features of the found instrument class to obtain a correction coefficient;
correcting the spectrum to be measured with the calculated correction coefficient to obtain a corrected spectrum to be measured;
according to the found instrument class, finding the analysis models corresponding to that instrument class in the base model library and the personalized models, matching the corrected spectrum against the spectral features of each found analysis model, and finding the spectral features most similar to the corrected spectrum, thereby obtaining the best-matching analysis model;
using the best-matching analysis model to predict the material information;
wherein the base model library is built by clustering the instrument population.
As a preferred embodiment of the fourth technical solution, the second-class features corresponding to an instrument class are extracted as follows:
collecting spectra of a standard substance with devices belonging to the same instrument class;
calculating the average of the collected standard-substance spectra, the calculated average spectrum serving as the second-class features of that instrument class.
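The patent does not give the form of the correction coefficient. One simple possibility, assumed here for illustration, is a per-wavelength multiplicative coefficient: the ratio of the class's second-class features (its average standard-substance spectrum) to the new device's own average spectrum of the same standard, applied point by point to the spectrum to be measured. Function names and example values are hypothetical.

```python
def average_spectrum(spectra):
    """Point-wise mean of spectra recorded on the same wavelength grid."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    return [sum(values) / len(spectra) for values in zip(*spectra)]


def correction_coefficients(device_standard_avg, class_standard_avg):
    """Assumed form: per-wavelength ratio of the class's second-class features to the device's."""
    if any(v == 0 for v in device_standard_avg):
        raise ValueError("device average spectrum contains a zero value")
    return [ref / dev for ref, dev in zip(class_standard_avg, device_standard_avg)]


def correct_spectrum(spectrum, coefficients):
    """Rescale the spectrum to be measured onto the instrument class's response."""
    return [x * c for x, c in zip(spectrum, coefficients)]


# Second-class features: average standard-substance spectrum over devices in the class.
class_avg = average_spectrum([[0.40, 0.50, 0.60], [0.44, 0.54, 0.64]])
# The new device's own average spectrum of the same standard substance.
device_avg = average_spectrum([[0.20, 0.25, 0.30], [0.22, 0.27, 0.32]])
coeffs = correction_coefficients(device_avg, class_avg)
print([round(v, 3) for v in correct_spectrum([0.30, 0.35, 0.40], coeffs)])  # prints [0.6, 0.7, 0.8]
```

A ratio only corrects gain differences; if offsets between instruments matter, an additive term per wavelength (or a small linear fit) would be the natural extension.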
The beneficial effects of the invention are as follows: with this model matching method, the optimal analysis model can be matched for new equipment without large numbers of samples or extensive spectral data collection and computation.
Drawings
FIG. 1 is a schematic diagram of a process for building a base model library;
FIG. 2 is a schematic diagram of a model matching process based on a base model library;
FIG. 3 is a schematic diagram of a process for creating a personalized model;
FIG. 4 is a schematic diagram of a model matching process based on a base model library and a personalized model library;
FIG. 5 is a schematic diagram of a model matching process based on a base model library, a personalized model library, and spectral features;
FIG. 6 is a schematic diagram of a model matching process based on a base model library, a personalized model library, spectral features and instrument features;
FIG. 7 is a flowchart illustrating the steps of extracting the second class features corresponding to the instrument class;
FIG. 8 is a schematic diagram of a model matching process based on a base model library, a personalized model library, spectral features and an average spectrum.
Detailed Description
The device in these embodiments is a near-infrared spectrometer. Analysis models in the base model library are called base models, and analysis models in the personalized model library are called personalized models.
Embodiment 1: Building the base model library
The base model library is established by the following steps:
S101: randomly sampling devices from all factory-shipped equipment of the same instrument type, and collecting the instrument parameters of the sampled devices, such as light source wavelength, light source luminous power, light source drive current, white-board reflectivity and detector dark current;
S102: performing cluster analysis on the collected instrument parameters with the k-means algorithm to obtain several data clusters, where each data cluster represents an instrument class and its cluster center represents the first-class features of that instrument class;
Step S102 specifically includes:
S1021: initializing, i.e., setting the coordinates of k initial cluster centers;
S1022: calculating the distance between each data point and each cluster center (each data point represents one instrument object, i.e., the instrument parameters of one device), and assigning each data point to the data cluster of its nearest cluster center, yielding k data clusters;
S1023: judging whether the current clustering meets the termination condition, namely whether the cluster centers of the current iteration are equal to those of the previous iteration or differ from them by less than a set threshold; if so, stopping, the current data clusters being the required result; otherwise, recalculating the coordinates of each cluster center and returning to step S1022;
In step S1023, the coordinates of a cluster center are calculated as:
$$c_{ij} = \frac{1}{N_i}\sum_{k=1}^{N_i} x_{kj}^{(i)}$$
where $c_{ij}$ is the coordinate of the i-th cluster center in the j-th dimension, $x_{kj}^{(i)}$ is the coordinate in the j-th dimension of the k-th data point belonging to the i-th data cluster, and $N_i$ is the number of data points in the i-th data cluster;
S103: using the instrument population clustering of steps S101 and S102, obtaining the cluster center of each data cluster, i.e., the features of each instrument class. Each instrument class contains a series of devices whose parameters are close to its first-class features, i.e., these devices all belong to that instrument class. A base model can then be established for each instrument class, and all the established base models together form the base model library;
The base model for an instrument class is established as follows:
all or some of the devices belonging to the instrument class collect spectra of samples whose substance information is known, and a chemometric modeling algorithm then builds the analysis model for that instrument class from the collected spectral data; this analysis model is the base model. The first-class features of the instrument class, the analysis model, the instrument features corresponding to the analysis model, the modeling set and other information are then stored in the cloud.
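Steps S1021 to S1023 correspond to the standard k-means loop, sketched below in plain Python. Choices the patent leaves open are assumptions here: the first k data points serve as initial centers, distance is Euclidean on parameters already scaled to comparable units, an empty cluster keeps its previous center, and the loop also stops after `max_iter` rounds. The center update is the per-dimension mean of each cluster's members, as in the cluster-center formula of step S1023.

```python
import math


def kmeans(points, k, threshold=1e-6, max_iter=100):
    """Cluster instrument-parameter vectors; returns (centers, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # S1021: initialize k cluster centers (here: the first k points, an arbitrary choice)
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # S1022: assign every data point (one device) to its nearest cluster center
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        # Recompute each center as the per-dimension mean of its member points
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[i])  # empty cluster keeps its previous center
        # S1023: stop once no center has moved by more than the threshold
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < threshold:
            break
    return centers, labels


# Hypothetical, already-scaled parameter vectors for four devices
print(kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2))
# prints ([[0.0, 0.5], [10.0, 10.5]], [0, 0, 1, 1])
```

Because raw parameters such as wavelength in nm and dark current in nA differ by orders of magnitude, scaling them before clustering keeps one parameter from dominating the distance; the patent does not say whether this is done.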
In this embodiment, the flow for building the base model library is shown in FIG. 1 (soybean meal and corn are taken as example samples, and moisture content and protein content as example indexes):
First, the instrument parameters of device 1, device 2, device 3, ..., device N are collected. The k-means algorithm then clusters the collected parameters into several instrument classes and their first-class features, such as feature a of class A, feature b of class B, ..., and feature k of class K. Next, devices belonging to the same instrument class collect spectra of samples whose substance information is known, and a chemometric modeling algorithm builds the analysis models of that class from the collected spectral data; for example, class A (feature a) has an analysis model for corn moisture, one for corn protein and one for soybean meal protein.
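The per-class modeling in FIG. 1 can be organized as below: spectra from all sampled devices of a class are pooled per sample type and index, and one base model is fitted per combination. The modeling routine is passed in as `fit`, because the patent only specifies "a modeling algorithm in chemometrics" (PLS regression is a common choice in NIR work); the mean-value `fit` in the example is a placeholder, and all names and data are hypothetical.

```python
from collections import defaultdict


def build_base_library(device_class, calibration_data, fit):
    """Fit one base model per (instrument class, sample type, index).

    device_class: device id -> instrument class from the clustering step
    calibration_data: device id -> list of (sample, index, spectrum, reference value)
    fit: modeling routine taking (spectra, reference values) and returning a model
    """
    pooled = defaultdict(lambda: ([], []))
    for device, records in calibration_data.items():
        cls = device_class[device]
        for sample, index, spectrum, value in records:
            spectra, values = pooled[(cls, sample, index)]
            spectra.append(spectrum)
            values.append(value)
    library = defaultdict(dict)
    for (cls, sample, index), (spectra, values) in pooled.items():
        library[cls][(sample, index)] = fit(spectra, values)
    return dict(library)


def mean_fit(spectra, values):
    """Placeholder model: always predicts the mean reference value."""
    return sum(values) / len(values)


device_class = {"dev1": "A", "dev2": "A", "dev3": "B"}
calibration_data = {
    "dev1": [("corn", "moisture", [0.10, 0.20], 12.0), ("corn", "protein", [0.30, 0.40], 8.0)],
    "dev2": [("corn", "moisture", [0.11, 0.21], 14.0)],
    "dev3": [("soybean meal", "protein", [0.50, 0.60], 44.0)],
}
library = build_base_library(device_class, calibration_data, mean_fit)
print(library["A"][("corn", "moisture")])  # prints 13.0
```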
This procedure for building the base model library applies to all of the following embodiments.
Embodiment 2: Model matching method based on the base model library
In the detection stage, a new device performs model matching against the base model library established in Embodiment 1 through the following steps:
S201: acquiring the instrument parameters of the new device;
S202: comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new device, for example class B;
S203: obtaining, for the new device, the base model corresponding to the found instrument class B from the base model library, and using it to predict the material information;
wherein the base model library is built by clustering the instrument population.
The flow of this model matching method is shown in FIG. 2 (soybean meal and corn are taken as example samples, and moisture content and protein content as example indexes):
First, the instrument parameters of the new device are acquired and compared with the first-class features of each instrument class, and the closest instrument class, class B, is found. Then, when the new device tests corn and soybean meal, it collects the spectrum of each, obtains the analysis models corresponding to instrument class B from the base model library, and uses them to predict results from the collected spectral information.
Embodiment 3: Model matching method based on the base model library and the personalized model library
Users continuously create and maintain analysis models while using their devices, but at present these newly created analysis models cannot be shared. To share them and break up these isolated resource islands, the invention draws on the Internet and uploads newly created analysis models and related data to the cloud. For example, whenever a user creates or updates an analysis model, the system automatically uploads the new model and the related data that the user is willing to share, and stores them under the instrument class of the user's device; if the device belongs to instrument class A, the analysis models and data it creates or updates are stored under class A in the cloud. Analysis models created or updated while users operate their devices are stored in the cloud as personalized models, and together they form the personalized model library. Each personalized model stores not only model rules but also a set of instrument features, together with spectral information, algorithm information and the like.
In this embodiment, the flow for building the personalized model library is shown in FIG. 3 (soybean meal and corn are taken as example samples, and moisture, protein and fiber content as example indexes):
Device 1 and device 2 upload their newly created analysis models, instrument features and sample spectral information to the cloud, where they are stored under the corresponding instrument classes as personalized models. The analysis models newly created by device 1 for corn protein are 002 and 004, and the one newly created by device 2 for soybean meal fiber is 003. Each personalized model is stored together with its model rules (initialization coefficients, constant term, product coefficients, average spectrum, preprocessing algorithm and regression algorithm), its instrument features (wavelength, drive current, light source temperature, ambient humidity, detector temperature and reflectivity), sample spectral information and the like.
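The stored model-rule fields suggest a linear calibration. As an assumption for illustration, the sketch below applies a rule as prediction = constant term + sum of (product coefficient × (preprocessed spectrum − average spectrum)); the preprocessing step is a pluggable function, and the initialization coefficients are left out because the patent does not say how they are used. The class, field names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


def identity(spectrum):
    """Default preprocessing: return the spectrum unchanged."""
    return list(spectrum)


@dataclass
class ModelRule:
    constant: float                 # constant term
    coefficients: list[float]       # product coefficients, one per spectral point
    average_spectrum: list[float]   # average spectrum of the modeling set, used for centering
    preprocess: Callable[[list[float]], list[float]] = identity

    def predict(self, spectrum):
        """Apply preprocessing, mean-center, then evaluate the linear rule."""
        x = self.preprocess(spectrum)
        if not len(x) == len(self.coefficients) == len(self.average_spectrum):
            raise ValueError("spectrum length does not match the model rule")
        centered = [xi - mi for xi, mi in zip(x, self.average_spectrum)]
        return self.constant + sum(b * c for b, c in zip(self.coefficients, centered))


rule = ModelRule(constant=10.0, coefficients=[2.0, -1.0], average_spectrum=[0.5, 0.5])
print(rule.predict([1.5, 0.0]))  # prints 12.5
```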
In the detection stage, using the personalized model library established above and the base model library established in Embodiment 1, a new device performs model matching through the following steps:
S301: acquiring the instrument parameters of the new device;
S302: comparing the acquired instrument parameters with the first-class features of each instrument class to find the instrument class closest to the new device, for example class B;
S303: according to the found instrument class B, judging whether the personalized model library contains personalized models corresponding to class B; if so, retrieving them, matching the instrument parameters of the new device against the instrument features stored in each personalized model to find the one that best matches the new device, and calling the model rule of that personalized model to predict the material information; otherwise, obtaining the base model corresponding to class B from the base model library and using it to predict the material information;
wherein the base model library is built by clustering the instrument population.
The flow of this model matching method is shown in FIG. 4 (corn is taken as the example sample, and protein content as the example index):
First, the instrument parameters of the new device are acquired and compared with the first-class features of each instrument class, and the closest instrument class, class B, is found. Then, according to class B, it is judged whether the personalized model library contains personalized models corresponding to class B. If so, these personalized models are retrieved, and the instrument parameters of the new device are matched against the instrument features stored in each of them to find the personalized model that best matches the new device; in this embodiment, the model found is personalized model 003 for corn protein. The prediction function corresponding to the model rule of this personalized model is then called to predict results from the spectral information of the collected sample, for example the spectrum of corn.
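Step S303 can be sketched as a filter on instrument class followed by a nearest-neighbour choice on instrument features, with the base model as fallback. Assumptions not stated in the patent: the candidates are already restricted to the sample and index being tested, and instrument features are compared by unscaled Euclidean distance. Model IDs, classes and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PersonalizedModel:
    model_id: str
    instrument_class: str
    instrument_feature: list[float]  # instrument features stored with the model


def select_model(instrument_params, found_class, personalized_library, base_library):
    """Prefer the closest personalized model of the found class, else fall back to its base model."""
    candidates = [m for m in personalized_library if m.instrument_class == found_class]
    if candidates:
        best = min(candidates, key=lambda m: math.dist(instrument_params, m.instrument_feature))
        return "personalized", best.model_id
    return "base", base_library[found_class]


personalized_library = [
    PersonalizedModel("P1", "B", [1551.0, 110.0]),
    PersonalizedModel("P2", "B", [1553.0, 125.0]),
    PersonalizedModel("P3", "A", [1549.0, 95.0]),
]
base_library = {"A": "base-A", "B": "base-B", "C": "base-C"}
print(select_model([1552.8, 124.0], "B", personalized_library, base_library))  # prints ('personalized', 'P2')
print(select_model([1548.0, 90.0], "C", personalized_library, base_library))   # prints ('base', 'base-C')
```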
Embodiment 4: Model matching method based on the base model library, the personalized model library and spectral features
Spectral features are added to model matching so that, as the number of analysis models in the basic model library and the personalized model library grows, the optimal analysis model can still be matched and found reliably.
Because spectral features are used for model matching, a spectral feature extraction step is added when the analysis models in the basic model library and the personalized model library are established. Specifically, the spectral features corresponding to an analysis model are extracted as follows: the spectral data in the modeling set is analyzed to extract the corresponding spectral features, and the extracted spectral features are stored in the cloud in one-to-one correspondence with the established analysis model. For example, when analysis model O is created on device 1, the data used to create it forms a modeling data set, called the modeling set; the spectral data in the modeling set is analyzed to extract the corresponding spectral feature O, and analysis model O together with spectral feature O is then uploaded to cloud storage under the instrument class to which device 1 belongs.
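The source does not say which spectral features are extracted from the modeling set. The sketch below assumes, purely for illustration, that the feature is the point-wise mean spectrum of the modeling set, and it uses an in-memory dictionary as a stand-in for cloud storage. `extract_spectral_feature`, `register_model`, `cloud_store` and the class label are hypothetical names.

```python
def extract_spectral_feature(modeling_set):
    """Hypothetical feature choice: point-wise mean of all spectra in the modeling set."""
    if not modeling_set:
        raise ValueError("modeling set is empty")
    n = len(modeling_set)
    return [sum(values) / n for values in zip(*modeling_set)]

# Stand-in for cloud storage (assumption): instrument class -> list of (model_id, spectral_feature)
cloud_store = {}

def register_model(instrument_class, model_id, modeling_set):
    """Extract the model's spectral feature and store it one-to-one with the model ID."""
    feature = extract_spectral_feature(modeling_set)
    cloud_store.setdefault(instrument_class, []).append((model_id, feature))
    return feature

# Analysis model O built on device 1, which belongs to the hypothetical class_1
register_model("class_1", "O", [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print(cloud_store)  # {'class_1': [('O', [2.0, 3.0, 4.0])]}
```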
In the detection stage, model matching for a new device uses the spectral features corresponding to the analysis models, the personalized models established in Embodiment 3 and the basic model library established in Embodiment 1, and comprises the following steps:
S401, acquiring the instrument parameters of the new device and the spectrum to be measured;
S402, comparing the acquired instrument parameters with the first-class features corresponding to each instrument class to find the instrument class closest to the new device, for example class B;
S403, according to the found instrument class B, retrieving the analysis models corresponding to instrument class B from the basic model library and the personalized model library, then comparing the spectrum to be measured with the spectral features of the retrieved analysis models to find the spectral feature most similar to the spectrum to be measured; the analysis model corresponding to this most similar spectral feature is the best-matching analysis model;
S404, performing result prediction on the material information using the best-matching analysis model;
wherein the basic model library is built by clustering the instrument population.
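Steps S401 to S404 can be sketched as follows. Assumptions not taken from the source: Euclidean distance for the instrument-class comparison, and plain cosine similarity as a simple stand-in for the modified cosine similarity described next. Each class's entry in the hypothetical `library` pools its basic and personalized models; all names and data are illustrative.

```python
import math

def euclidean(a, b):
    """Assumed distance between instrument parameters and first-class features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Plain cosine similarity, used here only as a stand-in for the modified form."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_by_spectrum(params, spectrum, class_features, library):
    """S402-S403: nearest instrument class, then the model with the most similar spectral feature.

    library maps instrument class -> list of (model_id, spectral_feature),
    holding both the basic and the personalized models of that class.
    """
    cls = min(class_features, key=lambda c: euclidean(params, class_features[c]))
    model_id, _ = max(library[cls], key=lambda m: cosine(spectrum, m[1]))
    return cls, model_id

# Hypothetical example data
class_features = {"A": [0.0, 0.0], "B": [10.0, 10.0]}
library = {
    "A": [("basic_A", [1.0, 1.0, 1.0])],
    "B": [("basic_B", [3.0, 2.0, 1.0]), ("personalized_007", [1.0, 2.0, 3.0])],
}
print(match_by_spectrum([9.5, 10.2], [1.1, 2.0, 2.9], class_features, library))
# ('B', 'personalized_007')
```

S404 would then call the prediction function of the returned model, which is outside the scope of this sketch.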
In addition, in the step of finding the spectral feature most similar to the spectrum to be measured, a modified cosine similarity is used, with the following formula:
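[Formula image not reproduced: modified cosine similarity sim(i, j) between the spectrum to be measured i and the spectral feature j of a candidate analysis model]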
In the formula, U denotes the feature set and u a single feature dimension in U; i denotes the spectral feature set of the sample measured on the new device, i.e., the spectrum to be measured; j denotes the spectral feature of each candidate analysis model, i.e., the feature set of the modeling spectra of that model; and sim(i, j) is the similarity between the spectrum to be measured and the spectral feature of each candidate analysis model.
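Because the formula image is missing from this copy, the sketch below implements one common form of adjusted (modified) cosine similarity, borrowed from item-based collaborative filtering, mapped onto the symbols defined above. It is an assumption that the patent uses exactly this form; in particular, the centering term (here the mean of each feature dimension u over the spectrum to be measured and all candidate features) is assumed. `adjusted_cosine_scores` is a hypothetical name.

```python
import math

def adjusted_cosine_scores(test, candidates):
    """Score each candidate spectral feature j against the spectrum to be measured i.

    Assumed form:
        sim(i, j) = sum_u (R[u,i] - m_u) * (R[u,j] - m_u)
                    / (sqrt(sum_u (R[u,i] - m_u)^2) * sqrt(sum_u (R[u,j] - m_u)^2))
    where m_u is the mean of feature dimension u over the test spectrum and all candidates.
    """
    pool = [test] + list(candidates.values())
    means = [sum(col) / len(pool) for col in zip(*pool)]
    ci = [x - m for x, m in zip(test, means)]
    norm_i = math.sqrt(sum(v * v for v in ci))
    scores = {}
    for name, feature in candidates.items():
        cj = [x - m for x, m in zip(feature, means)]
        denom = norm_i * math.sqrt(sum(v * v for v in cj))
        # A zero denominator means no variation to compare; score it as 0.0 (assumption).
        scores[name] = sum(a * b for a, b in zip(ci, cj)) / denom if denom else 0.0
    return scores

# Hypothetical example: m1 resembles the test spectrum, m2 is reversed
test_spectrum = [1.0, 2.0, 3.0]
candidates = {"m1": [1.0, 2.0, 3.1], "m2": [3.0, 2.0, 1.0]}
scores = adjusted_cosine_scores(test_spectrum, candidates)
best = max(scores, key=scores.get)
print(best)  # m1
```

Centering by a pooled mean rather than each vector's own mean is the main design choice here; with only two vectors in the pool this form degenerates, so it is only meaningful when several candidate models are compared at once.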
The specific flow of the above model matching method is shown in Fig. 5 (taking corn as the sample and protein content as the index):
First, the instrument parameters of the new device and the spectrum to be measured are acquired. The instrument parameters are then compared with the first-class features of each instrument class to find the instrument class B closest to the new device. Next, the analysis models corresponding to instrument class B are retrieved from the basic model library and the personalized model library, and the spectrum to be measured is compared with their spectral features to find the most similar spectral feature; the analysis model corresponding to that feature is the best-matching analysis model, which in this embodiment is personalized model 007 of instrument class B. Finally, the prediction function corresponding to the model rule of personalized model 007 is called to predict the material information for the unknown spectrum X.
Embodiment 5 model matching method based on basic model library, personalized model library, spectral characteristics and instrument characteristics
The model matching method of this embodiment combines instrument population clustering, personalized models and spectral feature matching, and additionally uses instrument characteristics, to predict multiple indices of unknown samples.
In the detection stage, model matching for a new device uses the spectral features and instrument characteristics corresponding to the analysis models, the personalized models established in Embodiment 3 and the basic model library established in Embodiment 1, and comprises the following steps:
S501, acquiring the instrument parameters of the new device and the spectrum to be measured;
S502, comparing the acquired instrument parameters with the first-class features corresponding to each instrument class to find the instrument class closest to the new device, for example class B;
S503, according to the found instrument class B, retrieving the analysis models corresponding to instrument class B from the basic model library and the personalized model library; comparing the spectrum to be measured with the spectral features of the retrieved analysis models to find the most similar spectral features and hence the corresponding analysis models; then comparing the instrument parameters of the new device with the instrument characteristics of these analysis models (i.e., the analysis models corresponding to the most similar spectral features) to find the instrument characteristics that best match the instrument parameters of the new device; the analysis model corresponding to these best-matching instrument characteristics is the best-matching analysis model;
S504, performing result prediction on the material information using the best-matching analysis model;
wherein the basic model library is built by clustering the instrument population.
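The two-stage selection in S503 can be sketched per prediction index as follows. The source does not say how many models survive the spectral stage, so this sketch assumes a hypothetical similarity threshold (falling back to the single most similar model if none passes), plain cosine similarity for spectra and Euclidean distance for instrument characteristics. The data layout, `two_stage_match` and all model records are hypothetical.

```python
import math

def cosine(a, b):
    """Plain cosine similarity (stand-in for the modified cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def euclidean(a, b):
    """Assumed distance between instrument parameters and stored instrument characteristics."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_stage_match(spectrum, params, models, threshold=0.99):
    """Return {index: model_id}: spectral screening first, then instrument matching."""
    best = {}
    for index in sorted({m["index"] for m in models}):
        group = [m for m in models if m["index"] == index]
        # Stage 1: keep "available" models whose spectral feature is similar enough (assumed threshold).
        available = [m for m in group if cosine(spectrum, m["spectral"]) >= threshold]
        if not available:
            available = [max(group, key=lambda m: cosine(spectrum, m["spectral"]))]
        # Stage 2: among available models, pick the closest instrument characteristics.
        chosen = min(available, key=lambda m: euclidean(params, m["instrument"]))
        best[index] = chosen["id"]
    return best

# Hypothetical model records for one instrument class
models = [
    {"id": "005", "index": "protein", "spectral": [1.0, 2.0, 3.0], "instrument": [1.0, 1.0]},
    {"id": "009", "index": "protein", "spectral": [1.0, 2.0, 3.05], "instrument": [2.0, 2.0]},
    {"id": "011", "index": "protein", "spectral": [3.0, 2.0, 1.0], "instrument": [2.0, 2.0]},
    {"id": "001", "index": "moisture", "spectral": [1.0, 2.0, 3.0], "instrument": [0.0, 0.0]},
    {"id": "005", "index": "moisture", "spectral": [1.0, 2.0, 3.0], "instrument": [2.1, 2.0]},
    {"id": "010", "index": "moisture", "spectral": [1.1, 2.0, 3.0], "instrument": [3.0, 3.0]},
]
print(two_stage_match([1.0, 2.0, 3.0], [2.0, 2.0], models))
# {'moisture': '005', 'protein': '009'}
```

Model 011 is screened out in stage 1 because its spectral feature is dissimilar, even though its instrument characteristics match the new device exactly; this ordering reflects the source, which filters by spectrum before instrument characteristics.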
The specific flow of the above model matching method is shown in Fig. 6 (taking corn as the sample, and moisture content and protein content as the indices):
First, the instrument parameters of the new (unknown) device and the spectrum to be measured (unknown spectrum) are acquired. The instrument parameters are compared with the first-class features of each instrument class to find the instrument class B closest to the new device. Next, the analysis models corresponding to instrument class B are retrieved from the basic model library and the personalized model library, and the spectrum to be measured is compared with their spectral features to find the most similar spectral features and hence the corresponding analysis models. The models found at this stage are the available (candidate) models: in this embodiment, analysis models 005 and 009 for corn protein, and analysis models 001, 005 and 010 for corn moisture. The instrument parameters of the new device are then compared with the instrument characteristics of these candidate models to find the best-matching instrument characteristics and hence the best-matching analysis models, which in this embodiment are analysis model 009 for corn protein and analysis model 005 for corn moisture. Finally, the prediction functions corresponding to the model rules of these two analysis models are called to obtain the material information results for the unknown spectrum X.
As can be seen from the above, the model matching method of this embodiment matches features from several aspects to find the model best suited to the sample to be tested, which greatly improves the accuracy of the predicted values. With this technology, the supplier only needs to establish basic models for some of the instruments leaving the factory; the supplier and users can keep adding personalized models to the cloud while the instruments are in use, and as the number of personalized models grows, the prediction accuracy and matching performance of the models keep improving, creating a virtuous cycle of cooperation between buyer and seller.
Embodiment 6 model matching method based on basic model library, personalized model library, spectral features and average spectrum
This embodiment further incorporates model transfer techniques from modern chemometrics to improve the accuracy of the prediction results. Because the average spectrum is used for model matching, a step of collecting spectra of a standard (one of the instrument accessories) is added after instrument classification, i.e., after step S102, to obtain the average spectrum of the standard for each instrument class.
For the second-class feature corresponding to an instrument class, extraction step S104 comprises: after the instrument classes have been divided in step S102, spectra of the standard are collected on all devices belonging to the same instrument class, or on some representative devices of that class; the average of the collected standard spectra is calculated and taken as the second-class feature of that instrument class; finally, the average spectrum of the instrument class is uploaded to the cloud and stored accordingly. For example, suppose classification yields 10 instrument classes in total and the first instrument class (class 1) contains 30 devices. Spectra of the standard are then collected on all 30 devices, or on some representative devices such as 15 of the near-infrared spectrometers; the average spectrum is calculated from the collected standard spectra and taken as the second-class feature of class 1. Step S104 is therefore performed after step S102.
The specific flow of the above step S104 is shown in fig. 7:
First, cluster analysis according to steps S101 and S102 yields N instrument classes (class 1, class 2, ..., class N). The devices belonging to each instrument class then collect spectra of the standard, and the average of the collected standard spectra is calculated; for class 1 this gives average spectrum 1, which is the second-class feature of class 1.
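Step S104 reduces to a point-wise average per instrument class. A minimal sketch, assuming each standard spectrum is an equal-length list of absorbance values sampled at the same wavelengths; `second_class_features` and the example data are hypothetical.

```python
def average_spectrum(spectra):
    """Point-wise average of equal-length spectra."""
    if not spectra:
        raise ValueError("no spectra to average")
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]

def second_class_features(standard_spectra_by_class):
    """S104: map each instrument class to the average standard spectrum of its devices."""
    return {cls: average_spectrum(spectra) for cls, spectra in standard_spectra_by_class.items()}

# Hypothetical standard spectra, one row per device (all or representative devices of a class)
standard_spectra = {
    "class_1": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "class_2": [[2.0, 2.0, 2.0]],
}
print(second_class_features(standard_spectra))
# {'class_1': [2.0, 3.0, 4.0], 'class_2': [2.0, 2.0, 2.0]}
```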
In the detection stage, model matching for a new device uses the average spectrum of each instrument class, the spectral features corresponding to the analysis models, the personalized models established in Embodiment 3 and the basic model library established in Embodiment 1, and comprises the following steps:
S601, the new device scans the standard several times (at least 3 times) to obtain multiple spectra, and their average is taken as the new device's average spectrum of the standard;
S602, acquiring the instrument parameters of the new device and the spectrum to be measured;
S603, comparing the acquired instrument parameters with the first-class features corresponding to each instrument class to find the instrument class closest to the new device, for example class B;
S604, analyzing the new device's average spectrum of the standard together with the second-class feature of instrument class B, i.e., the average standard spectrum of instrument class B, to obtain correction coefficients using methods from modern chemometrics such as Shenk's algorithm, direct standardization (DS) or piecewise direct standardization (PDS);
S605, correcting the spectrum to be measured with the obtained correction coefficients to obtain the corrected spectrum to be measured;
S606, according to the found class B, retrieving the analysis models corresponding to instrument class B from the basic model library and the personalized model library, then comparing the corrected spectrum to be measured with the spectral features of the retrieved analysis models to find the spectral feature most similar to the corrected spectrum; the analysis model corresponding to this most similar spectral feature is the best-matching analysis model;
S607, performing result prediction on the material information using the best-matching analysis model;
wherein the basic model library is built by clustering the instrument population.
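Steps S601, S604 and S605 can be sketched as below. The source names Shenk's algorithm, DS and PDS for S604 but gives no details, so this sketch deliberately swaps in a much simpler global slope/bias correction: a single slope a and offset b, fitted by least squares so that the class's average standard spectrum is approximately a times the new device's average standard spectrum plus b, across all wavelengths. This is not any of the named methods, which fit the transfer per wavelength or per spectral window and usually need several transfer samples. All function names and data are hypothetical.

```python
def average_scans(scans, min_scans=3):
    """S601: average repeated scans of the standard on the new device (at least 3)."""
    if len(scans) < min_scans:
        raise ValueError(f"need at least {min_scans} scans of the standard")
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]

def slope_bias_coefficients(new_std, class_std):
    """S604 stand-in (not Shenk/DS/PDS): least-squares fit of class_std ~ a * new_std + b."""
    n = len(new_std)
    mean_x = sum(new_std) / n
    mean_y = sum(class_std) / n
    sxx = sum((x - mean_x) ** 2 for x in new_std)
    if sxx == 0:
        raise ValueError("standard spectrum is flat; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(new_std, class_std))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def correct_spectrum(spectrum, a, b):
    """S605: map a spectrum measured on the new device toward the instrument class's response."""
    return [a * x + b for x in spectrum]

# Hypothetical data: the new device reads the standard as 2 * (class response) + 1
class_avg = [1.0, 2.0, 3.0, 4.0]
new_avg = average_scans([[3.0, 5.0, 7.0, 9.0]] * 3)
a, b = slope_bias_coefficients(new_avg, class_avg)
print(a, b)                                  # 0.5 -0.5
print(correct_spectrum([11.0, 13.0], a, b))  # [5.0, 6.0]
```

The corrected spectrum would then go through the spectral matching of S606 exactly as an uncorrected spectrum does in Embodiment 4.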
The specific flow of the above model matching method is shown in Fig. 8:
First, the new device's average spectrum of the standard, its instrument parameters and the spectra to be measured are acquired. The instrument parameters are compared with the first-class features of each instrument class to find the closest instrument class, class 2. The new device's average standard spectrum is then analyzed together with average spectrum 2 of instrument class 2 to obtain correction coefficients, which are used to correct the spectra to be measured, namely unknown spectrum A collected from new sample A, unknown spectrum B collected from new sample B and unknown spectrum C collected from new sample C, giving corrected spectra A, B and C. Next, the analysis models corresponding to instrument class 2 are retrieved from the basic model library and the personalized model library, and each corrected spectrum is compared with their spectral features to find the most similar spectral feature, giving the best-matching analysis models 2A, 2B and 2C. Finally, the prediction function corresponding to the model rule of each best-matching analysis model is called to predict the material information from the corresponding corrected spectrum.
From the above, the model matching method of the present invention has the following advantages:
1. A higher degree of automation in model transfer, enabling fast and accurate model transfer
In this matching method, personalized transfer models can be established according to the characteristics and environment of each instrument. This provides flexibility in model transfer and removes the restriction of transferring only a limited number of models between a limited number of devices, so that large numbers of models can be transferred quickly and accurately;
2. Dynamic model updating
This matching method can build a dynamic model for each instrument over its whole life cycle from the instrument's own condition and user feedback, turning it into a continuously improving model ecosystem with high flexibility and improving the accuracy and applicability of model matching;
3. Support for user participation and personalized model customization
The method breaks with the traditional one-off business relationship between enterprise and user: using the instrument as a channel, it integrates online and offline platforms and establishes a long-term relationship with the user. The instrument serves as a bridge for user participation, and the data from user-instrument interaction provide a basis for model updating and personalized model customization.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A big data-based model matching method, characterized in that the method comprises the following steps:
acquiring instrument parameters of a new device;
comparing the acquired instrument parameters with the first-class features corresponding to each instrument class, thereby finding the instrument class closest to the new device;
judging, according to the found instrument class, whether a personalized model corresponding to that instrument class exists in a personalized model library; if so, retrieving the personalized models corresponding to the instrument class from the personalized model library, comparing the acquired instrument parameters with the instrument characteristics stored in the retrieved personalized models to find the personalized model that best matches the new device, and performing result prediction on the material information using that personalized model; otherwise, obtaining the basic model corresponding to the found instrument class from a basic model library and performing result prediction on the material information using that basic model;
wherein the basic model library is built by clustering the instrument population.
2. The big data-based model matching method according to claim 1, wherein the personalized model stores model rules and instrument characteristics.
3. A big data-based model matching method, characterized in that the method comprises the following steps:
acquiring instrument parameters of a new device and a spectrum to be measured;
comparing the acquired instrument parameters with the first-class features corresponding to each instrument class, thereby finding the instrument class closest to the new device;
retrieving, according to the found instrument class, the analysis models corresponding to that instrument class from the basic model library and the personalized model library, comparing the spectrum to be measured with the spectral features corresponding to the retrieved analysis models, and finding the spectral feature most similar to the spectrum to be measured, thereby obtaining the best-matching analysis model;
performing result prediction on the material information using the best-matching analysis model;
wherein the basic model library is built by clustering the instrument population.
4. The big data-based model matching method according to claim 3, wherein the step of retrieving, according to the found instrument class, the analysis models corresponding to that instrument class from the basic model library and the personalized model library, comparing the spectrum to be measured with the spectral features corresponding to the retrieved analysis models, and finding the spectral feature most similar to the spectrum to be measured to obtain the best-matching analysis model specifically comprises:
retrieving, according to the found instrument class, the analysis models corresponding to that instrument class from the basic model library and the personalized model library; comparing the spectrum to be measured with the spectral features of the retrieved analysis models to find the most similar spectral features and hence the corresponding analysis models; and then comparing the acquired instrument parameters with the instrument characteristics of these analysis models to find the instrument characteristics that best match the instrument parameters, thereby obtaining the best-matching analysis model.
5. The big data-based model matching method according to claim 3 or 4, wherein the spectral features corresponding to an analysis model are extracted by:
analyzing the spectral data in a modeling set, i.e., the modeling data set used to build the analysis model, to extract the spectral features.
6. A big data-based model matching method, characterized in that the method comprises the following steps:
acquiring the instrument parameters of a new device, a spectrum to be measured and an average spectrum of a standard;
comparing the acquired instrument parameters with the first-class features corresponding to each instrument class, thereby finding the instrument class closest to the new device;
analyzing the acquired average spectrum of the standard together with the second-class feature corresponding to the found instrument class to obtain correction coefficients;
correcting the spectrum to be measured with the obtained correction coefficients to obtain a corrected spectrum to be measured;
retrieving, according to the found instrument class, the analysis models corresponding to that instrument class from the basic model library and the personalized model library, comparing the corrected spectrum to be measured with the spectral features corresponding to the retrieved analysis models, and finding the spectral feature most similar to the corrected spectrum to be measured, thereby obtaining the best-matching analysis model;
performing result prediction on the material information using the best-matching analysis model;
wherein the basic model library is built by clustering the instrument population.
7. The big data-based model matching method according to claim 6, wherein the second-class feature corresponding to an instrument class is extracted by:
collecting spectra of the standard with the devices belonging to the same instrument class;
calculating the average of the collected standard spectra, and taking the calculated average spectrum as the second-class feature corresponding to the instrument class.
CN201710102144.1A 2017-02-23 2017-02-23 Big data-based model matching method Active CN106934416B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710102144.1A CN106934416B (en) 2017-02-23 2017-02-23 Big data-based model matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710102144.1A CN106934416B (en) 2017-02-23 2017-02-23 Big data-based model matching method

Publications (2)

Publication Number Publication Date
CN106934416A CN106934416A (en) 2017-07-07
CN106934416B true CN106934416B (en) 2021-03-30

Family

ID=59424576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710102144.1A Active CN106934416B (en) 2017-02-23 2017-02-23 Big data-based model matching method

Country Status (1)

Country Link
CN (1) CN106934416B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228533A (en) * 2018-01-04 2018-06-29 中煤航测遥感集团有限公司 Materials analysis methods and device
CN108777019B (en) * 2018-04-28 2021-01-05 深圳市芭田生态工程股份有限公司 Near-infrared spectrum model transfer strategy optimization method and device
CN111239054A (en) * 2018-11-28 2020-06-05 中移物联网有限公司 Spectral analysis model application method and device
CN112629659A (en) * 2019-10-08 2021-04-09 中强光电股份有限公司 Automated model training apparatus and automated model training method for training pipelines for different spectrometers
CN112633307A (en) * 2019-10-08 2021-04-09 中强光电股份有限公司 Automatic model training device and automatic model training method for spectrometer
CN110893100A (en) * 2019-12-16 2020-03-20 广东轻工职业技术学院 Device and method for monitoring posture change based on plantar pressure sensor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101832922A (en) * 2010-05-19 2010-09-15 中国农业大学 Method for transferring near infrared model of organic fertilizer product
CN104915375A (en) * 2012-02-28 2015-09-16 艾康生物技术(杭州)有限公司 Method for automatically identifying identity information of biosensor
CN105181650A (en) * 2015-10-08 2015-12-23 滁州职业技术学院 Method for quickly identifying tea varieties through near-infrared spectroscopy technology
US20160275375A1 (en) * 2015-03-20 2016-09-22 Netra, Inc. Object detection and classification
CN106096563A (en) * 2016-06-17 2016-11-09 深圳市易特科信息技术有限公司 Plant automatic recognition system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
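Application of near-infrared spectroscopy analysis technology in the feed industry; Lu Xiong et al.; Anhui Agricultural Sciences (安徽农业科学); 2016-12-31; Vol. 44, No. 2; pp. 83-85 *
Research on near-infrared spectroscopy analysis methods: from traditional data to big data; Liu Yan et al.; Scientia Sinica (中国科学); 2015-12-31; Vol. 60, No. 8; pp. 704-713 *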

Also Published As

Publication number Publication date
CN106934416A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN106934416B (en) Big data-based model matching method
CN106815643B (en) Infrared spectroscopy Model Transfer method based on random forest transfer learning
Bouveresse et al. Standardization of near-infrared spectrometric instruments
Kuang et al. Calibration of visible and near infrared spectroscopy for soil analysis at the field scale on three European farms
CN113989603A (en) Reduced false positive identification for spectral classification
Barla et al. Machine learning methods for predictive proteomics
Dupuy et al. Chemometric analysis of combined NIR and MIR spectra to characterize French olives
CN110657890B (en) Cross-validation based calibration of spectral models
CN101995388A (en) Near infrared quality control analysis method and system of tobacco
CN111563436B (en) Infrared spectrum measuring instrument calibration migration method based on CT-CDD
Chen et al. Cross components calibration transfer of NIR spectroscopy model through PCA and weighted ELM-based TrAdaBoost algorithm
CN108960193B (en) Cross-component infrared spectrum model transplanting method based on transfer learning
Chen et al. A hybrid optimization method for sample partitioning in near-infrared analysis
Shen et al. Rapid and real-time detection of moisture in black tea during withering using micro-near-infrared spectroscopy
CN114626304B (en) Online prediction soft measurement modeling method for ore pulp copper grade
Puttipipatkajorn et al. Development of calibration models for rapid determination of moisture content in rubber sheets using portable near-infrared spectrometers
Li et al. A novel method for the nondestructive classification of different‐age Citri Reticulatae Pericarpium based on data combination technique
CN112651173B (en) Agricultural product quality nondestructive testing method based on cross-domain spectral information and generalizable system
CN105548068B (en) Dynamic Evolution Model bearing calibration and system
CN105223140A (en) The method for quickly identifying of homology material
CN105466885B (en) Based on the near infrared online measuring method without measuring point temperature-compensating mechanism
Zimmer et al. Rapid quantification of constituents in tobacco by NIR fiber‐optic probe
CN114088661A (en) Online prediction method for chemical components in tobacco leaf curing process based on transfer learning and near infrared spectrum
Ricotta et al. Measuring similarity among plots including similarity among species: an extension of traditional approaches
Lemos et al. Self-optimized one-class classification using sum of ranking differences combined with a receiver operator characteristic curve

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230710

Address after: Room A95, No. 66, Honghuagang West Street, Cencun Village, Tianhe District, Guangzhou, Guangdong 510000

Patentee after: Guangdong Zhongtaxun Technology Co.,Ltd.

Address before: 516000 room 806-812, B building, 89 Zhongshan Avenue West, Tianhe District, Guangzhou, Guangdong.

Patentee before: GUANGZHOU SONDON NETWORK TECHNOLOGY Co.,Ltd.
