CN117095771B - High-precision spectrum measurement data optimization processing method - Google Patents

High-precision spectrum measurement data optimization processing method

Info

Publication number
CN117095771B
Authority
CN
China
Prior art keywords
data
frequency
measurement data
spectrum measurement
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311346592.8A
Other languages
Chinese (zh)
Other versions
CN117095771A (en
Inventor
李延磊
周春卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunshan Shangrui Intelligent Technology Co ltd
Original Assignee
Kunshan Shangrui Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunshan Shangrui Intelligent Technology Co ltd
Priority to CN202311346592.8A
Publication of CN117095771A
Application granted
Publication of CN117095771B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/17 Systems in which incident light is modified in accordance with the properties of the material investigated
    • G01N21/25 Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
    • G01N21/31 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
    • G01N21/35 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
    • G01N21/359 Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using near infrared light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Abstract

The invention relates to the technical field of near-infrared spectrum analysis, and in particular to a high-precision spectrum measurement data optimization processing method. The method comprises the following steps: acquiring spectrum measurement data and constructing initial isolated trees; constructing a depth sequence of each data point; determining the structural similarity of any two data points; determining the similarity consistency degree of each frequency to be measured; and dividing the frequency intervals of the spectrum measurement data according to the similarity consistency degrees of different frequencies to obtain characteristic wave bands. The method can effectively improve the detection precision of the spectrum measurement data, realize high-precision optimization processing of the spectrum measurement data, improve the reliability of the optimized spectrum data and enhance the optimization effect.

Description

High-precision spectrum measurement data optimization processing method
Technical Field
The invention relates to the technical field of near infrared spectrum analysis, in particular to a high-precision spectrum measurement data optimization processing method.
Background
Near-infrared spectrometry is a technique for determining the chemical composition of a substance. A spectrum carries features such as wavelength, frequency and amplitude, from which the properties of an object can be detected. The technique is widely used in fields such as chemistry, biology, geology and astronomy. During spectrum detection, however, the accuracy of the obtained spectrum measurement data is reduced by interference in the measurement scene, such as temperature, humidity, vibration, dust, noise and spectrometer performance.
To improve the accuracy of spectrum measurement data, the data must be optimized. In the related art, abnormal data is analyzed by comparing sample data with standard data.
Disclosure of Invention
In order to solve the technical problems of insufficient detection precision and reliability and poor optimization effect of spectrum measurement data in the related art, the invention provides a high-precision spectrum measurement data optimization processing method, which adopts the following specific technical scheme:
the invention provides a high-precision spectrum measurement data optimization processing method, which comprises the following steps:
periodically acquiring spectrum measurement data of a sample to be measured at different time points, and determining initial isolated trees of the spectrum measurement data at different dimensions;
constructing a depth sequence of each data point according to the depth information of the data point in different initial isolated trees and the frequency of different depth information in the spectrum measurement data; determining the structural similarity of two data points according to the depth sequence of any two data points, the amplitude difference and the frequency difference of the two data points;
taking any data point as the data point to be measured, and clustering the structural similarities between the data point to be measured and all other data points to obtain clusters of the data point to be measured; taking the frequency of the data point to be measured as the frequency to be measured, taking, among the clusters corresponding to all data points, the clusters that contain the frequency to be measured as the clusters to be measured, and determining the similarity consistency degree of the frequency to be measured according to the structural similarity values in all the clusters to be measured; dividing the spectrum measurement data into frequency intervals according to the similarity consistency degrees of different frequencies to obtain characteristic wave bands;
according to the difference of frequencies contained in characteristic wave bands of different time points, determining an isolated tree splitting frequency, carrying out isolated tree analysis on the spectrum measurement data based on the values of the isolated tree splitting frequency in different dimensions, determining abnormal data points, and carrying out data optimization on the spectrum measurement data according to the abnormal data points to obtain optimized spectrum data.
Further, the constructing a depth sequence of each data point according to the depth information of the data point in different initial isolated trees and the frequency of different depth information in the spectrum measurement data comprises:
taking the depth value of the data point in an initial isolated tree as the depth information, and combining each depth value with the frequency of occurrence of the data point at that depth value to form a depth vector;
and sorting the depth vectors corresponding to all depth values in ascending order of depth value to obtain the depth sequence of the data point.
Further, the structural similarity of two data points is determined according to the depth sequence of any two data points, the amplitude difference and the frequency difference of the two data points, and the corresponding calculation formula is as follows:
$$S_{i,j} = \frac{1}{\mathrm{DTW}(D_i, D_j) \times \Delta f_{i,j} \times \Delta A_{i,j} + x}$$
where $S_{i,j}$ represents the structural similarity of the ith data point and the jth data point, $D_i$ represents the depth sequence of the ith data point, $D_j$ represents the depth sequence of the jth data point, $\mathrm{DTW}(D_i, D_j)$ represents the DTW distance between the depth sequence of the ith data point and the depth sequence of the jth data point, $\Delta f_{i,j}$ represents the frequency difference between the ith data point and the jth data point, $\Delta A_{i,j}$ represents the amplitude difference between the ith data point and the jth data point, and $x$ represents a preset constant coefficient.
Further, the determining the similarity consistency degree of the frequencies to be measured according to the values of the structural similarity in all the clusters to be measured includes:
calculating the mean of the structural similarity values of all data points in each cluster to be measured as the cluster mean;
and calculating the sum of the cluster means of all the clusters to be measured, and normalizing the sum to obtain the similarity consistency degree of the frequency to be measured.
Further, the dividing the frequency interval of the spectrum measurement data according to the similarity consistency degree of different frequencies to obtain a characteristic wave band includes:
and merging adjacent frequencies whose similarity consistency degree is greater than a preset consistency threshold to obtain a characteristic wave band.
Further, the determining an isolated tree splitting frequency according to the difference of the frequencies contained in the characteristic wave bands at different time points comprises:
counting, over all time points, how often any given frequency appears in the characteristic wave bands, and taking this count as the characteristic frequency;
performing inverse proportion normalization on the characteristic frequency to obtain an isolation coefficient;
and when the isolation coefficient is greater than a preset isolation threshold, taking the corresponding frequency as an isolated tree splitting frequency.
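As a rough illustration of the splitting-frequency selection described above, the following Python sketch counts how many time points place each frequency inside a characteristic wave band, and converts that count into an isolation coefficient. The text does not give the normalization formula, so the form `1 - count / number_of_time_points` is an assumption, as are the function name `splitting_frequencies`, the input layout and the threshold value 0.5.

```python
def splitting_frequencies(band_frequencies_per_time_point, isolation_threshold=0.5):
    """Select candidate isolated tree splitting frequencies.

    band_frequencies_per_time_point: one set per time point, holding the
    frequencies that fall inside a characteristic wave band at that time
    point (hypothetical input layout).
    """
    total = len(band_frequencies_per_time_point)
    counts = {}
    for frequencies in band_frequencies_per_time_point:
        for f in set(frequencies):
            counts[f] = counts.get(f, 0) + 1

    # Assumed inverse proportion normalization: a frequency that is rarely
    # part of a characteristic band gets a coefficient close to 1.
    coefficients = {f: 1.0 - c / total for f, c in counts.items()}
    return sorted(f for f, g in coefficients.items() if g > isolation_threshold)


# Four time points; frequency 100 is always in a band, 102 and 103 only once.
example = [{100, 101, 102}, {100, 101}, {100, 103}, {100, 101}]
selected = splitting_frequencies(example)
```

With this assumed normalization, frequencies that are stable across time points receive low coefficients and are not chosen as splitting frequencies.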
Further, the performing isolated tree analysis on the spectrum measurement data based on the values of the isolated tree splitting frequency in different dimensions, and determining abnormal data points, comprises:
based on an isolated tree algorithm, taking the feature points of the different dimensions corresponding to the isolated tree splitting frequency as the splitting points for analysis, and taking the outliers obtained through the isolated tree analysis as the abnormal data points.
Further, the performing data optimization on the spectrum measurement data according to the abnormal data points to obtain optimized spectrum data includes:
abnormal data points are deleted from the spectral measurement data, and the remaining data points are formed into optimized spectral data.
Further, the determining initial isolated trees of the spectrum measurement data in different dimensions comprises:
based on an isolated tree algorithm, performing random-selection analysis, in any given dimension, on the spectrum measurement data of any time point to obtain the initial isolated trees of the spectrum measurement data in the different dimensions.
Further, the clustering of the structural similarity between the data point to be measured and all other data points to obtain a cluster of the data points to be measured includes:
and clustering the structural similarities between the data point to be measured and all other data points by using a k-means clustering algorithm to obtain the clusters of the data point to be measured.
The invention has the following beneficial effects:
In the method, spectrum measurement data of the sample to be measured are acquired periodically at different time points, and initial isolated trees of the spectrum measurement data in different dimensions are determined. A depth sequence is then constructed from the depth information of each data point in the different initial isolated trees and the frequency of occurrence of that depth information; the depth sequence accurately captures the distribution of the leaf nodes in the initial isolated trees. The structural similarity between data points is determined by combining the depth sequence with the amplitude difference and the frequency difference, so that the structural similarity effectively represents how similar the corresponding data points are. Clustering is performed according to the structural similarity, and the similarity consistency degree is calculated. The frequency intervals of the spectrum measurement data are divided according to the similarity consistency degrees of different frequencies to obtain characteristic wave bands. Using the similarity consistency degree as the basis for dividing the characteristic wave bands allows the spectrum measurement data of all time points to be analyzed and the characteristic wave bands with the most stable features to be screened out according to how the data change across time points. This ensures the recognition of the characteristic wave bands and facilitates the subsequent analysis of the isolated tree splitting frequency and the determination of abnormal data points. Because the abnormal data points are obtained by integrating data features across multiple dimensions and multiple time points, their reliability and accuracy are ensured. Finally, the spectrum measurement data are optimized according to these abnormal data points to obtain optimized spectrum data, which improves the detection accuracy and reliability of the optimized spectrum data. In conclusion, the method can effectively improve the detection precision of the spectrum measurement data, realize high-precision optimization processing of the spectrum measurement data, improve the reliability of the optimized spectrum data and enhance the optimization effect on the spectrum measurement data.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for optimizing high-precision spectral measurement data according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to specific implementation, structure, characteristics and effects of a high-precision spectrum measurement data optimization processing method according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the high-precision spectrum measurement data optimization processing method provided by the invention with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for optimizing high-precision spectrum measurement data according to an embodiment of the present invention is shown, where the method includes:
S101: Periodically acquire spectrum measurement data of the sample to be measured at different time points, and determine initial isolated trees of the spectrum measurement data in different dimensions.
A specific application scenario of the present invention may be, for example: obtaining, with a high-precision infrared spectrometer, a plurality of high-precision spectrum detection data for the same sample to be measured within one detection period. The sample to be measured may be, for example, a water sample, a food sample, a metal sample, or any of various solid samples that can be measured with a spectrometer.
It should be noted that the detection environment conditions are kept consistent during detection, so as to avoid the influence of the external environment on the high-precision spectrum measurement data. The detection period may be set to 25 minutes, with one time point every 30 seconds within the 25 minutes, so that high-precision spectrum detection data are obtained at different time points. The detection period and the sampling frequency may of course be set according to the specific implementation scenario and are not limited here.
The high-precision spectrum detection data are amplitude sequences, ordered by ascending frequency, at the different time points. They can be represented as a spectrogram with time on the horizontal axis and frequency on the vertical axis. High-precision spectrum measurement data of the same sample to be measured at a plurality of different time points are thus obtained, which facilitates the subsequent analysis of how the spectrum measurement data change, the extraction of abnormal data and the optimization of the high-precision spectrum measurement data.
In the embodiment of the invention, a plurality of different dimensions of the spectrum measurement data can be determined, wherein the dimensions are characteristic dimensions of the spectrum measurement data of the sample to be measured, such as amplitude dimensions, frequency dimensions and the like.
Further, in some embodiments of the present invention, determining initial isolated trees of the spectrum measurement data in different dimensions includes: based on an isolated tree algorithm, performing random-selection analysis, in any given dimension, on the spectrum measurement data of any time point to obtain the initial isolated trees of the spectrum measurement data in the different dimensions.
In the embodiment of the invention, the amplitude dimension is taken as a specific dimension for analysis. A data point is selected as a partition point and the spectrum measurement data are cut at that point; the two resulting subsequences serve as two leaf nodes. Each subsequence is then cut again according to the data quantity and data distribution of its leaf node to obtain the leaf nodes of the next layer, and cutting stops when each bottom-layer leaf node contains only one data point, which yields the initial isolated tree. See the examples that follow for the analysis and selection process.
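The recursive cutting described above can be sketched in Python as follows. This is a minimal illustration on a single dimension, not the patented procedure itself: the uniformly random cut value, the convention that the root is at depth 0, and the name `isolation_depths` are all assumptions.

```python
import random


def isolation_depths(values, seed=0):
    """Build one isolated tree over a single dimension and return, for each
    data point index, the depth of the leaf node that finally holds it."""
    rng = random.Random(seed)
    depths = {}
    if not values:
        return depths

    def split(indices, depth):
        lo = min(values[i] for i in indices)
        hi = max(values[i] for i in indices)
        # Stop when the node holds one point (or only identical values).
        if len(indices) == 1 or lo == hi:
            for i in indices:
                depths[i] = depth
            return
        # Draw a random cut value until both children are non-empty.
        while True:
            cut = rng.uniform(lo, hi)
            left = [i for i in indices if values[i] < cut]
            right = [i for i in indices if values[i] >= cut]
            if left and right:
                break
        split(left, depth + 1)
        split(right, depth + 1)

    split(list(range(len(values))), 0)
    return depths


# Amplitudes at one time point; the last value is far from the others and
# will usually be separated close to the root.
amplitude_depths = isolation_depths([0.50, 0.52, 0.51, 0.49, 3.0], seed=1)
```

Running the same function on each dimension (frequency, amplitude, and so on) gives one depth per data point per dimension, which is the input the depth sequence needs.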
S102: according to the depth information of the data points in different initial isolated trees and the frequency of different depth information in the spectrum measurement data, constructing a depth sequence of each data point; and determining the structural similarity of the two data points according to the depth sequence of any two data points, the amplitude difference and the frequency difference of the two data points.
In the embodiment of the invention, each dimension corresponds to one initial isolated tree. The spectrum measurement data comprise a plurality of dimensions, and selecting abnormal data according to a single dimension alone is less reliable, so the invention combines all the dimensions for an overall analysis.
It will be appreciated that since different samples to be tested have a decisive effect on the distribution characteristics of each band of their spectra, the similarity characteristics of the data points and adjacent data points over each spectral band are related to their band positions. Therefore, the scheme constructs an isolated tree for the data points according to an isolated tree algorithm, and analyzes the structural similarity of any two data points.
Further, in some embodiments of the present invention, constructing a depth sequence of each data point according to the depth information of the data point in the different initial isolated trees and the frequency of the different depth information in the spectrum measurement data comprises: taking the depth value of the data point in an initial isolated tree as the depth information, and combining each depth value with the frequency of occurrence of the data point at that depth value to form a depth vector; and sorting the depth vectors corresponding to all depth values in ascending order of depth value to obtain the depth sequence of the data point.
It can be understood that, in the embodiment of the present invention, the layer number of the leaf node containing a data point is taken as the corresponding depth value: the closer the leaf node is to the root node, the smaller its layer number and depth value; the larger the depth value, the farther the leaf node is from the root node. The depth value is used as the depth information. In isolated tree analysis, once a leaf node contains only one data point, that data point has been completely separated, so no deeper leaf node contains it. The greater the depth information, therefore, the more normal the data point is in the corresponding dimension. To analyze the similarity of data points, the number of dimensions in which a data point appears at a given depth value (its frequency of occurrence) is determined, and the depth value combined with that frequency forms a depth vector of the data point.
For example, take a data point p with three dimensions: frequency, amplitude and amplitude change rate. Suppose the depth value of p in the frequency dimension is 3; in the amplitude dimension, the leaf node at depth value 3 also contains p; and in the amplitude change rate dimension, the leaf node at depth value 3 does not contain p. The frequency of occurrence at depth value 3 is then 2, and the corresponding depth vector is (3, 2). The data point p is analyzed in the same way at the other depth values, and the depth vectors are sorted in ascending order of depth value to obtain the corresponding depth sequence.
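The construction of the depth sequence for the data point p can be sketched as below. The depth values 3 and 3 for the frequency and amplitude dimensions follow the example in the text; the depth value 5 in the amplitude change rate dimension is a hypothetical value added only to complete the illustration, and the function name `depth_sequence` is likewise an assumption.

```python
from collections import Counter


def depth_sequence(depth_by_dimension):
    """Turn the leaf depth of one data point in each dimension's isolated
    tree into a list of (depth value, occurrence count) vectors, sorted in
    ascending order of depth value."""
    counts = Counter(depth_by_dimension.values())
    return sorted(counts.items())


p_depths = {
    "frequency": 3,
    "amplitude": 3,
    "amplitude_change_rate": 5,  # hypothetical value, not from the text
}
p_sequence = depth_sequence(p_depths)  # [(3, 2), (5, 1)]
```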
The structural similarity represents how similar the structural distributions of two data points are across all the isolated trees. Because of the physical characteristics of spectral bands, data points within a given frequency interval correspond to the same substance, so the similarity of depth sequences is only meaningful for data points within such an interval. Thus, the more similar the depth sequences of two data points, the closer their frequencies and the closer their amplitudes, the greater the structural similarity of the two data points.
Further, in some embodiments of the present invention, the structural similarity of two data points is determined according to the depth sequence of any two data points, the amplitude difference and the frequency difference of the two data points, and the corresponding calculation formula is:
$$S_{i,j} = \frac{1}{\mathrm{DTW}(D_i, D_j) \times \Delta f_{i,j} \times \Delta A_{i,j} + x}$$
where $S_{i,j}$ represents the structural similarity of the ith data point and the jth data point, $D_i$ represents the depth sequence of the ith data point, $D_j$ represents the depth sequence of the jth data point, $\mathrm{DTW}(D_i, D_j)$ represents the DTW distance between the depth sequence of the ith data point and the depth sequence of the jth data point, $\Delta f_{i,j}$ represents the frequency difference between the ith data point and the jth data point, $\Delta A_{i,j}$ represents the amplitude difference between the ith data point and the jth data point, and $x$ represents a preset constant coefficient, which is a safety value set to prevent the denominator from being 0 and may optionally be 0.01.
It will be appreciated that the depth sequence serves as the overall distribution information of the corresponding data point across all dimensions. In the embodiment of the present invention, the DTW distance between the depth sequences of any two data points is therefore calculated, where the DTW distance is the distance between two sequences computed with the dynamic time warping (Dynamic Time Warping, DTW) algorithm. The smaller the DTW distance, the higher the similarity of the two data points, so the DTW distance is negatively correlated with the structural similarity. Likewise, the smaller the frequency difference and the amplitude difference, the higher the similarity of the two data points, so both differences are negatively correlated with the structural similarity. The product of the DTW distance, the frequency difference and the amplitude difference is calculated, and a negative correlation mapping is applied to the product to obtain the structural similarity of the two data points.
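A minimal Python sketch of this calculation follows. It assumes the negative correlation mapping is a reciprocal, that is, similarity = 1 / (DTW distance × frequency difference × amplitude difference + x), which matches the description of x as a value that keeps the denominator from being 0. The Euclidean distance between depth vectors as the local DTW cost, the use of absolute differences, and the function names are further assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two sequences of
    depth vectors, with Euclidean distance as the local cost."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # step in seq_a only
                cost[i][j - 1],      # step in seq_b only
                cost[i - 1][j - 1],  # step in both
            )
    return cost[n][m]


def structural_similarity(seq_i, seq_j, freq_i, freq_j, amp_i, amp_j, x=0.01):
    """Reciprocal of (DTW distance * frequency difference * amplitude
    difference + x); x keeps the denominator away from zero."""
    product = (dtw_distance(seq_i, seq_j)
               * abs(freq_i - freq_j)
               * abs(amp_i - amp_j))
    return 1.0 / (product + x)


near = structural_similarity([(3, 2), (5, 1)], [(3, 2), (5, 1)], 100.0, 101.0, 0.5, 0.6)
far = structural_similarity([(3, 2), (5, 1)], [(1, 3)], 100.0, 140.0, 0.5, 2.5)
```

Because the three factors are multiplied, a pair that agrees exactly on any one of them reaches the maximum similarity 1 / x under this assumed form, regardless of the other two.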
S103: Take any data point as the data point to be measured, and cluster the structural similarities between the data point to be measured and all other data points to obtain clusters of the data point to be measured; take the frequency of the data point to be measured as the frequency to be measured, take, among the clusters corresponding to all data points, the clusters that contain the frequency to be measured as the clusters to be measured, and determine the similarity consistency degree of the frequency to be measured according to the structural similarity values in all the clusters to be measured; and divide the spectrum measurement data into frequency intervals according to the similarity consistency degrees of different frequencies to obtain characteristic wave bands.
Further, in some embodiments of the present invention, clustering the structural similarities between the data point to be measured and all other data points to obtain clusters of the data point to be measured includes: clustering the structural similarities between the data point to be measured and all other data points by using a k-means clustering algorithm to obtain the clusters of the data point to be measured.
In the embodiment of the invention, a preset k value is used as the number of cluster centroids. The preset k value may be set from practical detection experience or calculated by, for example, the elbow method. Since the k-means clustering algorithm is a distance-based clustering algorithm, clustering the structural similarities between the data point to be measured and all other data points yields clusters whose members are distributed relatively close together in space.
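The clustering step for a single data point to be measured can be sketched with a small one-dimensional k-means written in plain Python. The input is the list of structural similarity values between that point and every other point. The quantile-based centroid initialization, the iteration cap and the name `kmeans_1d` are assumptions made for this illustration; the text only specifies that k-means with a preset k is used.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar similarity values into k groups (requires
    1 <= k <= len(values)). Returns (labels, centroids)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: centroids at evenly spaced quantiles.
    centroids = [ordered[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest centroid by absolute distance.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: mean of each cluster's members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Similarities of one data point to six others: three weak, three strong.
similarities = [0.10, 0.12, 0.11, 5.0, 5.2, 5.1]
labels, centroids = kmeans_1d(similarities, k=2)
```

Repeating this for every data point gives one set of clusters per point, which the next step searches for clusters containing the frequency to be measured.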
It can be understood that the obtained clusters are a plurality of frequency intervals with higher similarity determined by the view angles of the data points to be measured, the similarity consistency degree of any frequency is analyzed according to a plurality of clustering results with different data points to be measured as view angles, namely, the frequency of the data points to be measured is used as the frequency to be measured, and the clusters corresponding to all the data points and containing the frequency to be measured are used as the clusters to be measured.
Further, in some embodiments of the present invention, determining the similarity consistency degree of the frequency to be measured according to the structural similarity values in all the clusters to be measured includes: calculating the mean of the structural similarity values of all data points in each cluster to be measured as the cluster mean; and calculating the sum of the cluster means of all the clusters to be measured, and normalizing the sum to obtain the similarity consistency degree of the frequency to be measured.
In the embodiment of the invention, each cluster to be measured is analyzed: the mean of the structural similarity values of all data points in the cluster is calculated as the cluster mean. The clusters to be measured are the clusters, obtained by taking different data points as the data point to be measured, that contain a fixed frequency. Analyzing them therefore amounts to an overall analysis of the spectrum measurement data, and the resulting similarity consistency degree is more representative.
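The cluster-mean and summation steps can be sketched as follows. Each cluster is represented as a list of (frequency, structural similarity) pairs, and a cluster counts as a cluster to be measured for every frequency it contains. The text does not say which normalization is applied to the sum, so min-max normalization across frequencies is used here purely as an assumption, as are the input layout and the name `consistency_degrees`.

```python
def consistency_degrees(clusters):
    """Map each frequency to its similarity consistency degree.

    clusters: all clusters from all data points' clusterings; each cluster
    is a list of (frequency, structural_similarity) pairs (hypothetical
    input layout).
    """
    sums = {}
    for cluster in clusters:
        if not cluster:
            continue
        cluster_mean = sum(s for _, s in cluster) / len(cluster)
        # The cluster contributes its mean once to every frequency it contains.
        for f in {f for f, _ in cluster}:
            sums[f] = sums.get(f, 0.0) + cluster_mean
    if not sums:
        return {}
    lo, hi = min(sums.values()), max(sums.values())
    span = hi - lo
    # Assumed min-max normalization to the range [0, 1].
    return {f: (v - lo) / span if span else 1.0 for f, v in sums.items()}


clusters = [[(100, 0.8), (101, 0.6)], [(101, 0.4)], [(102, 0.2)]]
degrees = consistency_degrees(clusters)
```

In this example frequency 101 appears in two clusters with means 0.7 and 0.4, so it receives the largest sum and, after normalization, the highest consistency degree.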
Further, in some embodiments of the present invention, dividing the frequency interval of the spectrum measurement data according to the similarity consistency degree of different frequencies to obtain a characteristic wave band includes: combining adjacent frequencies whose similarity consistency degree is larger than a preset consistency threshold to obtain a characteristic wave band.
In the embodiment of the invention, adjacent frequencies whose similarity consistency degree is larger than the preset consistency threshold can be combined, and all frequencies are traversed to obtain the characteristic wave bands with a higher degree of similarity.
The preset consistency threshold is a threshold on the similarity consistency degree. In the embodiment of the present invention, the preset consistency threshold may be set to 0.89, although the invention is not limited thereto.
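The merging of adjacent frequencies can be sketched as a single pass over the frequencies in ascending order. This is an illustrative sketch only: the function name, the representation of a band as a (start, end) pair, and the choice to keep a single qualifying frequency as its own band are assumptions.

```python
def characteristic_bands(frequencies, degrees, threshold=0.89):
    """Merge runs of adjacent frequencies whose consistency degree exceeds the threshold."""
    bands, current = [], []
    for freq, degree in zip(frequencies, degrees):
        if degree > threshold:
            current.append(freq)  # extend the current run
        elif current:
            bands.append((current[0], current[-1]))  # close the run
            current = []
    if current:
        bands.append((current[0], current[-1]))
    return bands


# Hypothetical frequencies (ascending) and their similarity consistency degrees.
frequencies = [400, 410, 420, 430, 440, 450]
degrees = [0.95, 0.92, 0.50, 0.90, 0.91, 0.93]
bands = characteristic_bands(frequencies, degrees)
```

In this example the frequency 420 falls below the threshold and therefore separates the two bands.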
S104: according to the difference of frequencies contained in characteristic wave bands of different time points, determining the splitting frequency of an isolated tree, carrying out isolated tree analysis on spectrum measurement data based on the values of the splitting frequency of the isolated tree in different dimensions, determining abnormal data points, and carrying out data optimization on the spectrum measurement data according to the abnormal data points to obtain optimized spectrum data.
In the embodiment of the invention, abnormal data points can be obtained by analyzing the similarity consistency degree of each frequency together with all characteristic wave bands. A characteristic wave band is a frequency band of the spectrum measurement data acquired at a single time point, and the spectrum data change to some extent over time. Therefore, the stability of the characteristic wave bands is analyzed by combining the band fluctuation characteristics of multiple time points, so as to determine how strongly the data of each frequency are preferred in the corresponding isolated tree analysis.
Further, in some embodiments of the present invention, determining the isolated tree splitting frequency according to the difference of frequencies contained in the characteristic wave bands at different time points includes: determining, for any frequency, the number of times the frequency appears in the characteristic wave bands over all time points as the characteristic frequency; performing inverse-proportion normalization on the characteristic frequency to obtain an isolation coefficient; and when the isolation coefficient is larger than a preset isolation threshold, taking the corresponding frequency as an isolated tree splitting frequency.
In the embodiment of the invention, each frequency is analyzed individually; that is, the number of times the frequency appears in the characteristic wave bands corresponding to all time points is taken as the characteristic frequency. The larger the characteristic frequency, the more common the corresponding frequency is among all data points, that is, the more normal the corresponding frequency is. Using such a frequency as a split point of the isolated tree therefore gives a poorer splitting effect and introduces redundant computation during isolated tree splitting.
In the embodiment of the present invention, the preset isolation threshold may be, for example, 0.85, or may be adjusted according to actual detection requirements, and is not further limited here. Each frequency whose isolation coefficient is larger than the preset isolation threshold is taken as an isolated tree splitting frequency, and the isolated tree is then constructed and analyzed based on the isolated tree splitting frequencies.
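The selection of splitting frequencies can be sketched as follows. The text does not specify the inverse-proportion normalization; this sketch assumes the isolation coefficient is one minus the fraction of time points whose characteristic wave bands contain the frequency. The function name and the sample data are hypothetical.

```python
def splitting_frequencies(bands_per_time_point, all_frequencies, threshold=0.85):
    """Select frequencies that rarely fall inside a characteristic band."""
    n_time_points = len(bands_per_time_point)
    selected = []
    for freq in all_frequencies:
        # Characteristic frequency: number of time points whose bands contain freq.
        count = sum(
            any(lo <= freq <= hi for lo, hi in bands)
            for bands in bands_per_time_point
        )
        # Assumed inverse-proportion normalization: common frequencies score low.
        coefficient = 1.0 - count / n_time_points
        if coefficient > threshold:
            selected.append(freq)
    return selected


# Hypothetical bands for ten time points: nine cover 400-420, one covers 400-430.
bands_per_time_point = [[(400, 420)]] * 9 + [[(400, 430)]]
chosen = splitting_frequencies(bands_per_time_point, [400, 410, 420, 430, 440])
```

Frequencies that lie in a characteristic wave band at every time point receive a coefficient of 0 and are never used for splitting, which matches the stated aim of avoiding splits on normal data.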
Further, in some embodiments of the present invention, performing isolated tree analysis on the spectrum measurement data based on the values of the isolated tree splitting frequency in different dimensions, and determining abnormal data points, includes: based on an isolated tree algorithm, taking the feature points of different dimensions corresponding to the isolated tree splitting frequency as split points for analysis, and taking the outliers obtained by the isolated tree analysis as abnormal data points.
In the embodiment of the invention, based on the isolated tree algorithm, the feature points of different dimensions corresponding to the isolated tree splitting frequency can be used as split points to construct the isolated tree. The outliers are then obtained directly from the distribution characteristics of the isolated tree and are taken as the abnormal data points.
In this way, the problem that the overall isolated tree becomes structurally complex and computationally expensive because split points are chosen among normal data points is avoided.
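The idea of restricting splits to selected features can be sketched with a small isolation forest written from scratch. This is an illustration under stated assumptions, not the patented procedure: each data point is assumed to be a tuple of feature values, `features` lists the indices allowed for splitting (standing in for the isolated tree splitting frequencies), every tree is built on all points without subsampling, and all names are hypothetical.

```python
import math
import random


def c(n):
    """Average path length adjustment for a leaf holding n unsplit points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, features, rng, depth=0, max_depth=10):
    """Recursively isolate points, splitting only on the allowed feature indices."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    # Only features that still vary among these points can separate them.
    usable = [f for f in features
              if min(p[f] for p in points) < max(p[f] for p in points)]
    if not usable:
        return {"size": len(points)}
    f = rng.choice(usable)
    values = [p[f] for p in points]
    cut = rng.uniform(min(values), max(values))
    left = [p for p in points if p[f] < cut]
    right = [p for p in points if p[f] >= cut]
    return {
        "feature": f,
        "cut": cut,
        "left": build_tree(left, features, rng, depth + 1, max_depth),
        "right": build_tree(right, features, rng, depth + 1, max_depth),
    }


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus the leaf-size adjustment."""
    if "size" in tree:
        return depth + c(tree["size"])
    branch = "left" if point[tree["feature"]] < tree["cut"] else "right"
    return path_length(tree[branch], point, depth + 1)


def mean_path_lengths(points, features, n_trees=50, seed=0):
    """Average path length of every point over a forest of random trees."""
    rng = random.Random(seed)
    trees = [build_tree(points, features, rng) for _ in range(n_trees)]
    return [sum(path_length(t, p) for t in trees) / n_trees for p in points]


# Hypothetical data: feature 0 is constant, feature 1 holds one extreme value.
points = [(0.5, 1.0 + 0.01 * i) for i in range(20)] + [(0.5, 10.0)]
lengths = mean_path_lengths(points, features=[1])
suspect = min(range(len(points)), key=lambda i: lengths[i])
```

Points with short average path lengths are isolated quickly and are the candidates for abnormal data points; how short is short enough is a threshold choice that the text leaves open.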
Further, in some embodiments of the present invention, data optimization is performed on the spectral measurement data according to the abnormal data points to obtain optimized spectral data, including: abnormal data points are deleted from the spectral measurement data, and the remaining data points are formed into optimized spectral data.
In the embodiment of the invention, after the abnormal data point is obtained by detection, the corresponding abnormal data point can be deleted from the spectrum measurement data, or the abnormal data point can be smoothed according to the characteristics of other data points in the local range where the abnormal data point is positioned, so that the influence of the abnormal data point on the whole spectrum measurement data is eliminated, and the optimized spectrum data with better quality is obtained.
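Both options, deletion and local smoothing, can be sketched briefly. The smoothing rule is not specified in the text; replacing an abnormal value with the mean of its normal neighbours inside a fixed window is an assumption, as are the function names and the window radius.

```python
def remove_points(amplitudes, abnormal):
    """Drop the abnormal indices and keep the remaining data points."""
    return [a for i, a in enumerate(amplitudes) if i not in abnormal]


def smooth_points(amplitudes, abnormal, radius=2):
    """Replace each abnormal value with the mean of nearby normal values."""
    result = list(amplitudes)
    for i in abnormal:
        lo = max(0, i - radius)
        hi = min(len(amplitudes), i + radius + 1)
        # Assumption: only neighbours that are not themselves abnormal contribute.
        neighbours = [amplitudes[j] for j in range(lo, hi) if j not in abnormal]
        if neighbours:
            result[i] = sum(neighbours) / len(neighbours)
    return result


# Hypothetical amplitudes with one abnormal data point at index 2.
amplitudes = [1.0, 2.0, 9.0, 2.0, 1.0]
abnormal = {2}
removed = remove_points(amplitudes, abnormal)
smoothed = smooth_points(amplitudes, abnormal)
```

Deletion shortens the spectrum, whereas smoothing keeps the original frequency grid intact, which may matter if later processing expects evenly sampled data.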
According to the method, spectrum measurement data of the sample to be measured are periodically acquired at different time points, and initial isolated trees of the spectrum measurement data in different dimensions are determined. A depth sequence is then constructed according to the depth information and frequency of the data points in the different initial isolated trees. The depth sequence allows the distribution of each leaf node in the initial isolated trees to be analyzed accurately, and the structural similarity between data points is determined by combining the depth sequence, the amplitude difference and the frequency difference, so that the structural similarity effectively represents the degree of similarity of the corresponding data points. Clustering is performed according to the structural similarity, and the similarity consistency degree is calculated. The frequency interval of the spectrum measurement data is divided according to the similarity consistency degree of different frequencies to obtain characteristic wave bands. Using the similarity consistency degree as the basis for dividing the characteristic wave bands allows the spectrum measurement data of all time points to be analyzed, and the characteristic wave bands with the most stable characteristics can be screened out according to the change of the spectrum measurement data at different time points, which ensures the recognition effect of the characteristic wave bands and facilitates the subsequent analysis of the isolated tree splitting frequency. Abnormal data points are then determined; because their acquisition integrates data characteristics of multiple dimensions and multiple time points, the reliability and accuracy of the abnormal data points are ensured. Finally, data optimization is performed on the spectrum measurement data according to these abnormal data points to obtain optimized spectrum data, which improves the detection accuracy and reliability of the optimized spectrum data. In conclusion, the method can effectively improve the detection precision of the spectrum measurement data, realize high-precision optimization processing of the spectrum measurement data, improve the reliability of the optimized spectrum data, and enhance the optimization effect on the spectrum measurement data.
It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (8)

1. A method for optimizing high-precision spectral measurement data, the method comprising:
periodically acquiring spectrum measurement data of a sample to be measured at different time points, and determining initial isolated trees of the spectrum measurement data at different dimensions;
constructing a depth sequence of each data point according to the depth information of the data point in different initial isolated trees and the frequency of different depth information in the spectrum measurement data; determining the structural similarity of two data points according to the depth sequence of any two data points, the amplitude difference and the frequency difference of the two data points;
taking any data point as a data point to be measured, and clustering the structural similarities between the data point to be measured and all other data points to obtain clusters of the data point to be measured; taking the frequency of the data point to be measured as the frequency to be measured, taking the clusters containing the frequency to be measured, among the clusters corresponding to all data points, as the clusters to be measured, and determining the similarity consistency degree of the frequency to be measured according to the structural similarity values in all the clusters to be measured; dividing the frequency interval of the spectrum measurement data according to the similarity consistency degree of different frequencies to obtain characteristic wave bands;
determining an isolated tree splitting frequency according to the difference of frequencies contained in characteristic wave bands of different time points, performing isolated tree analysis on the spectrum measurement data based on the values of the isolated tree splitting frequency in different dimensions, determining abnormal data points, and performing data optimization on the spectrum measurement data according to the abnormal data points to obtain optimized spectrum data;
the data optimization is performed on the spectrum measurement data according to the abnormal data points to obtain optimized spectrum data, and the method comprises the following steps:
deleting abnormal data points from the spectrum measurement data, and forming the rest data points into optimized spectrum data;
wherein the dimension is a characteristic dimension of the spectrum measurement data, and the determining the initial isolated tree of the spectrum measurement data in different dimensions comprises:
based on an isolated tree algorithm, the spectrum measurement data of any time point is randomly selected and analyzed at any dimension to obtain an initial isolated tree of the spectrum measurement data in different dimensions.
2. The method for optimizing high-precision spectral measurement data according to claim 1, wherein said constructing a depth sequence for each data point based on depth information of the data point in different initial isolated trees and frequency of different depth information in the spectral measurement data comprises:
taking the depth value of the data point in the initial isolated tree as depth information, and combining the depth value and the frequency of data points at the same depth value into a depth vector;
and sorting the depth vectors corresponding to all the depth values in ascending order of depth value to obtain the depth sequence of the data point.
3. The method for optimizing high-precision spectrum measurement data according to claim 1, wherein the structural similarity of two data points is determined according to a depth sequence of any two data points, an amplitude difference and a frequency difference of the two data points, and the corresponding calculation formula is as follows:
[formula not reproduced in the extracted text]; in the formula, the quantities are, in order: the structural similarity of the i-th data point and the j-th data point; the depth sequence of the i-th data point; the depth sequence of the j-th data point; the DTW distance between the depth sequence of the i-th data point and the depth sequence of the j-th data point; the frequency difference between the i-th data point and the j-th data point; the amplitude difference between the i-th data point and the j-th data point; and x, a preset constant coefficient.
4. The method for optimizing high-precision spectrum measurement data according to claim 1, wherein said determining the degree of similarity consistency of the frequencies to be measured according to the values of the structural similarity in all the clusters to be measured comprises:
calculating the mean of the structural similarity values of all data points in each cluster to be measured as a cluster mean;
and calculating the sum of the cluster means of all the clusters to be measured, and normalizing the sum to obtain the similarity consistency degree of the frequency to be measured.
5. The method for optimizing high-precision spectrum measurement data according to claim 1, wherein the dividing the frequency interval of the spectrum measurement data according to the similarity consistency degree of different frequencies to obtain the characteristic wave band comprises:
and combining adjacent frequencies whose similarity consistency degree is larger than a preset consistency threshold to obtain a characteristic wave band.
6. The method for optimizing high-precision spectral measurement data according to claim 1, wherein said determining the isolated tree splitting frequency based on the difference of frequencies included in the characteristic bands at different time points comprises:
determining the number of times any frequency appears in the characteristic wave bands over all time points as the characteristic frequency;
performing inverse-proportion normalization on the characteristic frequency to obtain an isolation coefficient;
and when the isolation coefficient is larger than a preset isolation threshold, taking the corresponding frequency as the isolated tree splitting frequency.
7. The method for optimizing high-precision spectrum measurement data according to claim 1, wherein said performing isolated tree analysis on said spectrum measurement data based on the values of said isolated tree splitting frequency in different dimensions, and determining abnormal data points, comprises:
based on an isolated tree algorithm, taking the feature points of different dimensions corresponding to the isolated tree splitting frequency as split points for analysis, and taking the outliers obtained by the isolated tree analysis as abnormal data points.
8. The method for optimizing high-precision spectrum measurement data according to claim 1, wherein the clustering of the structural similarity between the data point to be measured and all other data points to obtain a cluster of data points to be measured comprises:
and clustering the structural similarities between the data point to be measured and all other data points by using a k-means clustering algorithm to obtain the clusters of the data point to be measured.
CN202311346592.8A 2023-10-18 2023-10-18 High-precision spectrum measurement data optimization processing method Active CN117095771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311346592.8A CN117095771B (en) 2023-10-18 2023-10-18 High-precision spectrum measurement data optimization processing method


Publications (2)

Publication Number Publication Date
CN117095771A CN117095771A (en) 2023-11-21
CN117095771B true CN117095771B (en) 2024-02-06

Family

ID=88775421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311346592.8A Active CN117095771B (en) 2023-10-18 2023-10-18 High-precision spectrum measurement data optimization processing method

Country Status (1)

Country Link
CN (1) CN117095771B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436006B (en) * 2023-12-22 2024-03-15 圣道天德电气(山东)有限公司 Intelligent ring main unit fault real-time monitoring method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871886A (en) * 2019-01-28 2019-06-11 平安科技(深圳)有限公司 Abnormal point ratio optimization method, apparatus and computer equipment based on spectral clustering
CN111322957A (en) * 2020-04-22 2020-06-23 昆山尚瑞智能科技有限公司 Measuring mechanism for measuring inner hole diameter by using color confocal method
CN116168036A (en) * 2023-04-26 2023-05-26 深圳市岑科实业有限公司 Abnormal intelligent monitoring system for inductance winding equipment
CN116503632A (en) * 2023-06-25 2023-07-28 广东工业大学 Subspace-based multi-subclass mean hyperspectral image clustering method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016141198A1 (en) * 2015-03-05 2016-09-09 Bio-Rad Laboratories, Inc. Optimized spectral matching and display


Also Published As

Publication number Publication date
CN117095771A (en) 2023-11-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant