CN117612644A - Air safety evaluation method and system - Google Patents

Publication number
CN117612644A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410089005.XA
Other languages
Chinese (zh)
Other versions
CN117612644B (en)
Inventor
刘豫湘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingze Tech Co ltd
Original Assignee
Shanghai Lingze Tech Co ltd
Application filed by Shanghai Lingze Tech Co ltd
Priority to CN202410089005.XA
Publication of CN117612644A
Application granted
Publication of CN117612644B
Legal status: Active

Classifications

    • G01N33/0001: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00 by organoleptic means
    • G01N33/0031: General constructional details of gas analysers, e.g. portable test equipment concerning the detector comprising two or more sensors, e.g. a sensor array
    • G01N33/0062: General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display
    • G01N33/0068: General constructional details of gas analysers, e.g. portable test equipment concerning the measuring method or the display, e.g. intermittent measurement or digital display using a computer specifically programmed
    • G06F18/2135: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/231: Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/2431: Multiple classes
    • G06N3/0499: Feedforward networks
    • G16C20/20: Identification of molecular entities, parts thereof or of chemical compositions
    • G16C20/70: Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Food Science & Technology (AREA)
  • Combustion & Propulsion (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analyzing Materials By The Use Of Fluid Adsorption Or Reactions (AREA)

Abstract

The invention relates to the technical field of odor detection, and in particular to an air safety evaluation method and system. The method comprises: acquiring sample data of the monitored gas from each sensor on an electronic nose; for each dimension data in each sensor's response curve sampling vector, obtaining the number of interval root nodes and the similar level between any two dimensions according to how the data differences of the surrounding dimension data are distributed across different concentrations of different harmful volatile chemical substances; improving the PCA algorithm according to the number of interval root nodes, the similar level and the dimension difference, to obtain each dimension-reduced sample data; obtaining enhanced sample data according to the difference in data variation between the sample data before and after dimension reduction; and using the enhanced sample data as a training data set to train a multi-layer perceptron classifier, which outputs the corresponding type of volatile chemical substance. The invention aims to improve the accuracy of air safety evaluation.

Description

Air safety evaluation method and system
Technical Field
The application relates to the technical field of odor detection, in particular to an air safety evaluation method and system.
Background
When air safety is evaluated, characteristic electrical signals are usually obtained from a sensor array made of different materials in an electronic nose. A mathematical model built on the mapping between these characteristic electrical signals and historical experimental data is then used to analyse the chemical components in the air. The odor identification device thus takes the place of a human odor assessor, so that the odor of a given type of harmful volatile chemical substance in the air can be identified sensitively and the air safety quality detection completed.
An electronic nose sensor generates a large amount of data for each gas sample. Traditional processing methods usually apply the PCA algorithm directly to reduce the dimension of this data without considering the structural information of the data generated by the electronic nose, so that some effective feature information characterizing the odor is masked and the final detection result is inaccurate.
Disclosure of Invention
In order to solve the technical problems, the invention provides an air safety evaluation method and an air safety evaluation system, and the adopted technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides an air safety evaluation method, including the steps of:
acquiring sample data of each harmful volatile chemical substance, monitored by each sensor on the electronic nose, for each concentration sample;
for each dimension data in the sample data, obtaining the local variation of the dimension data according to the data difference between the dimension data and the surrounding dimension data; obtaining the dimension difference between any two dimensions according to the difference in local variation of the two dimensions across different concentrations of different harmful volatile chemical substances; acquiring the number of interval root nodes and the similar level between any two dimensions based on the distribution of the dimension differences between the dimensions;
constructing the dimension difference degree between any two dimensions according to the number of interval root nodes, the similar level and the dimension difference; improving the PCA algorithm according to the dimension difference degree between any two dimensions, to obtain each dimension-reduced sample data; constructing a data enhancement scale of the sample data according to the difference in data variation between the sample data before and after dimension reduction; obtaining enhanced sample data according to the dimension-reduced sample data and the enhancement scale;
taking the enhanced sample data as a training data set, training a multi-layer perceptron classifier, and outputting the corresponding type of volatile chemical substance;
the dimension data in the sample data are specifically: the data points in the response curve sampling vector formed from each sensor's detection data of the sample volatilization.
Preferably, obtaining the local variation of the dimension data according to the data difference between the dimension data and the surrounding dimension data includes:
setting a window containing the same number of adjacent data points to the left and to the right of the dimension data, calculating the absolute value of the difference between each data point in the window and the dimension data, and taking the mean of these absolute values over all data points in the window as the local variation of the dimension data.
Preferably, obtaining the dimension difference between any two dimensions according to the difference in local variation of the two dimensions across different concentrations of different harmful volatile chemical substances includes:
for any two dimensions, calculating, for each concentration sample of each harmful volatile chemical substance, the absolute value of the difference between their local variations and the distance between the two data points in the time sequence; acquiring the Euclidean distance between the response curve sampling vectors to which the two dimensions belong;
and calculating the product of the absolute value of the difference and the Euclidean distance, calculating the ratio of the product to the time-sequence distance, and taking the sum of the ratios over all concentration samples of all harmful volatile chemical substances as the dimension difference between the two dimensions.
Preferably, acquiring the number of interval root nodes and the similar level between any two dimensions based on the distribution of the dimension differences between the dimensions includes:
clustering the dimension differences among all the dimensions with a hierarchical clustering algorithm, and outputting a tree structure;
inputting the tree structure into the Dijkstra algorithm, which outputs the shortest path between any two leaf nodes; the number of root nodes passed through on the shortest path between two leaf nodes is taken as the number of interval root nodes between them, and the level of the highest-level node on that path is taken as the similar level between them.
Preferably, constructing the dimension difference degree between any two dimensions according to the number of interval root nodes, the similar level and the dimension difference includes:
calculating the sum of the number of interval root nodes and the similar level, and taking the product of this sum and the dimension difference as the dimension difference degree between the two dimensions.
Preferably, improving the PCA algorithm according to the dimension difference degree between any two dimensions to obtain each dimension-reduced sample data includes:
obtaining the maximum dimension difference degree and the sum of the dimension difference degrees between any two dimensions;
for the dimension difference degree between any two dimensions, calculating the difference between the maximum dimension difference degree and that dimension difference degree, calculating the ratio of this difference to the sum of the dimension difference degrees, and taking the product of the ratio minus 0.5 and a preset adjustment strength as the adjustment coefficient between the two dimensions;
and adjusting the multiple of the corresponding elements of the covariance matrix in the PCA procedure based on the adjustment coefficient, to obtain each dimension-reduced sample data.
Preferably, constructing the data enhancement scale of the sample data according to the difference in data variation between the sample data before and after dimension reduction includes:
for each sample data before and after dimension reduction, calculating the normalized value of the mean Euclidean distance to the k sample data nearest to it in Euclidean distance, as the local probability of the sample data; acquiring the information amount of the local probability of each sample data before and after dimension reduction;
calculating the absolute value of the difference between the information amounts of a sample data before and after dimension reduction, and calculating the product of this absolute value and the local probability after dimension reduction;
and calculating the sum of the products over all sample data, and multiplying the ratio of the product to this sum by a preset enhancement weight to obtain the data enhancement scale of the sample data.
Preferably, obtaining the enhanced sample data according to the dimension-reduced sample data and the enhancement scale includes:
taking the dimension-reduced sample data and the enhancement scale as the input of a mean interpolation algorithm, and outputting a distribution function; acquiring the center coordinates of the dimension-reduced sample data;
for each sample data, calculating the direction vector from the sample data to the corresponding center coordinates, calculating the product of the direction vector and the corresponding distribution function value, and taking the sum of this product and the sample data as the enhanced sample data.
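As an illustration of the data enhancement scale described in the preceding paragraphs, the following is a minimal Python sketch, not the patented implementation. The function names `local_probabilities` and `enhancement_scales` are hypothetical. Two details are assumptions because the text does not fix them: the mean nearest-neighbour distances are normalized by dividing by their sum, and the information amount of a probability p is taken as -log2(p). The sketch also assumes that no two samples coincide.

```python
import math


def local_probabilities(points, k):
    """Mean Euclidean distance to the k nearest neighbours, normalised to sum to 1.

    Normalising by the sum is an assumption; the text only says "normalized".
    """
    means = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        means.append(sum(dists[:k]) / k)
    total = sum(means)
    return [m / total for m in means]


def enhancement_scales(before, after, k, weight):
    """Per-sample scale: weight * (|I_before - I_after| * p_after) / sum of those products."""
    p_before = local_probabilities(before, k)
    p_after = local_probabilities(after, k)
    # Information amount taken as -log2(p) (assumption).
    products = [
        abs(-math.log2(pb) + math.log2(pa)) * pa
        for pb, pa in zip(p_before, p_after)
    ]
    total = sum(products)
    if total == 0:
        return [0.0] * len(products)  # no information change anywhere
    return [weight * x / total for x in products]


# Four toy samples before (2-D) and after (1-D) dimension reduction.
before = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
after = [(0.0,), (1.0,), (0.5,), (5.0,)]
scales = enhancement_scales(before, after, k=2, weight=1.0)
```

With these definitions the scales of all samples add up to the enhancement weight, so the weight controls the total amount of enhancement distributed over the data set.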
In a second aspect, an embodiment of the present invention further provides an air safety evaluation system, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the methods described above when executing the computer program.
The invention has at least the following beneficial effects:
the method groups dimension data that have similar local variation within a response curve, or that lie on similar response curves, according to the structural information of the electronic nose data; it calculates a correlation measure for the electronic nose dimension data and combines it with a hierarchical clustering algorithm to obtain covariance matrix adjustment coefficients. Using the structural information of the electronic nose data across dimensions improves the dimension reduction process of the PCA algorithm, so that the dimension reduction result retains more effective feature information;
further, a data enhancement scale is constructed from the change in information amount at different positions of the data space between the sample data before and after dimension reduction, together with the local density at different positions of the data space after dimension reduction. The scale characterizes how much information is lost at the position of each sample data after dimension reduction and is used to adjust the dimension-reduced sample data, so that the dimension-reduced data contains more information and a better dimension reduction effect is obtained;
finally, compared with the traditional PCA algorithm, the improved PCA algorithm retains more odor feature information when processing electronic nose data, so that air safety is ultimately evaluated more accurately.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an air safety evaluation method provided by the invention;
FIG. 2 is a schematic diagram of a multi-layer perceptron classifier.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purposes, the following detailed description refers to specific embodiments, structures, features and effects of an air safety evaluation method and system according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the air safety evaluation method and system provided by the invention with reference to the accompanying drawings.
The embodiment of the invention provides an air safety evaluation method and an air safety evaluation system.
Specifically, referring to FIG. 1, the following air safety evaluation method is provided, comprising the following steps:
Step S001: sample data are collected for different gas samples.
For harmful volatile chemical substances commonly found in air, gas samples containing different concentrations of each substance are prepared. This embodiment selects $N$ harmful volatile chemical substances as detection targets. Assuming that the minimum harmful concentration of the $i$-th harmful volatile chemical substance is $c_i$ moles per liter, $M$ samples with a concentration gradient are set for it: a highest-concentration sample and a lowest-concentration sample are fixed, and the remaining samples are selected uniformly between them. The concentration of the $j$-th sample of the $i$-th harmful volatile chemical substance is denoted $c_{i,j}$. The number of harmful volatile chemical substances $N$ and the number of samples of different concentrations $M$ take empirical values in this embodiment, and the implementer can set them according to the actual situation.
The electronic nose is then used to acquire data from the gas samples. The detection data for the volatilization of the $j$-th concentration sample of the $i$-th harmful volatile chemical substance are recorded as the response curve group $G_{i,j}$. Assuming that the electronic nose has $L$ kinds of sensors (the number of sensors takes an empirical value in this embodiment), each response curve group $G_{i,j}$ contains $L$ response curves.
Further, the response curve of the $l$-th sensor is sampled once every $\Delta t$ milliseconds to obtain a response curve sampling vector, recorded as $X_{i,j}^{l} = \big(x_{i,j}^{l}(1), \dots, x_{i,j}^{l}(T)\big)$, where $X_{i,j}^{l}$ is the response curve sampling vector of the $l$-th sensor for the $j$-th concentration sample of the $i$-th harmful volatile chemical substance, and $x_{i,j}^{l}(t)$ is the $t$-th sampling point in the response curve sampling vector. Each sampling vector therefore contains $T$ dimension data. The sampling period $\Delta t$ and the number of sampling points $T$ take empirical values in this embodiment, and the implementer can set them according to the actual situation.
Finally, for the $j$-th concentration sample of the $i$-th harmful volatile chemical substance, one sample data is obtained, recorded as $S_{i,j}$, which contains $L \times T$ dimension data.
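The assembly of one sample data from the per-sensor response curve sampling vectors can be sketched as follows. This is a minimal illustration that assumes the sample data is the plain concatenation of the sensors' sampling vectors; the function name `build_sample` and the toy values are hypothetical.

```python
def build_sample(response_curves):
    """Concatenate the sampling vectors of all sensors into one sample vector.

    response_curves: one list of sampled values per sensor, all of equal length.
    The result has (number of sensors) x (number of sampling points) dimensions.
    """
    lengths = {len(curve) for curve in response_curves}
    if len(lengths) != 1:
        raise ValueError("all sensors must contribute the same number of sampling points")
    return [value for curve in response_curves for value in curve]


# Two sensors, three sampling points each (toy values).
curves = [[0.1, 0.4, 0.9], [0.2, 0.3, 0.5]]
sample = build_sample(curves)
```

Under this layout, dimension index `l * T + t` (zero-based) refers to sampling point `t` of sensor `l`, where `T` is the number of sampling points.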
So that the constructed multi-layer perceptron classifier can be fitted completely, clean air samples are also collected as test samples, and the sample data of each clean air sample is recorded.
Step S002: the sample data are processed by feature operations, and the data of different dimensions are reduced in dimension, classified and enhanced.
The electronic nose produces an extremely large amount of dimension data for each sample data in this embodiment, so the acquired electronic nose data must be reduced in dimension. Traditional dimension reduction methods mostly use the PCA algorithm, but this can mask part of the data features, so that the final recognition effect of the electronic nose is poor.
When the PCA algorithm is used to reduce the dimension of the electronic nose data, dimension data with greater similarity is expected to show a larger numerical difference after dimension reduction, so that the dimension-reduced data retains more effective information. In electronic nose data, the similarity of the data can be characterized by the local variation feature of the data in each dimension. The local variation feature of the data in a given dimension is therefore extracted, with the window size set to an empirical value, and is calculated as follows:
$$V^{l}(t) = \frac{1}{2w}\sum_{k=-w}^{w}\left|x^{l}(t+k) - x^{l}(t)\right|$$
In the formula, $V^{l}(t)$ is the local variation of the dimension data at the $t$-th position of the response curve sampling vector of the $l$-th sensor; $x^{l}(t)$ is the data at the $t$-th position of the response curve sampling vector of the $l$-th sensor; $x^{l}(t+k)$ is the data at the $(t+k)$-th position of the response curve sampling vector of the $l$-th sensor; $w$ is the window scale parameter set when calculating the local variation feature, which takes an empirical value in this embodiment.
The above formula gives the local variation feature of a given dimension data. When the data around a dimension varies strongly, the numerical differences between it and the dimension data at adjacent positions are large, and the resulting $V^{l}(t)$ is large. Dimension data with similar local variation features have close $V^{l}(t)$ values.
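The local variation described above can be sketched in a few lines of Python. This is a minimal sketch: the function name `local_variation` is hypothetical, and the handling of positions near the ends of the curve (the window is simply truncated) is an assumption, since the text does not say how boundaries are treated.

```python
def local_variation(curve, t, w):
    """Mean absolute difference between curve[t] and its w left and w right neighbours.

    Near the ends of the curve the window is truncated (assumption).
    """
    neighbours = [k for k in range(t - w, t + w + 1) if k != t and 0 <= k < len(curve)]
    if not neighbours:
        return 0.0
    return sum(abs(curve[k] - curve[t]) for k in neighbours) / len(neighbours)


curve = [1.0, 1.0, 4.0, 1.0, 1.0]
peak = local_variation(curve, t=2, w=1)   # neighbours differ by 3 on both sides
edge = local_variation(curve, t=0, w=1)   # only the right neighbour exists
```

A sharp peak yields a large local variation, while flat regions yield values near zero, which matches the interpretation given in the text.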
The dimension difference between dimensions is further calculated as follows:
$$D\big((l,t),(l',t')\big) = \sum_{i=1}^{N}\sum_{j=1}^{M}\frac{\left|V_{i,j}^{l}(t) - V_{i,j}^{l'}(t')\right| \cdot E_{i,j}^{l,l'}}{\left|t - t'\right|}$$
In the formula, $D\big((l,t),(l',t')\big)$ is the dimension difference between the dimension at the $t$-th position of the response curve sampling vector of the $l$-th sensor and the dimension at the $t'$-th position of the response curve sampling vector of the $l'$-th sensor; $N$ is the number of harmful volatile chemical substances; $M$ is the number of samples of different concentrations; $V_{i,j}^{l}(t)$ and $V_{i,j}^{l'}(t')$ are, in the sample data of the $j$-th sample concentration of the $i$-th harmful volatile chemical substance, the local variation at the $t$-th position of the response curve sampling vector of the $l$-th sensor and the local variation at the $t'$-th position of the response curve sampling vector of the $l'$-th sensor, respectively; $E_{i,j}^{l,l'}$ is, in the same sample data, the Euclidean distance between the response curve sampling vector of the $l$-th sensor and that of the $l'$-th sensor.
The above formula gives the difference between different dimensions. The larger the time difference $\left|t - t'\right|$ between the two dimensions on the response curve and the smaller the difference $\left|V_{i,j}^{l}(t) - V_{i,j}^{l'}(t')\right|$ between their local variations, the smaller the difference between the two dimensions. Likewise, the more similar the response curves on which the two dimensions lie, the smaller $E_{i,j}^{l,l'}$ and the smaller the difference between the two dimensions. The resulting $D\big((l,t),(l',t')\big)$ characterizes the difference between the two dimensions: the smaller its value, the smaller the difference.
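The dimension difference can be sketched as below, following the verbal description in the text (local variation difference times the Euclidean distance between the two sensors' sampling vectors, divided by the time distance, summed over all samples). This is only an illustrative sketch: the names `dimension_difference`, `euclidean` and `local_variation` are hypothetical, the window truncation at curve ends is an assumption, and 1 is added to the time distance as an assumption so that two dimensions at the same time position do not cause a division by zero.

```python
import math


def local_variation(curve, t, w):
    """Mean absolute difference between curve[t] and its neighbours within w positions."""
    neighbours = [k for k in range(t - w, t + w + 1) if k != t and 0 <= k < len(curve)]
    if not neighbours:
        return 0.0
    return sum(abs(curve[k] - curve[t]) for k in neighbours) / len(neighbours)


def euclidean(u, v):
    """Euclidean distance between two equally long sampling vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dimension_difference(samples, dim_a, dim_b, w):
    """samples: list of samples, each a list of per-sensor sampling vectors.
    dim_a, dim_b: (sensor index, time index) pairs identifying two dimensions."""
    (la, ta), (lb, tb) = dim_a, dim_b
    gap = abs(ta - tb) + 1  # the +1 is an assumption to avoid division by zero
    total = 0.0
    for curves in samples:
        dv = abs(local_variation(curves[la], ta, w) - local_variation(curves[lb], tb, w))
        total += dv * euclidean(curves[la], curves[lb]) / gap
    return total


# One sample with two sensors and three sampling points each (toy values).
samples = [[[0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]]
same_time = dimension_difference(samples, (0, 1), (1, 1), w=1)
one_apart = dimension_difference(samples, (0, 1), (1, 2), w=1)
```

Read literally, two dimensions taken from the same sensor get a difference of zero, because the Euclidean distance between a sampling vector and itself is zero.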
Further, all dimensions are taken as input and clustered with a hierarchical clustering algorithm, using the dimension difference between dimensions as the measure, and a tree structure is output. The tree has one leaf node per dimension. The hierarchical clustering algorithm is a well-known technique and is not described in detail in this embodiment.
In the resulting tree structure, a length is recorded for the line segments connecting the nodes. Taking the tree structure as input, the Dijkstra algorithm is applied to the $a$-th leaf node and the $b$-th leaf node, and outputs the shortest path between any two leaf nodes.
The number of root nodes on the shortest path is taken as the number of interval root nodes $R_{a,b}$, and the level of the highest-level node on the shortest path is taken as the similar level $H_{a,b}$. For the dimensions corresponding to two leaf nodes, the smaller the number of interval root nodes or the similar level in the tree structure, the closer the two leaf nodes are in the hierarchical clustering result and the smaller their difference.
The above steps calculate the difference between two dimensions; the PCA algorithm then needs to be improved based on the feature values that characterize this difference. First, the indicators characterizing the difference between dimension data are combined to construct the dimension difference degree as follows:
$$Q_{a,b} = \left(R_{a,b} + H_{a,b}\right)\cdot D_{a,b}$$
In the formula, $Q_{a,b}$ is the dimension difference degree between the $a$-th dimension and the $b$-th dimension; $R_{a,b}$ is the number of interval root nodes between the $a$-th dimension and the $b$-th dimension; $H_{a,b}$ is the similar level of the $a$-th dimension and the $b$-th dimension; $D_{a,b}$ is the dimension difference between the $a$-th dimension and the $b$-th dimension.
This formula simply combines the indicators of similarity between dimensions into a single index; the smaller its value, the smaller the difference between the dimensions.
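The tree-based quantities and their combination into the dimension difference degree can be sketched as follows. This is an interpretation, not the patent's code: the "root nodes" on the path are read as the internal (merge) nodes of the dendrogram, a node's level is its merge height counted from the leaves, and because the path between two leaves of a tree is unique, the sketch walks up to the first common ancestor instead of running Dijkstra's algorithm. The dictionary-based tree representation and all names are hypothetical.

```python
def path_stats(parent, level, leaf_a, leaf_b):
    """Return (number of merge nodes on the leaf-to-leaf path, highest level on it)."""
    if leaf_a == leaf_b:
        return 0, 0

    def ancestors(node):
        chain = []
        while node in parent:
            node = parent[node]
            chain.append(node)
        return chain

    up_a, up_b = ancestors(leaf_a), ancestors(leaf_b)
    in_b = set(up_b)
    # The first shared ancestor is the top of the unique path between the leaves;
    # levels grow toward the root, so it is also the highest-level node on the path.
    top = next(node for node in up_a if node in in_b)
    path = up_a[:up_a.index(top) + 1] + up_b[:up_b.index(top)]
    return len(path), level[top]


def difference_degree(interval_nodes, similar_level, dimension_difference):
    """(number of interval root nodes + similar level) * dimension difference."""
    return (interval_nodes + similar_level) * dimension_difference


# Toy dendrogram: n1 merges a and b, n2 merges n1 and c, n3 (root) merges n2 and d.
parent = {"a": "n1", "b": "n1", "n1": "n2", "c": "n2", "n2": "n3", "d": "n3"}
level = {"n1": 1, "n2": 2, "n3": 3}
r_ab, h_ab = path_stats(parent, level, "a", "b")
r_cd, h_cd = path_stats(parent, level, "c", "d")
q_ab = difference_degree(r_ab, h_ab, 0.5)
```

Leaves merged early (such as `a` and `b`) get small counts and levels, so their dimension difference degree stays small, which is the behaviour the text describes.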
Further, because this embodiment applies specific numerical adjustments when using the dimension difference degree to improve the PCA algorithm, the dimension difference degree $Q_{a,b}$ needs further numerical adjustment, as follows:
In the formula, $\alpha_{a,b}$ is the adjustment coefficient of the PCA covariance matrix; $Q_{\max}$ is the maximum dimension difference degree; $Q_{a,b}$ is the dimension difference degree between the $a$-th dimension and the $b$-th dimension; $0.5$ is the constant term used to adjust the normalized coefficient numerically; $\beta$ is the adjustment strength of the covariance matrix adjustment coefficient, which is set to an empirical value in this embodiment and can be set by the implementer according to the actual situation.
After the dimension difference degree is inversely normalized, a larger value of the formula indicates greater similarity between the dimensions, which matches the property of the covariance matrix that a larger absolute covariance means greater similarity between dimensions. Subtracting 0.5 from each inversely normalized dimension difference degree makes the sum of all covariance matrix adjustment coefficients 0, so that the adjustment has minimal influence on the overall variance represented by the covariance matrix and only changes the degree of association between the dimension data and the covariance mean. $\beta$ is the adjustment strength: the larger its value, the larger the adjustment that the covariance matrix adjustment coefficient applies to the covariance matrix.
One step of the PCA algorithm computes the covariance matrix of all sample data, in which the element in a given row and column represents the correlation between the two corresponding dimensions. The covariance matrix adjustment coefficients are therefore used to improve the PCA algorithm, specifically:
After the PCA algorithm has computed the covariance matrix, each element of the covariance matrix keeps its sign while its absolute value is scaled to a multiple of the original, the multiple being determined by the covariance matrix adjustment coefficient of the corresponding pair of dimensions. This completes the improvement of the PCA algorithm.
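The covariance adjustment step can be illustrated with a short sketch. The exact multiple is not legible in this text, so the sketch assumes that each element is multiplied by one plus its adjustment coefficient, which preserves the sign whenever the coefficient is greater than -1. The function name and the example values are hypothetical.

```python
def adjust_covariance(cov, coeffs):
    """Scale each covariance element by (1 + coefficient); the sign is kept when 1 + coefficient > 0."""
    return [
        [c * (1.0 + a) for c, a in zip(cov_row, coeff_row)]
        for cov_row, coeff_row in zip(cov, coeffs)
    ]


# Hypothetical 2x2 covariance matrix and adjustment coefficients.
cov = [
    [2.0, -1.0],
    [-1.0, 3.0],
]
coeffs = [
    [0.0, 0.1],
    [0.1, 0.0],
]
adjusted = adjust_covariance(cov, coeffs)
print(adjusted)
```

Because the coefficient matrix is symmetric, the adjusted matrix stays symmetric, so the eigendecomposition step of PCA can be applied to it in the usual way.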
In this way, similar characteristics of the structured dimensions in the peculiar smell data of the electronic nose are integrated into the process of judging the relativity of the data of different dimensions by the PCA algorithm, the final dimension reduction result divides the dimensions with large similarity into one class, the dimensions with small similarity show numerical differences, a better dimension reduction effect is obtained, and the final electronic nose can evaluate the air safety more accurately.
After the dimension of the data is reduced in the steps, a part of data features representing the peculiar smell information are eliminated in the dimension reduction process, so that the data features of the dimension reduced data are further enhanced to obtain the data features acquired through the electronic nose, and a better air safety evaluation effect is obtained.
The data enhancement scale was constructed as follows:
for the data before and after the dimension reduction, the information quantity change of the sample data represents the change of the contained characteristic information, so that the larger the information quantity change of the sample data is, the more serious the information loss is caused to the sample data during the dimension reduction, and the more the sample data should be subjected to numerical enhancement after the dimension reduction so as to compensate the information loss caused by the dimension reduction. The data enhancement scale can thus be constructed by varying the amount of information in the sample data.
Each sample datum before dimension reduction is a vector in the original high-dimensional space, so the Euclidean distances to its k nearest samples can be calculated, and the mean of these k Euclidean distances is taken as the local density of the sample datum. The local densities of all sample data are then normalized to obtain, for each concentration sample of each harmful volatile chemical substance, the local probability before dimension reduction. It should be noted that the local probability represents how frequently data occur at the position of the sample datum in the original data space: the larger the local probability, the higher the frequency, and normalization lets the local probability characterize the probability that data occur at a given position in that data space. This embodiment sets the number of neighboring samples k to an empirical value; implementers may set it according to the actual situation.
Similarly, each sample datum after dimension reduction is a vector in the reduced-dimension space. The same operation as for the sample data before dimension reduction is applied to obtain, for each concentration sample of each harmful volatile chemical substance, the local probability after dimension reduction. This embodiment sets the number of dimensions after dimension reduction to an empirical value; implementers may set it according to the actual situation.
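The local probability computation described above can be sketched as follows, following the text literally: the mean Euclidean distance to the k nearest samples is taken as the local density and then normalized. The normalization method is not specified in this text; dividing by the sum, so that the values add up to 1, is an assumption. The function name and the example points are hypothetical.

```python
import math


def local_probabilities(samples, k):
    """Mean distance to the k nearest neighbours, normalised so the values sum to 1."""
    if not 1 <= k < len(samples):
        raise ValueError("k must be between 1 and len(samples) - 1")
    densities = []
    for i, x in enumerate(samples):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        densities.append(sum(dists[:k]) / k)
    total = sum(densities)
    if total == 0:
        raise ValueError("all samples coincide, so normalisation is undefined")
    return [d / total for d in densities]


# Hypothetical two-dimensional samples; the same function applies before and after reduction.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
probs = local_probabilities(points, k=2)
print(probs)
```

The same function can be called once on the original vectors and once on the dimension-reduced vectors to obtain the two sets of local probabilities.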
Further, the information quantity before dimension reduction is calculated from the local probability before dimension reduction of each concentration sample of each harmful volatile chemical substance, and the information quantity after dimension reduction is calculated from the corresponding local probability after dimension reduction. The calculation of the information quantity is well known in the art and is not described further in this embodiment. The information quantity characterizes how much information the concentration sample carries at its position in the data space: the larger the value, the more information it contains.
Further, the data enhancement scale is calculated as follows:
In the formulas, for a given concentration sample of a given harmful volatile chemical substance, the quantities are: the data enhancement scale at the position of the sample datum in the data space; the data enhancement scale before numerical adjustment; the information quantity before dimension reduction; the information quantity after dimension reduction; the local probability after dimension reduction; and the enhancement weight of the data enhancement scale. This embodiment sets the enhancement weight to an empirical value; implementers may set it according to the actual situation.
In the data enhancement scale before numerical adjustment, the change in information quantity at the position of the concentration sample datum in the data space is calculated before and after dimension reduction. A larger change means that dimension reduction caused more information loss, so the data at that position should receive stronger enhancement in the reduced-dimension data space. Similarly, a position with a high distribution probability in the reduced-dimension data space has frequent, dense data, so the data at that position should also receive stronger enhancement. The larger the resulting value, the more strongly the concentration sample datum at that position should be enhanced.
In the final data enhancement scale, because numerical enhancement of the reduced-dimension data space involves numerical adjustment, the data enhancement scale before numerical adjustment is normalized and multiplied by the enhancement weight. The larger the enhancement weight, the stronger the numerical adjustment of the reduced-dimension data space.
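The enhancement scale can be sketched in Python as follows. The structure follows claim 7: the absolute change in information quantity is multiplied by the local probability after dimension reduction, divided by the sum over all samples, and multiplied by the enhancement weight. The information quantity formula is not given in this text; the sketch assumes the usual self-information, the negative base-2 logarithm of the probability. The function names, the probabilities and the weight 0.3 are hypothetical.

```python
import math


def self_information(p):
    """Assumed information quantity: -log2(p)."""
    return -math.log2(p)


def enhancement_scales(p_before, p_after, weight):
    """weight * normalised(|I_before - I_after| * p_after) for every sample."""
    raw = [
        abs(self_information(pb) - self_information(pa)) * pa
        for pb, pa in zip(p_before, p_after)
    ]
    total = sum(raw)
    if total == 0:
        # No information change anywhere, so nothing needs enhancing.
        return [0.0] * len(raw)
    return [weight * r / total for r in raw]


# Hypothetical local probabilities of three samples before and after reduction.
p_before = [0.25, 0.25, 0.5]
p_after = [0.5, 0.25, 0.25]
scales = enhancement_scales(p_before, p_after, weight=0.3)
print(scales)
```

In this example the second sample keeps the same local probability, so its scale is zero, and the scales of all samples add up to the enhancement weight.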
Further, the sample data after dimension reduction and their corresponding enhancement scales are taken as input to a mean interpolation algorithm, whose output is a distribution function of the enhancement scale over the reduced-dimension space.
With this distribution function, the enhancement scale corresponding to any sample datum in the reduced-dimension space can be obtained.
Further, the center coordinates of the sample data after dimension reduction are calculated. Using the distribution function and the sample data center coordinates, the sample data after dimension reduction are adjusted as follows:
Each sample datum obtained by the electronic nose is first reduced in dimension by the method above. The direction vector between the reduced datum and the sample data center coordinates is calculated; the enhanced direction vector is then calculated from it using the enhancement scale; finally, the enhanced dimension-reduced sample datum is calculated, which completes the data enhancement.
Data enhancement uses the enhancement scale as a weight to increase the distance between each dimension-reduced datum and the sample data center coordinates. This enlarges the differences between dimension-reduced sample data and makes the odor feature information contained in the data more pronounced, so that better electronic nose data processing is obtained and the air safety evaluation is more accurate.
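The enhancement step can be sketched as follows. This is an illustration under assumptions: the center is taken as the coordinate-wise mean of the dimension-reduced samples, the direction vector is taken to point from the center to the sample so that the distance to the center grows as the text describes, and the enhancement scales are passed in directly rather than read from the interpolated distribution function. All names and values are hypothetical.

```python
def enhance(samples, scales):
    """Move each sample away from the centre by scale * (sample - centre)."""
    dim = len(samples[0])
    center = [sum(x[d] for x in samples) / len(samples) for d in range(dim)]
    enhanced = []
    for x, s in zip(samples, scales):
        direction = [xi - ci for xi, ci in zip(x, center)]
        enhanced.append([xi + s * di for xi, di in zip(x, direction)])
    return center, enhanced


# Hypothetical dimension-reduced samples and their enhancement scales.
reduced = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
scales = [0.5, 0.0, 0.1, 0.0]
center, enhanced = enhance(reduced, scales)
print(center, enhanced)
```

A sample with scale zero is left unchanged, while a sample with a positive scale moves outward along the line through the center, which enlarges the differences between samples.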
Step S003: train on the enhanced sample data so that the electronic nose accurately identifies odors, thereby realizing air safety evaluation.
In this embodiment, a multi-layer perceptron classifier further processes the dimension-reduced data and identifies harmful volatile chemical substances in the air. The specific process is as follows:
The sample data obtained by experiment, together with the purified-air sample data, are reduced in dimension using the improved PCA algorithm and then enhanced, which gives accurate dimension-reduced data.
The sample data of each concentration sample of each chemical substance are processed in this way to obtain enhanced dimension-reduced sample data. The enhanced dimension-reduced sample data obtained from all substance sample data and from the purified-air sample data form the training data set. Each datum in the training data set takes its corresponding class value as its label, and training uses a multi-layer perceptron classifier as shown in Figure 2.
As shown in the figure, the data set is fed to the input layer. The activation function is the ReLU function. There are two fully connected layers, each with a specified number of neurons; a Softmax layer, also with a specified number of neurons, follows the fully connected layers; and an output layer is then fully connected, which outputs the classification result. Of the output classes, all but the last correspond to the respective volatile chemical substances, and the last represents clean air. The optimizer is the Adam optimizer and the loss function is the cross-entropy function. The training process of a multi-layer perceptron classifier is well known in the art and is not described further in this embodiment.
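To make the classifier structure concrete, the forward pass and loss can be sketched in plain Python. This is a sketch only: it applies ReLU after each of two fully connected layers and Softmax to the final scores, which simplifies the layer order described above, and it omits training with the Adam optimizer entirely. The layer sizes, weights and input are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """ReLU after every layer except the last; Softmax on the final scores."""
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(x, weights, bias))


def cross_entropy(probs, label):
    return -math.log(probs[label])


# Hypothetical network: 2 inputs, two hidden layers of 2 neurons, 3 output classes
# (two substances plus clean air as the last class).
identity = [[1.0, 0.0], [0.0, 1.0]]
layers = [
    (identity, [0.0, 0.0]),
    (identity, [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]),
]
probs = forward([2.0, 0.0], layers)
loss = cross_entropy(probs, label=0)
print(probs, loss)
```

The index of the largest probability is the predicted class; in the embodiment the last class would represent clean air.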
When the trained multi-layer perceptron classifier model is used, sample data for the current scene are acquired by the electronic nose, processed by the method above, and fed to the model as input. The model outputs whether the current scene contains harmful volatile chemical substances and the type of the main volatile chemical substance, which completes the air safety evaluation.
The present invention has been completed.
Based on the same inventive concept as the above method, the embodiment of the invention further provides an air safety evaluation system, which comprises a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor executes the computer program to realize the steps of any one of the above air safety evaluation methods.
In summary, according to the embodiment of the invention, the dimensional data with similar local variation in the response curve or similar response curve is classified by the structural information of the electronic nose data, the electronic nose dimensional data correlation measurement index is calculated, and the covariance matrix adjustment coefficient is obtained by combining a hierarchical clustering algorithm; the method has the advantages that the dimension reduction process of the PCA algorithm is improved by utilizing the structural information of the electronic nose data with different dimensions, so that the dimension reduction result has a better dimension reduction effect, and more effective characteristic information can be reserved;
further, constructing a data enhancement scale for information quantity change conditions of different positions in a data space and local densities of different positions in the data space after the dimension reduction in sample data before and after the dimension reduction, and representing information quantity reduction conditions of positions of each sample data after the dimension reduction, so as to adjust the sample data after the dimension reduction, enable the data after the dimension reduction to contain more information quantity, and obtain a better dimension reduction effect;
finally, compared with the traditional PCA algorithm, the embodiment of the invention reserves more peculiar smell characteristic information when processing the electronic nose data by improving the PCA algorithm, so that the final evaluation on the air safety is more accurate.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and the same or similar parts of each embodiment are referred to each other, and each embodiment mainly describes differences from other embodiments.
The above embodiments only illustrate the technical solutions of the present application and do not limit them. Modifications to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application are all included in the protection scope of the present application.

Claims (9)

1. An air safety evaluation method is characterized by comprising the following steps:
acquiring sample data of each harmful volatile chemical substance monitored by each sensor on the electronic nose under each concentration sample;
for each dimension data in the sample data, obtaining the local variation of the dimension data according to the data difference between the dimension data and the surrounding dimension data; obtaining the dimensional difference between any two dimensions according to the local variation difference between different concentrations of different harmful volatile chemical substances in any two dimensions; acquiring the number of interval root nodes between any two dimensions and similar layers based on the distribution condition of dimensional differences between the dimensions;
constructing dimension difference degree between any two dimensions according to the number of the interval root nodes, the similar layers and the dimension difference; the PCA algorithm is improved according to the dimension difference degree between any two dimensions, and each sample data after dimension reduction is obtained; constructing a data enhancement scale of the sample data according to the data change difference between the sample data before and after dimension reduction; obtaining enhanced sample data according to the dimension-reduced sample data and the enhancement scale;
taking the enhanced sample data as a training data set, training by adopting a multi-layer perceptron classifier, and outputting the types of the corresponding volatile chemical substances;
the dimension data in the sample data specifically comprises: each sensor collects each data point in a response curve sampling vector formed by detection data of sample volatilization.
2. The air safety evaluation method according to claim 1, wherein the obtaining the local variation of the dimension data according to the data difference between the dimension data and the surrounding dimension data comprises:
setting a window containing data with the same quantity adjacent left and right, respectively calculating the absolute value of the data difference value of each data point in the window and the dimension data, and taking the average value of the absolute value of the data difference value of all the data points in the window as the local variation of the dimension data.
3. The air safety evaluation method according to claim 2, wherein the step of obtaining the dimensional difference between any two dimensions from the difference in local variation between different concentrations of different harmful volatile chemical substances in any two dimensions comprises:
for any two dimensions, calculating the absolute value of the difference value of the local variation of each concentration sample of each harmful volatile chemical substance and the distance of the data point on the time sequence; acquiring Euclidean distance between response curve sampling vectors corresponding to any two dimensions;
and calculating the product of the absolute value of the difference and the Euclidean distance, calculating the ratio of the product to the distance, and taking the sum of the ratios under all concentration samples of all harmful volatile chemical substances as the dimension difference between any two dimensions.
4. The air safety evaluation method according to claim 3, wherein the obtaining the number of interval root nodes and the similar hierarchy between any two dimensions based on the distribution of the dimensional differences between the dimensions comprises:
clustering the dimensional differences among all the dimensions by adopting a hierarchical clustering algorithm, and outputting a tree structure;
the tree structure is input into a Dijkstra algorithm, the shortest path between any two leaf nodes is output, the number of root nodes passing through in the shortest path between any two nodes is used as the number of interval root nodes between any two nodes, and the level of the node at the highest level on the shortest path is used as the similar level between any two nodes.
5. The air safety evaluation method according to claim 4, wherein the constructing the dimension difference degree between any two dimensions according to the number of the interval root nodes, the similar hierarchy and the dimensional difference comprises:
and calculating the sum value of the number of the interval root nodes and the similar hierarchy, and taking the product of the sum value and the dimensional difference as the dimensional difference degree between any two dimensions.
6. The air safety evaluation method according to claim 5, wherein the step of obtaining the reduced sample data by improving the PCA algorithm according to the degree of dimensional difference between any two dimensions comprises:
obtaining the maximum dimension difference degree and the sum of the dimension difference degrees among the dimension difference degrees between any two dimensions;
for the dimension difference degree between any two dimensions, calculating the difference value between the maximum dimension difference degree and the dimension difference degree, calculating the ratio of the difference value to the sum of the dimension difference degrees, and taking the product of the difference value of the ratio and 0.5 and the preset adjustment strength as the adjustment coefficient between any two dimensions;
and (3) improving the multiple of each corresponding dimension data of the covariance matrix in the PCA algorithm process based on the adjustment coefficient to obtain each sample data after dimension reduction.
7. The air safety evaluation method according to claim 6, wherein the constructing the data enhancement scale of the sample data based on the data variation difference between the sample data before and after the dimension reduction comprises:
for each sample data before and after dimension reduction, calculating normalized values of Euclidean distance average values of k sample data with the nearest Euclidean distance to the sample data as local probability of the sample data; acquiring the information quantity of the local probability of each sample data before and after dimension reduction;
calculating the absolute value of a difference between corresponding information amounts before and after dimension reduction of sample data, and calculating the product of the absolute value of the difference and the local probability after dimension reduction;
and calculating the sum value of the products of all the sample data, and multiplying the ratio result of the products to the sum value by preset enhancement weights to obtain the data enhancement scale of the sample data.
8. The air safety evaluation method according to claim 7, wherein the obtaining the enhanced sample data according to the dimension-reduced sample data and the enhancement scale comprises:
taking the dimension-reduced sample data and the enhancement scale as the input of a mean value interpolation algorithm, and outputting a distribution function; acquiring the center coordinates of the sample data after each dimension reduction;
for each sample data, calculating a direction vector from the sample data to the corresponding center coordinates, calculating a product of the direction vector and the corresponding distribution function value, and taking the sum of the product and the sample data as the enhanced sample data.
9. An air safety assessment system comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-8 when executing the computer program.
CN202410089005.XA 2024-01-23 2024-01-23 Air safety evaluation method and system Active CN117612644B (en)


Publications (2)

Publication Number Publication Date
CN117612644A true CN117612644A (en) 2024-02-27
CN117612644B CN117612644B (en) 2024-04-09





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant