CN116662853A - Method and system for automatically identifying analysis result of pollution source - Google Patents

Method and system for automatically identifying analysis result of pollution source

Info

Publication number
CN116662853A
CN116662853A CN202310616810.9A CN202310616810A CN116662853A CN 116662853 A CN116662853 A CN 116662853A CN 202310616810 A CN202310616810 A CN 202310616810A CN 116662853 A CN116662853 A CN 116662853A
Authority
CN
China
Prior art keywords
sample
data set
category
samples
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310616810.9A
Other languages
Chinese (zh)
Other versions
CN116662853B (en
Inventor
赵雪婷
陈晖�
秦伟
张永前
万晶晶
盛世杰
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinhe Digital Technology Wuxi Co ltd
Original Assignee
Xinhe Digital Technology Wuxi Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinhe Digital Technology Wuxi Co ltd
Priority to CN202310616810.9A
Publication of CN116662853A
Application granted
Publication of CN116662853B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/259: Fusion by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Automatic Analysis And Handling Materials Therefor (AREA)

Abstract

The application provides a method and a system for automatically identifying a pollution source analysis result. The method comprises the following steps: dividing a sample set formed from the analysis results of various pollutant sources in the literature, and generating a test data set and a training data set according to the division result; processing the training data set with a hyperparameter test, and obtaining the optimal parameter k value according to the processing result; acquiring an instance data set whose category is to be determined; determining the distance between the instance data set and each sample in the sample set; obtaining, according to the optimal parameter k value, the k training data set samples with the smallest distances; and determining the category of the instance data set from the categories of those k training data set samples. The application has the following beneficial effects: the analysis result of the pollutant source can be obtained efficiently and accurately, the technical barrier to source analysis is lowered, and the problems of the complexity of the prior art and its dependence on hardware resource allocation are solved.

Description

Method and system for automatically identifying analysis result of pollution source
Technical Field
The application relates to the technical field of pollutant source analysis, in particular to a method and a system for automatically identifying a pollutant source analysis result.
Background
At present, in the field of atmospheric environment, most pollution source analysis work is carried out with receptor models, because they are simple to apply and relatively reliable. Among them, the chemical mass balance method requires source profile data as input and has low regional applicability, so positive matrix factorization (PMF) is used more widely. PMF performs pollutant source analysis mainly from a long time series of receptor chemical composition data and requires no source sample collection. The factors it extracts are purely mathematical indicators, and the actual source types must then be identified from the chemical composition information that characterizes each source. Identifying the sources behind PMF analysis results, however, usually requires a large accumulated knowledge of pollution source composition profiles, combined with experience of local sources, to make a manual judgment. The analysis results can therefore differ from person to person, and the results obtained from samples taken in different regions often differ greatly because of differences in local industrial structure, land type, urban construction planning, and the like. In the last two years some researchers have turned to machine learning, training on manual experience as label data in a supervised manner to obtain predictions with higher accuracy. Although the machine learning methods currently used are accurate, some problems remain: for example, the existing recognition models are complex (deep neural networks), require a large amount of training data to be collected and processed, consume time and resources, remain at the research stage, and cannot respond quickly and in real time to the practical needs of atmospheric pollution prevention, control, and treatment.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide a method and a system for automatically identifying a pollution source analysis result, which are used for solving the problems of complex analysis process and low accuracy of analysis result in the prior art.
The embodiment of the application provides a method for automatically identifying a pollution source analysis result, comprising the following steps: obtaining analysis results of various pollutant sources from the literature and forming a sample set from them, wherein each analysis result includes the category of the pollutant; dividing the sample set, and generating a test data set and a training data set according to the division result; processing the training data set with a hyperparameter test, and obtaining the optimal parameter k value according to the processing result; acquiring an instance data set whose category is to be determined; determining the distance between the instance data set and each sample in the sample set; obtaining, according to the optimal parameter k value, the k training data set samples with the smallest distances; and determining the category of the instance data set from the categories of those k training data set samples.
The embodiment of the application also provides a system for automatically identifying a pollution source analysis result, comprising: an acquisition module, used for obtaining analysis results of various pollutant sources from the literature and forming a sample set from them, wherein each analysis result includes the category of the pollutant, and for acquiring an instance data set whose category is to be determined; a processing module, used for dividing the sample set and generating a test data set and a training data set according to the division result, and for processing the training data set with a hyperparameter test and obtaining the optimal parameter k value according to the processing result; and a determining module, used for determining the distance between the instance data set and each sample in the sample set, obtaining the k training data set samples with the smallest distances according to the optimal parameter k value, and determining the category of the instance data set from the categories of those k training data set samples.
The embodiment of the application also provides a server, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for automatically identifying a pollution source analysis result as described above.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for automatically identifying a pollution source analysis result as described above.
Compared with the prior art, the main differences and effects of the embodiment of the application are as follows: the training data set, taken from a sample set formed from the analysis results of various pollutant sources in the literature, is processed with a hyperparameter test, and the optimal parameter k value is obtained from the processing result; the distance between the instance data set and each sample in the sample set is determined; the k training data set samples with the smallest distances are obtained according to the optimal parameter k value; and the category of the instance data set is finally obtained from the categories of those k samples. The analysis result of the pollutant source can thus be obtained efficiently and accurately, the technical barrier to source analysis is lowered, and the problems of the complexity of the prior art and its dependence on hardware resource allocation are solved.
As a further improvement, after processing the training data set with the hyperparameter test and obtaining the optimal parameter k value according to the processing result, and before acquiring the instance data set whose category is to be determined, the method includes: testing the optimal parameter k value with the test data set, and determining from the test result whether that k value is indeed optimal.
As a further refinement, determining the distance between the instance data set and each sample in the sample set comprises: determining the distances between the instance data set and the respective samples in the sample set using a distance metric.
As a further improvement, the distance metric includes the Euclidean distance, the Minkowski distance, and the Manhattan distance.
As a further refinement, determining the distance between the instance data set and the samples in the sample set using a distance metric comprises: determining the distance between the instance data set and a sample in the sample set according to the formula for the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where x denotes the characteristic factors in the instance data set whose category is to be determined, y denotes the characteristic factors of a sample in the sample set, and x_i and y_i are the i-th of the n characteristic factors; or determining the distance according to the formula for the Minkowski distance:

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$

where x and y are as defined above and p is a variable, and when p = 2 the formula is the formula for the Euclidean distance; or determining the distance according to the formula for the Manhattan distance:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

where x and y are as defined above; this is the Minkowski distance formula with p = 1.
According to this scheme, the distance between the instance data set and each sample in the sample set can be calculated with the Euclidean distance, the Minkowski distance, or the Manhattan distance, so that the category of the instance data set can be conveniently obtained from the categories of the samples in the sample set that are closest to it.
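The three distance metrics can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the application; the function names `minkowski_distance`, `euclidean_distance`, and `manhattan_distance` are assumptions made for this example, and `x` and `y` are assumed to be equal-length sequences of characteristic factors.

```python
from typing import Sequence


def minkowski_distance(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance: the Minkowski distance with p = 2."""
    return minkowski_distance(x, y, 2)


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan distance: the Minkowski distance with p = 1."""
    return minkowski_distance(x, y, 1)
```

Writing the Euclidean and Manhattan distances as special cases of one function mirrors the relationship stated in the text, where p = 2 and p = 1 recover the two metrics.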
As a further improvement, after determining the distance between the instance data set and each sample in the sample set, and before obtaining the k training data set samples with the smallest distances according to the optimal parameter k value, the method comprises: arranging the distances between the samples in the sample set and the instance data set in ascending order.
As a further refinement, determining the category of the instance data set from the categories of the first k training data set samples comprises: determining the most frequent category among the first k training data set samples according to the majority voting principle; and determining the category of the instance data set from that most frequent category.
According to this scheme, the most frequent category among the first k training data set samples can be obtained by majority voting, and the category of the instance data set is then obtained from it, which achieves the aim of obtaining the category of the instance data set.
Drawings
FIG. 1 is a flowchart of a method for automatically identifying a pollution source analysis result according to a first embodiment of the present application;
FIG. 2 is a flowchart of a method for automatically identifying a pollution source analysis result according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of a system for automatically identifying a pollution source analysis result according to a third embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device according to a fourth embodiment of the present application;
FIG. 5 is a reference diagram of a portion of a sample set in accordance with the present application;
FIG. 6 is a schematic diagram of the results of parsing an example dataset in accordance with the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or applied through other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
The first embodiment of the application relates to a method for automatically identifying a pollution source analysis result. The flow is shown in fig. 1, and is specifically as follows:
Step 101, obtaining analysis results of various pollutant sources in the literature, and forming a sample set from the analysis results of the various pollutant sources in the literature.
Specifically, each analysis result includes the category of the pollutant, a part of the sample set is shown in FIG. 5, and the pollutants in the present application are mainly atmospheric pollutants.
Step 102, dividing the sample set, and generating a test data set and a training data set according to the division result.
Specifically, 20% of the sample set is used as a test data set, 80% of the sample set is used as a training data set, and in practical application, the sample set can be divided into test data sets and training data sets in other proportions according to specific conditions.
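The 20%/80% division can be sketched as below. This is a minimal illustration under stated assumptions: the function name `split_sample_set`, the random shuffle, and the fixed `seed` are not given in the source, which only specifies the proportions and notes that other proportions may be used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sample_set(
    samples: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the sample set and split it into (training, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # Shuffling before the split is an assumption; a fixed seed keeps it repeatable.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For a sample set of 100 entries, `split_sample_set(samples)` returns 80 training samples and 20 test samples; passing a different `test_fraction` gives the other proportions the text allows.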
Step 103, processing the training data set with the hyperparameter test, and obtaining the optimal parameter k value according to the processing result.
Specifically, k is a positive integer with 1 ≤ k ≤ 20. The hyperparameter test creates a list of candidate values for the parameter, randomly selects values from the list, processes the training data set with each randomly selected hyperparameter combination, and then compares the results of the different trials; the parameter with the highest accuracy is taken as the optimal parameter k value. The initial range of candidate k values can be narrowed or widened according to experience.
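One way to read this hyperparameter test is as a random search over the candidate k values, sketched below. The source does not say how accuracy on the training data set is measured, so leave-one-out scoring is an assumption here, as are the names `knn_predict`, `loo_accuracy`, and `random_search_k`, the number of trials, the Euclidean metric, and the preference for the smaller k when accuracies tie.

```python
import random
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (category label, component contributions)


def knn_predict(train: Sequence[Sample], features: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples closest to `features`."""
    nearest = sorted(train, key=lambda s: dist(s[1], features))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]


def loo_accuracy(train: Sequence[Sample], k: int) -> float:
    """Leave-one-out accuracy on the training data (an assumed scoring rule)."""
    hits = 0
    for i, (label, features) in enumerate(train):
        rest = list(train[:i]) + list(train[i + 1:])
        hits += knn_predict(rest, features, k) == label
    return hits / len(train)


def random_search_k(
    train: Sequence[Sample],
    candidates: Sequence[int] = range(1, 21),
    n_trials: int = 10,
    seed: int = 0,
) -> Tuple[int, float]:
    """Score randomly drawn k values and return (best_k, best_accuracy)."""
    rng = random.Random(seed)
    tried = rng.sample(list(candidates), min(n_trials, len(candidates)))
    scored = [(loo_accuracy(train, k), k) for k in tried]
    # Highest accuracy wins; on ties the smaller k is preferred (assumption).
    best_accuracy, best_k = max(scored, key=lambda t: (t[0], -t[1]))
    return best_k, best_accuracy
```

Narrowing or widening `candidates` corresponds to adjusting the initial range of k according to experience.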
Step 104, obtaining an instance data set of the category to be determined.
Specifically, the instance data of the category to be determined is acquired, and the instance data is formed into an instance data set.
Step 105, determining a distance between the instance dataset and each sample in the sample set.
Specifically, the distance between the instance data set and each sample in the sample set is determined with a distance metric. The underlying principle is that if most of the k samples most similar to a given sample in the feature space (that is, its k nearest neighbors) belong to a certain category, the sample also belongs to that category. The nearest neighbors of a sample are found with a distance metric, which may be the Euclidean distance, the Minkowski distance, or the Manhattan distance.
Step 106, obtaining, according to the optimal parameter k value, the k training data set samples with the smallest distances.
Specifically, the distance herein refers to the distance between the example dataset and each sample in the sample set.
Step 107, determining the belonging category of the instance data set according to the belonging categories of the samples in the first k training data sets.
Specifically, the most frequent category among the first k training data set samples is determined according to the majority voting principle, and the category of the instance data set is then determined from that most frequent category.
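The majority vote itself is small enough to sketch directly. The function name `majority_category` is an assumption, and so is the tie-breaking rule: the source does not say what happens when two categories are equally frequent, so this sketch returns the tied category that appears first in the list, which is the nearest one if the labels are passed in ascending order of distance.

```python
from collections import Counter
from typing import Sequence


def majority_category(neighbor_labels: Sequence[str]) -> str:
    """Return the most frequent category among the k nearest samples."""
    if not neighbor_labels:
        raise ValueError("at least one neighbor label is required")
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    # Tie-break (assumption): keep the tied category that appears first.
    for label in neighbor_labels:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always has the top count")
```

With k = 3 and neighbor labels `["petrochemical", "natural sources", "petrochemical"]`, the instance data set is assigned to `"petrochemical"`.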
In this embodiment, the training data set, taken from a sample set formed from the analysis results of various pollutant sources in the literature, is processed with the hyperparameter test, and the optimal parameter k value is obtained from the processing result; the distance between the instance data set and each sample in the sample set is determined; the k training data set samples with the smallest distances are obtained according to the optimal parameter k value; and the category of the instance data set is finally obtained from the categories of those k samples. The analysis result of the pollutant source can thus be obtained efficiently and accurately, the technical barrier to source analysis is lowered, and the problems of the complexity of the prior art and its dependence on hardware resource allocation are solved.
A second embodiment of the present application relates to a method for automatically identifying a pollution source analysis result. The second embodiment elaborates on the first embodiment as a whole; in particular, it sets out the specific procedure for determining the category of the instance data set from the categories of the first k training data set samples.
Referring to fig. 2, the present embodiment includes the following steps, which are described as follows:
Steps 201 to 203 are similar to steps 101 to 103 in the first embodiment, and are not described here again.
Step 204, testing the optimal parameter k value with the test data set, and determining from the test result whether that k value is indeed optimal.
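The source does not specify how the test result decides whether the k value is optimal. The sketch below assumes one plausible reading: the chosen k is confirmed if no other candidate k achieves a higher accuracy on the test data set. The names `knn_predict`, `holdout_accuracy`, and `k_is_confirmed` are hypothetical, and the Euclidean metric is assumed.

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (category label, component contributions)


def knn_predict(train: Sequence[Sample], features: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples closest to `features`."""
    nearest = sorted(train, key=lambda s: dist(s[1], features))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]


def holdout_accuracy(train: Sequence[Sample], test: Sequence[Sample], k: int) -> float:
    """Fraction of test samples whose predicted category matches their label."""
    hits = sum(knn_predict(train, features, k) == label for label, features in test)
    return hits / len(test)


def k_is_confirmed(
    train: Sequence[Sample],
    test: Sequence[Sample],
    k: int,
    candidates: Sequence[int] = range(1, 21),
) -> bool:
    """True if no candidate k beats the chosen k on the test set (assumed criterion)."""
    chosen = holdout_accuracy(train, test, k)
    return all(holdout_accuracy(train, test, other) <= chosen for other in candidates)
```

If `k_is_confirmed` returns `False`, one option is to rerun the hyperparameter test with an adjusted candidate range.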
Step 205 is similar to step 104 of the first embodiment, and will not be described again.
Step 206, determining the distance between the example data set and each sample in the sample set according to the metric distance approach.
Specifically, the distance between a sample in the sample set and the instance data set is determined according to the formula for the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where x denotes the characteristic factors in the instance data set whose category is to be determined, y denotes the characteristic factors of a sample in the sample set, and x_i and y_i are the i-th of the n characteristic factors; or the distance is determined according to the formula for the Minkowski distance:

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}$$

where x and y are as defined above and p is a variable, and when p = 2 the formula is the formula for the Euclidean distance; or the distance is determined according to the formula for the Manhattan distance:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

where x and y are as defined above; this is the Minkowski distance formula with p = 1.
Step 207, arranging the distances between the samples in the sample set and the instance data set in ascending order.
Specifically, the distance between the instance data set and each sample in the sample set is calculated with the Euclidean distance, the Minkowski distance, or the Manhattan distance. The distances are then arranged in order from smallest to largest, and the training data set samples whose distances rank in the first k are obtained; the set formed by these k sample points is denoted n_k(b).
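The ranking step can be sketched as a sort followed by a slice. The function name `nearest_k` and the `(label, features)` sample layout are assumptions for this example, and the Euclidean metric is used for concreteness.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (category label, component contributions)


def nearest_k(
    instance: Sequence[float], samples: Sequence[Sample], k: int
) -> List[Tuple[float, Sample]]:
    """Return the k samples closest to `instance` as (distance, sample) pairs."""
    ranked = sorted(
        ((dist(instance, sample[1]), sample) for sample in samples),
        key=lambda pair: pair[0],  # ascending: smallest distance first
    )
    return ranked[:k]
```

The returned list plays the role of the set of k sample points described above, with each sample's distance kept alongside it for inspection.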
Step 208 is similar to step 106 of the first embodiment and will not be described again.
Step 209, determining the most frequent category among the first k training data set samples according to the majority voting principle.
Step 210, determining the category of the instance data set from the most frequent category among the first k training data set samples.
Specifically, the most frequent category among the first k training data set samples is selected according to the majority voting principle and denoted w, and the category of the instance data set is then w. The instance data set, labelled with category w, is then added to the sample set, so that the identification accuracy of the application can be continuously improved and optimized.
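Feeding an identified instance back into the sample set can be sketched as below. The function name `add_identified_instance` and the optional `corrected_category` argument are assumptions; the latter reflects the manual correction that the description applies to subdivided-category results before they are added.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[str, Tuple[float, ...]]  # (category label, component contributions)


def add_identified_instance(
    sample_set: Sequence[Sample],
    features: Sequence[float],
    identified_category: str,
    corrected_category: Optional[str] = None,
) -> List[Sample]:
    """Return a new sample set with the newly labelled instance appended."""
    # A manually corrected label, when supplied, overrides the identified one.
    label = corrected_category if corrected_category is not None else identified_category
    return list(sample_set) + [(label, tuple(features))]
```

Returning a new list instead of modifying the input keeps the original sample set available for comparison; that is a design choice of this sketch, not something the source prescribes.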
In practical application, a sample set is first constructed. The analysis results of various pollutant sources in the literature are collected and collated; each analysis result includes the category of the pollutant (that is, the label) and the contribution feature values of the components of the pollutant source. These analysis results are then made into two initial sample sets, one with 10 major categories and one with 15 subdivided minor categories. In both sample sets, the first column holds the category, numbered by label, and columns 2 to n hold the contribution feature values of the components.
An instance data set is then constructed: the VOCs component data of Nanjing to be analyzed are input into the PMF software, and the simulation yields the component contribution features of factors whose categories have not yet been identified. The two sample sets are each divided (20% as the test data set and 80% as the training data set), and the optimal parameter k value is selected with the hyperparameter test. For the major-category training set, the result is metric distance = Euclidean distance, k = 3; for the minor-category training set, the result is metric distance = Euclidean distance, k = 5.
The instance data set is then identified with the optimal parameter k value. Identification by manual experience gives: paint solvents, natural sources, LPG emissions, gasoline vehicle emissions, and petrochemical. Identification with the major categories gives: paint solvents, natural sources, LPG emissions, motor vehicle exhaust emissions (in the major categories, motor vehicle emissions include gasoline vehicle emissions), and petrochemical. Identification with the subdivided minor categories gives: paint solvents, natural sources, LPG emissions, motor vehicle exhaust emissions (the minor categories treat motor vehicle exhaust emissions and gasoline vehicle exhaust emissions as two separate labels), and petrochemical. The accuracy of major-category identification is therefore 100%, and the accuracy of minor-category identification is 80%, because one of the five factors receives a label different from the manual result.
Finally, the major-category identification result of the instance data set (label plus the contribution feature values of the components) is added to the major-category training set, and the minor-category identification result is added to the minor-category training set after manual correction. The analysis result of the instance data set is shown in FIG. 6.
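The two accuracy figures follow directly from comparing the five labels, as the short check below shows. The label lists are taken from the description; the helper name `accuracy` and the mapping `MAJOR_CATEGORY_OF` are assumptions introduced for this illustration.

```python
from typing import Sequence

manual = [
    "paint solvents", "natural sources", "LPG emissions",
    "gasoline vehicle emissions", "petrochemical",
]
identified = [
    "paint solvents", "natural sources", "LPG emissions",
    "motor vehicle exhaust emissions", "petrochemical",
]

# In the major categories, gasoline vehicle emissions fall under motor vehicle
# exhaust emissions; in the minor categories they are two separate labels.
MAJOR_CATEGORY_OF = {"gasoline vehicle emissions": "motor vehicle exhaust emissions"}


def accuracy(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the expected one."""
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)


major_manual = [MAJOR_CATEGORY_OF.get(label, label) for label in manual]
major_accuracy = accuracy(major_manual, identified)
minor_accuracy = accuracy(manual, identified)
print(major_accuracy, minor_accuracy)  # prints: 1.0 0.8
```

Four of the five minor-category labels match the manual result, which gives the 80% figure; after mapping to major categories all five match.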
In this embodiment, the distance between the instance data set and each sample in the sample set can be calculated with the Euclidean distance, the Minkowski distance, or the Manhattan distance, so that the category of the instance data set can be conveniently obtained from the categories of the samples in the sample set that are closest to it. The most frequent category among the first k training data set samples can be obtained by majority voting, and the category of the instance data set is then obtained from it, which achieves the aim of obtaining the category of the instance data set.
A third embodiment of the present application relates to a system for automatically identifying a pollution source analysis result, referring to fig. 3, including:
the acquisition module is used for acquiring analysis results of various pollutant sources in the literature and forming a sample set from the analysis results of the various pollutant sources in the literature, wherein the analysis results comprise the category of the pollutant, and acquiring an instance data set of the category to be determined;
the processing module is used for dividing the sample set and generating a test data set and a training data set according to the division result, and for processing the training data set with a hyperparameter test and obtaining the optimal parameter k value according to the processing result;
a determining module, used for determining the distance between the instance data set and each sample in the sample set, obtaining the k training data set samples with the smallest distances according to the optimal parameter k value, and determining the category of the instance data set from the categories of those k training data set samples.
It is to be noted that this embodiment is a system example corresponding to the first embodiment, and can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and in order to reduce repetition, a detailed description is omitted here. Accordingly, the related art details mentioned in the present embodiment can also be applied to the first embodiment.
It should be noted that each module in this embodiment is a logic module. In practical application, one logic unit may be one physical unit, a part of one physical unit, or a combination of multiple physical units. In addition, in order to highlight the innovative part of the present application, units less closely related to solving the technical problem addressed by the present application are not introduced in this embodiment, but this does not mean that no other units are present in this embodiment.
A fourth embodiment of the present application relates to a server, referring to fig. 4, including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for automatically identifying a pollution source analysis result as described above.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor in performing operations.
A fifth embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods in the embodiments of the application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In summary, the application applies hyperparameter testing to the training data set drawn from a sample set formed from the analysis results of various pollutant sources in the literature, and obtains the optimal parameter k value from the result. It then determines the distances between the instance data set and each sample in the sample set, selects the samples in the training data set whose distances rank in the first k according to the optimal parameter k value, and finally determines the category of the instance data set from the categories of those k samples. The pollution source analysis result is thereby obtained efficiently and accurately, the technical barrier to source analysis is lowered, and the problems of complexity and dependence on hardware resource configuration in the prior art are solved. Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.
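The classification step summarized above can be illustrated with a minimal sketch of a k-nearest-neighbour vote. It assumes that "the first k" means the k smallest distances, as in the standard k-nearest-neighbour method, and that each sample is a numeric feature vector. The function names, the feature values, and the category labels below are hypothetical and do not come from the application.

```python
import math
from collections import Counter


def minkowski_distance(x, y, p=2):
    """Minkowski distance between two feature vectors.

    p=2 gives the Euclidean distance and p=1 the Manhattan distance.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def knn_predict(instance, samples, labels, k, p=2):
    """Classify `instance` by majority vote among its k nearest samples."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Distance from the instance to every labelled sample.
    distances = [(minkowski_distance(instance, s, p), label)
                 for s, label in zip(samples, labels)]
    # Nearest first (assumption); the first k entries are the k nearest neighbours.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature samples with made-up category labels.
samples = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
labels = ["coal", "coal", "traffic", "traffic", "traffic"]
print(knn_predict((0.15, 0.15), samples, labels, k=3))  # prints "coal"
```

With k = 3 the two "coal" samples outvote the single nearest "traffic" sample. In practice the characteristic factors would probably need to be scaled to comparable ranges before distances are computed, which this sketch does not do.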
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims (10)

1. A method for automatically identifying a pollution source analysis result, comprising the steps of:
obtaining analysis results of various pollutant sources in a literature, and forming a sample set from the analysis results of the various pollutant sources in the literature, wherein the analysis results comprise the category of the pollutant;
dividing the sample set, and generating a test data set and a training data set according to a division result;
processing the training data set by hyperparameter testing, and obtaining an optimal parameter k value according to the processing result;
acquiring an instance data set of a category to be determined;
determining distances between the instance data set and each sample in the sample set;
obtaining, according to the optimal parameter k value, the samples in the training data set whose distances rank in the first k;
determining the category of the instance data set according to the categories of the first k samples in the training data set.
2. The method for automatically identifying a pollution source analysis result as recited in claim 1, wherein: after the training data set is processed by hyperparameter testing and the optimal parameter k value is obtained according to the processing result, the method further comprises:
testing the optimal parameter k value by using the test data set, and determining, according to the test result, whether the k value is indeed the optimal parameter k value.
3. The method for automatically identifying a pollution source analysis result as recited in claim 1, wherein: the determining distances between the instance data set and each sample in the sample set comprises:
determining the distances between the instance data set and each sample in the sample set according to a distance metric.
4. The method for automatically identifying a pollution source analysis result as recited in claim 3, wherein: the distance metric includes the Euclidean distance, the Minkowski distance, and the Manhattan distance.
5. The method for automatically identifying a pollution source analysis result as recited in claim 4, wherein: the determining the distances between the instance data set and each sample in the sample set according to the distance metric comprises:
determining the distance between the instance data set and a sample in the sample set according to the formula corresponding to the Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
wherein x is a characteristic factor in the instance data set of the category to be determined, y is a characteristic factor of each sample in the sample set, and i indexes the n characteristic factors;
or determining the distance between a sample in the sample set and the instance data set according to the formula corresponding to the Minkowski distance:
$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$
wherein x is a characteristic factor in the instance data set of the category to be determined, y is a characteristic factor of each sample in the sample set, and p is a variable; when p = 2, the formula is the formula corresponding to the Euclidean distance;
or determining the distance between a sample in the sample set and the instance data set according to the formula corresponding to the Manhattan distance:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
wherein x is a characteristic factor in the instance data set of the category to be determined, and y is a characteristic factor of each sample in the sample set; this formula is the formula corresponding to the Minkowski distance with the variable p = 1.
6. The method for automatically identifying a pollution source analysis result as recited in claim 1, wherein: after the determining distances between the instance data set and each sample in the sample set, and before the obtaining, according to the optimal parameter k value, of the samples in the training data set whose distances rank in the first k, the method further comprises:
arranging the distances between the samples in the sample set and the instance data set in order of decreasing size.
7. The method for automatically identifying a pollution source analysis result as recited in claim 1, wherein: the determining the category of the instance data set according to the categories of the first k samples in the training data set comprises:
determining, according to the majority voting principle, the category that occurs most often among the first k samples in the training data set;
determining the category of the instance data set according to the category that occurs most often among the first k samples in the training data set.
8. A system for automatically identifying a pollution source analysis result, comprising:
an acquisition module, configured to obtain analysis results of various pollutant sources in the literature and form a sample set from the analysis results of the various pollutant sources in the literature, wherein the analysis results comprise the category of the pollutant, and to acquire an instance data set of a category to be determined;
a processing module, configured to divide the sample set and generate a test data set and a training data set according to the division result, and to process the training data set by hyperparameter testing and obtain an optimal parameter k value according to the processing result;
a determining module, configured to determine distances between the instance data set and each sample in the sample set; obtain, according to the optimal parameter k value, the samples in the training data set whose distances rank in the first k; and determine the category of the instance data set according to the categories of the first k samples in the training data set.
9. A server, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for automatically identifying a pollution source analysis result as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for automatically identifying a pollution source analysis result according to any one of claims 1 to 7.
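The selection of k recited in claims 1 and 2 can be sketched as follows. The claims do not specify how the hyperparameter test is carried out, so this sketch assumes leave-one-out accuracy on the training data set as the selection criterion, followed by a single check on the held-out test data set. The data, the category labels, the candidate k values, and all function names are hypothetical.

```python
import random
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def knn_predict(instance, xs, ys, k):
    """Majority vote among the k samples in (xs, ys) nearest to `instance`."""
    ranked = sorted(zip(xs, ys), key=lambda pair: euclidean(instance, pair[0]))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(train_x, train_y, eval_x, eval_y, k):
    """Fraction of evaluation samples classified correctly from the training set."""
    hits = sum(knn_predict(x, train_x, train_y, k) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)


def leave_one_out_accuracy(xs, ys, k):
    """Classify each training sample from all the others and score the result."""
    hits = sum(
        knn_predict(xs[i], xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], k) == ys[i]
        for i in range(len(xs))
    )
    return hits / len(xs)


def select_k(train_x, train_y, candidates):
    """Return the candidate k with the best leave-one-out accuracy.

    max() keeps the first best value, so ties go to the earliest candidate.
    """
    return max(candidates,
               key=lambda k: leave_one_out_accuracy(train_x, train_y, k))


# Hypothetical labelled sample set: two synthetic source categories.
rng = random.Random(0)
data = [((rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)), "industrial")
        for _ in range(20)]
data += [((rng.gauss(1.5, 0.5), rng.gauss(1.5, 0.5)), "agricultural")
         for _ in range(20)]
rng.shuffle(data)

# Divide the sample set: 80 % training data, 20 % test data.
split = int(0.8 * len(data))
train_x, train_y = [list(t) for t in zip(*data[:split])]
test_x, test_y = [list(t) for t in zip(*data[split:])]

best_k = select_k(train_x, train_y, candidates=[1, 3, 5, 7, 9])
test_accuracy = accuracy(train_x, train_y, test_x, test_y, best_k)
print(f"selected k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Keeping the test data set out of the selection loop is what makes the final accuracy figure a meaningful check on the chosen k; k-fold cross-validation would be a reasonable alternative to leave-one-out for larger sample sets.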
CN202310616810.9A 2023-05-29 2023-05-29 Method and system for automatically identifying analysis result of pollution source Active CN116662853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310616810.9A CN116662853B (en) 2023-05-29 2023-05-29 Method and system for automatically identifying analysis result of pollution source

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310616810.9A CN116662853B (en) 2023-05-29 2023-05-29 Method and system for automatically identifying analysis result of pollution source

Publications (2)

Publication Number Publication Date
CN116662853A (en) 2023-08-29
CN116662853B (en) 2024-04-30

Family

ID=87727282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310616810.9A Active CN116662853B (en) 2023-05-29 2023-05-29 Method and system for automatically identifying analysis result of pollution source

Country Status (1)

Country Link
CN (1) CN116662853B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468926A (en) * 2015-12-29 2016-04-06 北京师范大学 Underground water type drinking water source pollution source analysis method
CN108595414A (en) * 2018-03-22 2018-09-28 浙江大学 Heavy metal-polluted soil enterprise pollution source discrimination based on source remittance space variable reasoning
CN109785912A (en) * 2019-02-13 2019-05-21 中国科学院大气物理研究所 A kind of factor method for quickly identifying and device for target contaminant source resolution
CN113470765A (en) * 2021-06-29 2021-10-01 广州市华南自然资源科学技术研究院 Soil heavy metal source analysis method
CN114660030A (en) * 2022-03-17 2022-06-24 清华大学合肥公共安全研究院 Pollution source analysis method and device and storage medium
CN115494047A (en) * 2022-11-17 2022-12-20 广东博创佳禾科技有限公司 Detection method and system for water environment agricultural pollutants
WO2023035362A1 (en) * 2021-09-07 2023-03-16 上海观安信息技术股份有限公司 Polluted sample data detecting method and apparatus for model training

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
T.N.WU ET AL: "Source identification of groundwater pollution with the aid of multivariate statistical analysis", 《WATER SCIENCE AND TECHNOLOGY:WATER》, vol. 5, no. 6, pages 281 - 288 *
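Song Yaoyu (宋耀宇): "Prediction of atmospheric pollutant distribution based on deep neural networks", 《China Master's Theses Full-text Database (Electronic Journal)》, vol. 2021, no. 01 *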

Also Published As

Publication number Publication date
CN116662853B (en) 2024-04-30

Similar Documents

Publication Publication Date Title
CN106611052B (en) The determination method and device of text label
DE102013209868B4 (en) Querying and integrating structured and unstructured data
CN108519971B (en) Cross-language news topic similarity comparison method based on parallel corpus
CN111382248B (en) Question replying method and device, storage medium and terminal equipment
CN110310012B (en) Data analysis method, device, equipment and computer readable storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN116610983B (en) Abnormality analysis method and system for air purification control system
CN106570197A (en) Searching and ordering method and device based on transfer learning
CN112819246A (en) Energy demand prediction method for optimizing neural network based on cuckoo algorithm
CN110457706B (en) Point-of-interest name selection model training method, using method, device and storage medium
CN111813888A (en) Training target model
CN112632264A (en) Intelligent question and answer method and device, electronic equipment and storage medium
CN113516205B (en) Employee stability classification method based on artificial intelligence and related equipment
CN108830302B (en) Image classification method, training method, classification prediction method and related device
CN111104503A (en) Construction engineering quality acceptance standard question-answering system and construction method thereof
CN103605493A (en) Parallel sorting learning method and system based on graphics processing unit
CN117573985A (en) Information pushing method and system applied to intelligent online education system
CN109344400A (en) A kind of judgment method and device of document storage
Dovbysh et al. Estimation of Informativeness of Recognition Signs at Extreme Information Machine Learning of Knowledge Control System.
CN116662853B (en) Method and system for automatically identifying analysis result of pollution source
CN107301226A (en) The automatic evaluation method of module is retrieved from a kind of question answering system
CN116861358A (en) BP neural network and multi-source data fusion-based computing thinking evaluation method
CN112215006B (en) Organization named entity normalization method and system
CN114238768A (en) Information pushing method and device, computer equipment and storage medium
CN103544500A (en) Multi-user natural scene mark sequencing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Zhao Xueting, Chen Hui, Qin Wei, Zhang Yongqian, Wan Jingjing, Sheng Shijie, Wang Lei
Inventor before: Zhao Xueting, Chen Hui, Qin Wei, Zhang Yongqian, Wan Jingjing, Sheng Shijie, Wang Lei
GR01 Patent grant