CN113159448A

CN113159448A - Automatic analysis and discrimination method based on environmental protection big data

Info

Publication number: CN113159448A
Application number: CN202110516775.4A
Authority: CN
Inventors: 孙元晓; 周轶文; 刘军胜; 司梦晨; 王大伟
Original assignee: Yantai Yinghui Intelligent Technology Co ltd
Current assignee: Yantai Yinghui Intelligent Technology Co ltd
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2021-07-23

Abstract

The invention relates to the technical field of deep learning networks, and in particular to an automatic analysis and discrimination method based on environmental protection big data. The method comprises designing an automatic analysis and discrimination procedure and sorting industrial output value and pollutant emission data, predicting and discriminating the industrial output value data, and predicting and discriminating the pollutant emission data. Based on a deep learning network and big data processing algorithms, the invention designs the basic flow of the method and discriminates, in sequence, whether the industrial output value data and the pollutant emission data are genuine or false. A neural network extracts the relationships among total industrial output value data of consecutive years and builds a model that accounts for special external influences; a data screening network, a weak classifier and a strong classifier are then trained separately, enabling prediction and discrimination of total industrial output value data. In addition, a pollutant emission prediction model based on a convolutional neural network is trained, enabling prediction and discrimination of pollutant emission data.

Description

Automatic analysis and discrimination method based on environmental protection big data

Technical Field

The invention relates to the technical field of deep learning networks, and in particular to an automatic analysis and discrimination method based on environmental protection big data.

Background

Every industrial sector generates pollutants during production. With the continuous development of technology and growing public attention to ecological protection, pollutant emission requirements have become increasingly strict, which makes checking industrial pollutant discharge data particularly important. Traditionally, the total output and the related pollutant discharge figures submitted by industrial enterprises are judged manually. However, some enterprises conceal or misreport their pollutant discharge data to lower environmental protection costs and raise profits, so the environmental protection authorities usually have to commit substantial manpower and material resources to verify the authenticity of the discharge data reported by factories.

With the continuous development of artificial intelligence, big data and related technologies, intelligent judgment and prediction based on big data are receiving increasing attention. Among massive industrial pollutant emission and output value data, the ratio of a factory's pollutant emission to its output value is an important indicator of whether the factory meets environmental protection standards, so judging the authenticity of industrial pollutant emission and output value data is an important task.

The authenticity of industrial pollutant emission and output value data is usually judged manually by combining relevant models, experience and on-site investigation, which consumes substantial manpower and material resources. Moreover, when the production environment changes (for example, when overall industrial output value falls because of an epidemic), traditional models are not robust and produce wrong judgments. Machine learning and big data mining techniques can therefore be applied to this problem to pre-screen factory data.

Disclosure of Invention

The invention aims to provide an automatic analysis and discrimination method based on environmental protection big data to solve the problems described in the background above.

To solve the above technical problems, the invention provides an automatic analysis and discrimination method based on environmental protection big data, comprising the following steps:

S1. Design an automatic analysis and discrimination method, and sort the industrial output value and pollutant emission data;

S2. Predict and discriminate the industrial output value data;

S3. Predict and discriminate the pollutant emission data.

As a further improvement of the present technical solution, in S1, the automatic analysis and identification method includes the following steps:

S1.1. Remove obviously erroneous data from the original environmental protection big data database, then partition the data by data source and plant type and perform initial calculations;

S1.2. Train a weak classifier and a data screening network on the partitioned data and the initial calculation results, train a strong classifier on the screened data, fuse by weighting the errors between the two classifiers' outputs and the actually reported data, and output a confidence level according to a threshold;

S1.3. Based on the judgment of S1.2, remove the data whose industrial output value was judged false, train a neural network prediction model on the remaining genuine data, predict the current year's pollutant emission ratio from the related pollutant emission data of three consecutive years, and output a confidence level according to a threshold;

S1.4. Combine the judgments of S1.2 and S1.3 and output the integrated judgment result.
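The source does not spell out how S1.4 combines the two judgments. One reading, consistent with the later note that emission data are judged only once the output value data are established as genuine, is a gated decision. The Python sketch below assumes that reading; the function name `integrated_judgment` and its result labels are hypothetical.

```python
from typing import Callable


def integrated_judgment(output_value_ok: bool,
                        check_emission: Callable[[], bool]) -> str:
    """Combine the S1.2 (output value) and S1.3 (emission) judgments.

    Hypothetical gating: the emission check is only run when the output
    value data were judged genuine; the result labels are illustrative.
    """
    if not output_value_ok:
        return "output value suspicious"
    # The output value passed, so the emission judgment decides the result.
    return "genuine" if check_emission() else "emission suspicious"
```

Passing the emission check as a callable means it is never evaluated for a factory whose output value has already been rejected.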

In S1.1, plants are classified by type (for example, steel plants form one class and textile plants another) because plants of the same type share the same external influences: if demand for steel falls, all steel plants will reduce their total output value accordingly, whereas the textile industry is unaffected.

On this basis, the year-over-year increase/decrease percentages of industrial output value over three consecutive years, computed from previously verified genuine data, are used as the input of the neural network; the increase/decrease percentage of the industrial output value reported for the current year is used as the ground-truth output; and the network is trained separately on the data of each plant type.

In S1.2, the neural network is first trained briefly on these inputs and outputs and then used as a data screening network. Because some of the data used as ground-truth outputs may have been falsely reported, training is kept short to avoid overfitting the false data. A batch of genuine data is then screened out according to the error between the network output and the reported value, and a neural network with the same structure as the data screening network is retrained on this data as the strong classifier. Meanwhile, a network is trained on the original, unscreened data (for more iterations than the data screening network) to serve as the weak classifier, which guards against discrimination errors of the strong classifier caused by the training data removed during screening.

In S1.3, the data are fed to both classifiers; the errors of the reported current-year increase/decrease percentage of total industrial output value relative to the outputs of the strong classifier and of the weak classifier are calculated, the two errors are fused by weighting, and finally the authenticity of the reported current-year total industrial output value data is determined against an acceptable error threshold.
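A minimal Python sketch of this feature construction, assuming each record holds four verified output values followed by the value reported for the current year (three growth percentages need four values). The record layout and the names `growth_pct` and `build_samples` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def growth_pct(prev: float, curr: float) -> float:
    """Year-over-year increase (+) or decrease (-) of output value, in percent."""
    return (curr - prev) / prev * 100.0


def build_samples(records: List[dict]) -> Dict[str, List[Tuple[List[float], float]]]:
    """Group records by plant type and build (3 growth features, label) pairs.

    Hypothetical record layout: {"plant_type": str, "values": [v0, ..., v4]},
    where v0..v3 are verified output values of the four previous years and
    v4 is the output value reported for the current year.
    """
    samples: Dict[str, List[Tuple[List[float], float]]] = defaultdict(list)
    for rec in records:
        v = rec["values"]
        # Skip incomplete rows and rows with non-positive base values.
        if len(v) != 5 or any(x <= 0 for x in v[:4]):
            continue
        features = [growth_pct(v[i], v[i + 1]) for i in range(3)]  # 3 x 1 input
        label = growth_pct(v[3], v[4])  # reported growth for the current year
        samples[rec["plant_type"]].append((features, label))
    return dict(samples)
```

Grouping by `plant_type` mirrors the per-type training described above, so each plant type gets its own sample list.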
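The screening step can be sketched as a filter on the gap between the screening network's prediction and the reported label. This is an illustration under assumptions: `predict` stands in for the briefly trained screening network, and the absolute tolerance `tol` is a hypothetical choice, since the source does not state how the error cut-off is set.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (3 growth features, reported label)


def screen_samples(samples: List[Sample],
                   predict: Callable[[Sequence[float]], float],
                   tol: float) -> List[Sample]:
    """Keep samples whose label is close to the screening network's prediction.

    The kept subset is what the strong classifier would be retrained on;
    the weak classifier would instead be trained on all samples.
    """
    kept = []
    for features, label in samples:
        # A large gap suggests the reported label may be false.
        if abs(predict(features) - label) <= tol:
            kept.append((features, label))
    return kept
```

In the usage below a simple mean of the three features plays the role of the screening network, purely as a placeholder.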

As a further improvement of the technical solution, in S1.1 an information entropy algorithm is used to remove obviously erroneous data, with the calculation formula:

$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$

where i = 1, 2, 3, ..., n; X_i denotes the i-th state (there are n states in total); P(X_i) is the probability of the i-th state occurring; and H(X) is the amount of information, in bits, required to remove the uncertainty.
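The entropy formula can be computed directly from state frequencies. The source does not say how entropy is turned into a rejection rule, so this Python sketch, with the hypothetical name `shannon_entropy`, only estimates H(X) from observed states.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(states: Iterable[Hashable]) -> float:
    """H(X) = -sum_i P(X_i) * log2 P(X_i), with P estimated from frequencies."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no observations, no uncertainty to measure
    h = 0.0
    for c in counts.values():
        p = c / total
        h -= p * math.log2(p)
    return h
```

Two equally likely states give 1 bit and four give 2 bits, while a single repeated state gives 0 bits.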

As a further improvement of the technical solution, in S2, the method for predicting and discriminating the industrial output value data comprises the following steps:

S2.1. Classify the data by reported plant type, calculate the increase/decrease percentages of industrial output value over three consecutive years, and reject obviously erroneous data according to rules to complete data preprocessing;

S2.2. Use the data screening network to mine, from the big data, the development trend of each type of plant in recent years, and reject data that contradict the overall trend;

S2.3. After S2.2 has eliminated the data with lower confidence, train a strong classifier on the remaining data set; the strong classifier has the same network structure and loss function as the data screening network;

S2.4. Train a weak classifier on the data classified in S2.1; its network structure, loss function and data inputs and outputs are the same as those of the data screening network of S2.2, except that it is trained for more iterations than the data screening network;

S2.5. Based on S2.3 and S2.4, weight the differences between the reported value and the values predicted by the strong and weak classifiers to obtain the final difference, and judge the authenticity of the total industrial output value data against an error threshold.

In S2.4, the weak classifier is trained so that the features of the data removed during screening, which are absent from the strong classifier's training set, are not lost.

As a further improvement of the technical solution, in S2.2 the data screening network consists of 3 fully-connected layers with an input dimension of 3 × 1 and an output dimension of 1 × 1. The increase/decrease percentages of industrial output value of verified genuine data over three consecutive years are the input features, and the increase/decrease percentage of the total industrial output value reported for the current year is the output label. The network is briefly pre-trained on the data of each plant type with MSE as the loss function; the pre-trained network is the data screening network, and data with lower confidence, indicated by a large error between the network output and the reported value, are eliminated.
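A dependency-free Python sketch of the screening network's shape and loss. Only the three fully-connected layers, the 3 × 1 input and the 1 × 1 output come from the text; the hidden widths (8 and 4), ReLU activations, weight initialization and all names are assumptions, and training (for example by gradient descent in a deep learning framework) is omitted.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small uniform random weights and zero biases (an assumed scheme)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], layer: Layer) -> List[float]:
    """Fully-connected layer: y[j] = sum_i x[i] * w[j][i] + b[j]."""
    weights, biases = layer
    return [sum(xi * wji for xi, wji in zip(x, wj)) + bj
            for wj, bj in zip(weights, biases)]


class ScreeningNet:
    """Three fully-connected layers mapping a 3 x 1 input to a 1 x 1 output."""

    def __init__(self, hidden=(8, 4), seed: int = 0):
        rng = random.Random(seed)
        sizes = [3, *hidden, 1]  # 3 -> 8 -> 4 -> 1 with the assumed widths
        self.layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

    def forward(self, x: List[float]) -> float:
        h = x
        for k, layer in enumerate(self.layers):
            h = dense(h, layer)
            if k < len(self.layers) - 1:  # ReLU on hidden layers only
                h = [max(0.0, t) for t in h]
        return h[0]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error, the loss the text names for this network."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Per the text, the strong and weak classifiers reuse this same structure and loss and differ only in their training data and iteration counts.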

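As a further improvement of the technical solution, in S2.2 the MSE function is calculated as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where m is the number of samples, y_i is the reported (label) value of the i-th sample and \hat{y}_i is the value predicted by the network.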

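As a further improvement of the technical solution, in S2.5 the final difference is calculated by weighting the errors of the strong and weak classifiers as follows:

$$\mathrm{Error}_{Total} = \lambda\,\mathrm{Error}_{Strong} + (1-\lambda)\,\mathrm{Error}_{Weak}$$

where Error_Strong and Error_Weak are the errors between the reported value and the outputs of the strong and weak classifiers, respectively, and λ is set to 0.2.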
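The S2.5 fusion and threshold test can be sketched as follows, assuming each classifier's error is the absolute difference, in percentage points, between its predicted growth percentage and the reported one. The function names are hypothetical; λ defaults to the 0.2 stated above.

```python
from typing import Tuple


def fused_error(err_strong: float, err_weak: float, lam: float = 0.2) -> float:
    """Error_Total = lam * Error_Strong + (1 - lam) * Error_Weak."""
    return lam * err_strong + (1.0 - lam) * err_weak


def judge_output_value(reported_pct: float, strong_pct: float, weak_pct: float,
                       threshold: float, lam: float = 0.2) -> Tuple[bool, float]:
    """Return (judged_genuine, total_error) for one reported growth percentage."""
    err_strong = abs(strong_pct - reported_pct)  # assumed: absolute error
    err_weak = abs(weak_pct - reported_pct)
    total = fused_error(err_strong, err_weak, lam)
    return total <= threshold, total
```

With λ = 0.2 the weak classifier's error carries a weight of 0.8 in the total, so the classifier trained on all data dominates the fused error.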

As a further improvement of the technical solution, in S3, the method for predicting and discriminating pollutant discharge data comprises the following steps:

S3.1. Classify the data by reported pollutant type and reject obviously erroneous data according to rules to complete data preprocessing; the emission ratio used below is the pollutant discharge volume divided by the total industrial output value, expressed as a percentage;

S3.2. For each category, take the emission ratios of the previous three years as input features and the ratio of the fourth year as the output of the neural network, forming complete samples;

S3.3. Use a convolutional neural network to automatically extract the correlations among the pollutants and between the emission ratios of the three years;

S3.4. Calculate either the error between the predicted and actual emission ratios or the error between the predicted and actual pollutant discharge volumes, and judge whether the data are genuine according to an error threshold.
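A sketch of the S3.4 check covering both variants: comparing emission ratios, or converting the predicted ratio back into a discharge volume. Absolute error is an assumption, as are the names `emission_ratio_pct` and `judge_emission`; the threshold is the adjustable value mentioned in the text.

```python
from typing import Tuple


def emission_ratio_pct(emission: float, output_value: float) -> float:
    """Pollutant discharge volume / total industrial output value, in percent."""
    return 100.0 * emission / output_value


def judge_emission(pred_ratio_pct: float, reported_emission: float,
                   output_value: float, threshold: float,
                   compare_ratios: bool = True) -> Tuple[bool, float]:
    """Return (judged_genuine, error) for one reported pollutant emission.

    compare_ratios=True: error between predicted and reported emission ratios.
    compare_ratios=False: the predicted ratio is converted back into a
    discharge volume and compared with the reported volume.
    """
    if compare_ratios:
        error = abs(pred_ratio_pct - emission_ratio_pct(reported_emission, output_value))
    else:
        pred_emission = pred_ratio_pct * output_value / 100.0
        error = abs(pred_emission - reported_emission)
    return error <= threshold, error
```

The threshold has different units in the two modes (percentage points versus discharge volume units), so it would need to be set separately for each.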

Note that reported pollutant emissions are generally affected by a factory's environmental protection facilities and its total industrial output value; the authenticity of pollutant emission data is therefore judged by the neural network only after the authenticity of the total industrial output value data has been established.

Specifically, pollutant emission data are sorted according to the reported factories; unlike the division used for judging total industrial output value, this data is divided by the type of pollutant detected.

As a further improvement of the technical solution, in S3.2 a feature of dimension n × 1 × 3 is constructed, where n is the number of pollutant types and 1 is a single column: each n × 1 feature represents one year, and the features of three years are stacked to form the input of the neural network.

As a further improvement of the technical solution, in S3.3 the convolutional neural network has 3 convolutional layers and 2 fully-connected layers: the first convolutional layer uses a convolution kernel with parameters 3 × 2 × 1 × 2, the second a kernel with parameters 2 × 1 × 2, and the third a kernel with parameters 2 × 1; the first fully-connected layer halves the input feature dimension, the second fully-connected layer outputs the predicted emission ratio (pollutant discharge volume / total industrial output value), and the loss function is MSE.

The convolutional neural network can thus automatically extract the correlations among pollutants and the relationship between the emission ratios of the three years.

Specifically, in the testing stage, for the pollutant emission data submitted by a target factory, the emission ratios of three consecutive years are used as input to predict the share of the current year's total industrial output value represented by its industrial pollutant discharge volume; this prediction is compared with the reported pollutant emission data, and authenticity is judged against an acceptable error threshold.

The error threshold can be set and adjusted.
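The n × 1 × 3 input and the "previous three years predict the fourth" samples of S3.2 can be assembled with nested lists, as in this Python sketch. The per-year dictionary layout, the pollutant names in the test data and the function names are hypothetical; a real pipeline would likely convert the result into a framework tensor.

```python
from typing import Dict, List, Tuple

Year = Dict[str, float]  # pollutant name -> emission ratio (%) for one year


def stack_years(years: List[Year], pollutants: List[str]) -> List[List[List[float]]]:
    """Build an n x 1 x 3 nested list: n pollutants, 1 column, 3 years."""
    if len(years) != 3:
        raise ValueError("expected exactly three years")
    return [[[year[p] for year in years]] for p in pollutants]


def make_samples(series: List[Year],
                 pollutants: List[str]) -> List[Tuple[List[List[List[float]]], List[float]]]:
    """Slide a 3-year window over the series; the following year is the target."""
    samples = []
    for i in range(len(series) - 3):
        x = stack_years(series[i:i + 3], pollutants)  # n x 1 x 3 input
        y = [series[i + 3][p] for p in pollutants]    # fourth-year ratios
        samples.append((x, y))
    return samples
```

A pollutant missing from any year raises `KeyError`, which is one simple way to surface incomplete records before training.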

The invention further aims to provide a system for running the automatic analysis and discrimination method based on environmental protection big data.

The invention further provides a device for running the system of the automatic analysis and discrimination method based on environmental protection big data, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements any step of the automatic analysis and discrimination method based on environmental protection big data.

The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any step of the above automatic analysis and discrimination method based on environmental protection big data.

Compared with the prior art, the invention has the following beneficial effects: the basic flow of the automatic analysis and discrimination method is designed on the basis of a deep learning network and big data processing algorithms, and the authenticity of industrial output value data and of pollutant emission data is discriminated in sequence. A neural network extracts the relationships among total industrial output value data of consecutive years, a model that accounts for special influences is built from the changes in total industrial output value of similar factories, and a data screening network, a weak classifier and a strong classifier are trained separately; on this basis the authenticity of the total industrial output value reported for the current year is judged, enabling prediction and discrimination of total industrial output value data. In addition, a pollutant emission prediction model based on a convolutional neural network is trained, enabling prediction and discrimination of pollutant emission data.

Drawings

FIG. 1 is an overall flow diagram of the present invention;

FIG. 2 is an overall process flow diagram of the present invention;

FIG. 3 is a flow chart of a partial method of the present invention;

FIG. 4 is a partial flow block diagram of the present invention;

FIG. 5 is a flow chart of a partial method of the present invention;

FIG. 6 is a partial flow block diagram of the present invention;

FIG. 7 is a flow chart of a partial method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

Example 1

As shown in FIGS. 1 to 7, this embodiment provides an automatic analysis and discrimination method based on environmental protection big data, comprising the following steps:

S1. Design an automatic analysis and discrimination method, and sort the industrial output value and pollutant emission data;

S2. Predict and discriminate the industrial output value data;

S3. Predict and discriminate the pollutant emission data.

In this embodiment, in S1, the automatic analysis and identification method includes the following steps:

In S1.1, plants are classified by type (for example, steel plants form one class and textile plants another) because plants of the same type share the same external influences: if demand for steel falls, all steel plants reduce their total output value accordingly, without affecting the textile industry.

In S1.2, the neural network is first trained briefly on the inputs and outputs and then used as a data screening network. Because some of the data used as ground-truth outputs may have been falsely reported, training is kept short to avoid overfitting the false data; a batch of genuine data is then screened out according to the error between the network output and the reported value, and a neural network with the same structure as the data screening network is retrained on this data as the strong classifier. Meanwhile, a network is trained on the original, unscreened data (for more iterations than the data screening network) to serve as the weak classifier, which guards against discrimination errors of the strong classifier caused by the training data removed during screening.

In S1.3, the data are fed to both classifiers; the errors of the reported current-year increase/decrease percentage of total industrial output value relative to the outputs of the strong and weak classifiers are calculated, the two errors are fused by weighting, and finally the authenticity of the reported current-year total industrial output value data is judged against an acceptable error threshold.

Specifically, in S1.1 an information entropy algorithm is used to remove obviously erroneous data, with the calculation formula:

$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$

In this embodiment, in S2, the method for predicting and discriminating the industrial output value data comprises the following steps:

In S2.4, the weak classifier is trained so that the features of the data removed during screening, which are absent from the strong classifier's training set, are not lost.

Further, in S2.2 the data screening network consists of 3 fully-connected layers with an input dimension of 3 × 1 and an output dimension of 1 × 1. The increase/decrease percentages of industrial output value of verified genuine data over three consecutive years are the input features, and the increase/decrease percentage of the total industrial output value reported for the current year is the output label. The network is briefly pre-trained on the data of each plant type with MSE as the loss function; the pre-trained network is the data screening network, and data with lower confidence, indicated by a large error between the network output and the reported value, are eliminated.

Specifically, in S2.2 the MSE function is calculated as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where m is the number of samples, y_i is the reported (label) value of the i-th sample and \hat{y}_i is the value predicted by the network.

Specifically, in S2.5 the final difference is calculated by weighting the errors of the strong and weak classifiers as follows:

$$\mathrm{Error}_{Total} = \lambda\,\mathrm{Error}_{Strong} + (1-\lambda)\,\mathrm{Error}_{Weak}$$

where λ is set to 0.2.

In this embodiment, in S3, the method for predicting and discriminating pollutant discharge data comprises the following steps:

Further, in S3.2 a feature of dimension n × 1 × 3 is constructed, where n is the number of pollutant types and 1 is a single column: each n × 1 feature represents one year, and the features of three years are stacked to form the input of the neural network.

Further, in S3.3 the convolutional neural network has 3 convolutional layers and 2 fully-connected layers: the first convolutional layer uses a convolution kernel with parameters 3 × 2 × 1 × 2, the second a kernel with parameters 2 × 1 × 2, and the third a kernel with parameters 2 × 1; the first fully-connected layer halves the input feature dimension, the second fully-connected layer outputs the predicted emission ratio (pollutant discharge volume / total industrial output value), and the loss function is MSE.

The error threshold can be set and adjusted.

This embodiment further provides a system for running the automatic analysis and discrimination method based on environmental protection big data.

This embodiment further provides a device for running the system of the automatic analysis and discrimination method based on environmental protection big data, comprising a processor, a memory and a computer program stored in the memory and executable on the processor.

The processor comprises one or more processing cores and is connected to the memory through a bus; the memory stores program instructions, and the automatic analysis and discrimination method based on environmental protection big data is implemented when the processor executes the program instructions in the memory.

Optionally, the memory may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.

In addition, the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above automatic analysis and discrimination method based on environmental protection big data.

Optionally, the invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the steps of the above automatic analysis and discrimination method based on environmental protection big data.

Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disk.

The foregoing shows and describes the basic principles, main features and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the above embodiments; the embodiments and the description merely illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. An automatic analysis and discrimination method based on environmental protection big data, characterized in that the method comprises the following steps:

S1. Design an automatic analysis and discrimination method, and sort the industrial output value and pollutant emission data;

S2. Predict and discriminate the industrial output value data;

S3. Predict and discriminate the pollutant emission data.

2. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, characterized in that in S1 the automatic analysis and discrimination method comprises the following steps:

3. The automatic analysis and discrimination method based on environmental protection big data according to claim 2, characterized in that in S1.1 an information entropy algorithm is used to remove obviously erroneous data, with the calculation formula:

$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$

4. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, characterized in that in S2 the method for predicting and discriminating the industrial output value data comprises the following steps:

5. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, characterized in that in S2.2 the data screening network consists of 3 fully-connected layers with an input dimension of 3 × 1 and an output dimension of 1 × 1; the increase/decrease percentages of industrial output value of verified genuine data over three consecutive years are the input features, and the increase/decrease percentage of the total industrial output value reported for the current year is the output label; the network is briefly pre-trained on the data of each plant type with MSE as the loss function; the pre-trained network is the data screening network, and data with low confidence, indicated by a large error between the network output and the reported value, are eliminated.

6. The automatic analysis and discrimination method based on environmental protection big data according to claim 5, characterized in that in S2.2 the MSE function is calculated as:

$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$

where m is the number of samples, y_i is the reported (label) value of the i-th sample and \hat{y}_i is the value predicted by the network.

7. The automatic analysis and discrimination method based on environmental protection big data according to claim 4, characterized in that in S2.5 the final difference is calculated by weighting the errors of the strong and weak classifiers as follows:

$$\mathrm{Error}_{Total} = \lambda\,\mathrm{Error}_{Strong} + (1-\lambda)\,\mathrm{Error}_{Weak}$$

where λ is set to 0.2.

8. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, characterized in that in S3 the method for predicting and discriminating pollutant discharge data comprises the following steps:

9. The automatic analysis and discrimination method based on environmental protection big data according to claim 8, characterized in that in S3.2 a feature of dimension n × 1 × 3 is constructed, where n is the number of pollutant types and 1 is a single column; each n × 1 feature represents one year, and the features of three years are stacked to form the input of the neural network.

10. The automatic analysis and discrimination method based on environmental protection big data according to claim 8, characterized in that in S3.3 the convolutional neural network has 3 convolutional layers and 2 fully-connected layers: the first convolutional layer uses a convolution kernel with parameters 3 × 2 × 1 × 2, the second a kernel with parameters 2 × 1 × 2, and the third a kernel with parameters 2 × 1; the first fully-connected layer halves the input feature dimension, the second fully-connected layer outputs the predicted emission ratio (pollutant discharge volume / total industrial output value), and the loss function is MSE.