CN113159448A - Automatic analysis and discrimination method based on environmental protection big data - Google Patents

Info

Publication number
CN113159448A
Authority
CN
China
Prior art keywords
data
industrial
output value
percentage
pollutant
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110516775.4A
Other languages
Chinese (zh)
Inventor
孙元晓
周轶文
刘军胜
司梦晨
王大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai Yinghui Intelligent Technology Co ltd
Original Assignee
Yantai Yinghui Intelligent Technology Co ltd
Application filed by Yantai Yinghui Intelligent Technology Co ltd
Priority to CN202110516775.4A
Publication of CN113159448A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/08 Learning methods
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
          • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
            • G06Q50/10 Services
              • G06Q50/26 Government or public services

Abstract

The invention relates to the technical field of deep learning networks, and in particular to an automatic analysis and discrimination method based on environmental protection big data. The method comprises designing the automatic analysis and discrimination method and organizing the industrial output value and pollutant emission data, predicting and judging the industrial output value data, and predicting and judging the pollutant emission data. The basic flow of the method is built on deep learning networks and big data processing algorithms, and it judges the authenticity of the industrial output value data and of the pollutant emission data in sequence. The correlations among total industrial output value data of consecutive years are extracted by a neural network, a model that accounts for special external influences is established, and a data screening network, a weak classifier and a strong classifier are trained in turn, so that the total industrial output value data can be predicted and judged. In addition, a pollutant emission prediction model based on a convolutional neural network is trained, so that the pollutant emission data can also be predicted and judged.

Description

Automatic analysis and discrimination method based on environmental protection big data
Technical Field
The invention relates to the technical field of deep learning networks, in particular to an automatic analysis and judgment method based on environmental protection big data.
Background
All industrial sectors generate pollutants during production. With continuing technological development and growing public attention to ecological protection, pollutant emissions are subject to increasingly strict requirements, which makes the verification of industrial discharge data particularly important. Traditionally, the total output and pollutant discharge figures submitted by industrial enterprises are checked manually. However, some enterprises conceal or misreport their discharge data to reduce environmental protection costs and increase profits, so environmental authorities usually have to commit substantial manpower and material resources to verifying the authenticity of the discharge data reported by factories.
With the continuing development of artificial intelligence, big data and related technologies, intelligent judgment and prediction based on big data are attracting increasing attention. Within the massive volume of industrial pollutant emission and output value data, the ratio of a factory's pollutant emissions to its output value is an important indicator of whether the factory meets environmental protection standards, so judging the authenticity of industrial pollutant emission and output value data is a very important task.
The authenticity of industrial pollutant emission and output value data is usually judged manually by combining relevant models, experience and on-site investigation, which consumes substantial manpower and material resources. Moreover, when the production environment changes (for example, when overall industrial output value falls because of an epidemic), traditional models are not robust and produce wrong judgments. Machine learning and big data mining can therefore be applied to this problem to pre-screen factory data.
Disclosure of Invention
The invention aims to provide an automatic analysis and judgment method based on environmental protection big data so as to solve the problems in the background technology.
In order to solve the above technical problem, an object of the present invention is to provide an automatic analysis and discrimination method based on environmental big data, including the following steps:
S1, designing the automatic analysis and discrimination method, and organizing the industrial output value and pollutant emission data;
S2, predicting and judging the industrial output value data;
and S3, predicting and judging the pollutant emission data.
As a further improvement of the present technical solution, in S1, the automatic analysis and identification method includes the following steps:
S1.1, removing obviously erroneous data from the original environmental protection big data database, and partitioning the data and performing initial calculations according to data source and factory type;
S1.2, training a weak classifier and a data screening network on the classified data and the initial calculation results, training a strong classifier on the screened data, performing weighted fusion of the errors between the outputs of the weak and strong classifiers and the actually reported data, and outputting a confidence according to a threshold;
S1.3, based on the judgment result of S1.2, removing the data whose industrial output value is judged false, training a neural network prediction model on the remaining real data, predicting the current year's pollutant emission ratio from the related pollutant emission data of three consecutive years, and outputting a confidence according to a threshold;
and S1.4, combining the judgment results of S1.2 and S1.3 and outputting the integrated judgment result.
In S1.1, factories are classified by type, for example steel plants form one category and textile plants another, because factories of the same type share the same external influences: if steel demand falls, all steel plants will reduce their total output value accordingly, while the textile industry is unaffected.
On this basis, the year-on-year percentage increases/decreases of industrial output value over three consecutive years, calculated from verified real data of past years, are used as the input of the neural network; the percentage increase/decrease of the industrial output value reported for the current year is used as the target output (label); and a network is trained separately on the data of each plant type.
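A minimal sketch of this input/label construction in Python follows. It assumes each factory record holds five consecutive annual output values, the first four verified and the last one reported for the current year, so that three year-on-year changes form the 3 × 1 input and the current year's change forms the label; the function names and record layout are hypothetical, not taken from the patent.

```python
from collections import defaultdict

def pct_change(prev: float, curr: float) -> float:
    """Year-on-year increase/decrease of output value, in percent."""
    if prev == 0:
        raise ValueError("previous-year output value must be non-zero")
    return (curr - prev) / prev * 100.0

def build_sample(values: list[float]) -> tuple[list[float], float]:
    """Turn five consecutive annual output values into (features, label).

    Hypothetical layout: values[0..3] are verified historical values and
    values[4] is the value reported for the current year.
    """
    if len(values) != 5:
        raise ValueError("expected five consecutive annual values")
    changes = [pct_change(a, b) for a, b in zip(values, values[1:])]
    return changes[:3], changes[3]  # 3 x 1 input features, 1 x 1 label

def group_by_plant_type(records):
    """Group (plant_type, values) records so each plant type is trained separately."""
    groups = defaultdict(list)
    for plant_type, values in records:
        groups[plant_type].append(build_sample(values))
    return dict(groups)
```

For example, `group_by_plant_type([("steel", [100, 110, 99, 99, 198])])` yields one steel sample with inputs of roughly 10 %, -10 % and 0 % and a label of 100 %, the kind of jump the later screening step is meant to flag.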
In S1.2, the neural network is briefly trained on these inputs and labels and then used as the data screening network. Because some of the data used as labels may have been falsely reported, the network is trained only briefly (to avoid overfitting the false data); a batch of real data is then screened out according to the error between the network output and the reported value, and a neural network with the same structure as the data screening network is retrained on the screened data as the strong classifier. Meanwhile, a network trained on the original data (with more training iterations than the data screening network) serves as the weak classifier, which guards against discrimination errors of the strong classifier caused by the training data removed during screening.
In S1.2, the data are then fed to both the strong classifier and the weak classifier, the error of each classifier's output relative to the percentage increase/decrease of the total industrial output value reported this year is calculated, the two errors are fused by weighting, and finally the authenticity of the total industrial output value reported this year is judged against an acceptable error threshold.
As a further improvement of the technical solution, in S1.1, an information entropy algorithm is used to remove obviously erroneous data, with the following formula:
$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$
where i = 1, 2, 3, …, n; X_i denotes the i-th state (n states in total); P(X_i) is the probability of the i-th state occurring; and H(X) is the amount of information required to remove the uncertainty, measured in bits.
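The entropy itself can be estimated from observed state frequencies as in the Python sketch below. The patent does not say how the entropy value is turned into a keep/remove decision, so any use of this function as a filter (for example, flagging a field whose entropy is abnormally low or high for its plant category) is an assumption.

```python
import math
from collections import Counter

def shannon_entropy(states) -> float:
    """H(X) = -sum P(x_i) * log2 P(x_i), estimated from state frequencies, in bits.

    How H(X) is thresholded to reject erroneous data is not specified in
    the source; this function only computes the entropy.
    """
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Two equally likely states give 1 bit and four equally likely states give 2 bits, matching the definition.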
As a further improvement of the present technical solution, in S2, the method for predicting and judging the industrial output value data includes the following steps:
S2.1, classifying the data according to the reported plant types, calculating the percentage increases/decreases of industrial output value over three consecutive years, and rejecting obviously erroneous data according to rules to complete data preprocessing;
S2.2, using the data screening network to mine the recent development trend of factories from the big data, and rejecting data that deviate from the overall development trend;
S2.3, after the lower-confidence data are eliminated in S2.2, training a strong classifier on the remaining data set, the strong classifier having the same network structure and loss function as the data screening network;
S2.4, training a weak classifier on the data classified in S2.1, the weak classifier having the same network structure, loss function, and data inputs and outputs as the data screening network of S2.2, the only difference being that it is trained for more iterations than the data screening network;
and S2.5, on the basis of S2.3 and S2.4, weighting the differences between each classifier's predicted value and the reported value to obtain the final difference, and judging the authenticity of the total industrial output value data according to an error threshold.
In S2.4, the weak classifier is trained so that features of the data discarded during screening, which are missing from the strong classifier's training set, are not lost.
As a further improvement of the technical solution, in S2.2, the data screening network consists of 3 fully connected layers with an input dimension of 3 × 1 and an output dimension of 1 × 1. The percentage increases/decreases of industrial output value over three consecutive years of verified real data are used as the input features of the network, and the percentage increase/decrease of the total industrial output value reported this year is used as the output label. The network is briefly pre-trained separately on the data of each plant type with MSE as the loss function; the pre-trained network is the data screening network, and data with a large error between the network output and the reported value (i.e., low-confidence data) are eliminated.
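The elimination step can be sketched in Python as follows, assuming the pre-trained screening network is available as any callable that maps a 3-element feature list to a predicted percentage change. The `predict` argument, the `Sample` alias and the fixed absolute-error threshold are illustrative assumptions; the patent does not give a threshold value.

```python
from typing import Callable, Sequence

Sample = tuple[Sequence[float], float]  # (3 x 1 features, reported label)

def screen_samples(samples: list[Sample],
                   predict: Callable[[Sequence[float]], float],
                   max_abs_error: float) -> tuple[list[Sample], list[Sample]]:
    """Split samples into (kept, rejected) by |prediction - reported label|.

    Kept samples would form the training set of the strong classifier;
    rejected ones are treated as low-confidence (possibly falsely reported).
    """
    kept, rejected = [], []
    for features, label in samples:
        error = abs(predict(features) - label)
        (kept if error <= max_abs_error else rejected).append((features, label))
    return kept, rejected
```

With a stand-in model such as `lambda xs: sum(xs) / len(xs)` and a threshold of 3.0, a sample with inputs `[5.0, 5.0, 5.0]` and a reported change of 6.0 is kept, while the same inputs with a reported change of 40.0 are rejected.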
As a further improvement of the present technical solution, in S2.2, a computational expression of the MSE function is as follows:
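$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the reported (label) value, $\hat{y}_i$ is the network output, and n is the number of samples.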
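S2.2 specifies MSE as the loss of the data screening network. For reference, the standard mean squared error can be computed as in this short Python sketch; the argument names are illustrative.

```python
def mse(y_true, y_pred) -> float:
    """Mean squared error: (1/n) * sum((y_i - y_hat_i) ** 2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```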
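As a further improvement of the present technical solution, in S2.5, the final difference is calculated by weighting the differences of the two classifiers as follows:
$$\mathrm{Error}_{\mathrm{Total}} = \lambda\,\mathrm{Error}_{\mathrm{Strong}} + (1-\lambda)\,\mathrm{Error}_{\mathrm{Weak}}$$
where Error_Strong and Error_Weak are the errors of the strong and weak classifiers relative to the reported value, and λ is set to 0.2.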
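The S2.5 decision can be sketched in Python as below. It assumes that ErrorStrong and ErrorWeak are absolute differences between each classifier's predicted percentage change and the reported one, and that the error threshold is supplied by the user; the patent fixes only λ = 0.2, and the function names are hypothetical.

```python
def fused_error(pred_strong: float, pred_weak: float, reported: float,
                lam: float = 0.2) -> float:
    """ErrorTotal = lam * ErrorStrong + (1 - lam) * ErrorWeak (absolute errors assumed)."""
    error_strong = abs(pred_strong - reported)
    error_weak = abs(pred_weak - reported)
    return lam * error_strong + (1.0 - lam) * error_weak

def output_value_is_plausible(pred_strong: float, pred_weak: float, reported: float,
                              threshold: float, lam: float = 0.2) -> bool:
    """Judge the reported output-value change authentic if the fused error is within threshold."""
    return fused_error(pred_strong, pred_weak, reported, lam) <= threshold
```

With λ = 0.2, a strong-classifier error of 0 and a weak-classifier error of 4 fuse to 3.2, so the report would pass a threshold of 3.5 but fail one of 3.0.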
As a further improvement of the present invention, in S3, the method for predicting and determining pollutant discharge data includes the following steps:
S3.1, classifying the data according to the reported pollutant types and eliminating obviously erroneous data according to rules to complete data preprocessing, where the pollutant discharge percentage is defined as pollutant discharge volume / total industrial output value;
S3.2, for each category, taking the pollutant discharge percentages of the previous three years as input features and the value for the fourth year as the output of the neural network, so as to construct complete features;
S3.3, automatically extracting, through a convolutional neural network, the correlations among pollutants and among the three years' pollutant discharge percentages;
and S3.4, calculating either the error between the predicted and the actual pollutant discharge percentage, or the error between the predicted and the actual pollutant discharge volume, and judging whether the data are real according to an error threshold.
It should be noted that reported pollutant emissions are generally affected by a factory's environmental protection facilities and by its total industrial output value; therefore, the authenticity of pollutant emission data is judged by the neural network only after the authenticity of the total industrial output value data has been determined.
Specifically, pollutant emission data are first sorted according to the types of the reporting factories; unlike the data used for judging total industrial output value, however, this data is divided according to the types of pollutants detected.
As a further improvement of this solution, in S3.2, the feature has dimension n × 1 × 3, where n is the number of pollutant types and 1 is a single column; each n × 1 slice is the feature of one year, and the features of the three years are stacked to form the input of the neural network.
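One way to build the n × 1 × 3 input in plain Python is sketched below. It assumes each year's feature is a list of n pollutant discharge percentages in a fixed pollutant order; the nested-list layout indexed as [pollutant][column][year] and the function name are illustrative choices, not specified by the patent.

```python
def stack_years(year_features: list[list[float]]) -> list[list[list[float]]]:
    """Stack three n x 1 yearly feature vectors into an n x 1 x 3 nested list.

    year_features[k][i] is the discharge percentage of pollutant i in
    year k (k = 0, 1, 2, oldest first). The result is indexed [i][0][k].
    """
    if len(year_features) != 3:
        raise ValueError("expected features for exactly three years")
    n = len(year_features[0])
    if any(len(f) != n for f in year_features):
        raise ValueError("every year must list the same n pollutants")
    return [[[year_features[k][i] for k in range(3)]] for i in range(n)]
```

For two pollutants, `stack_years([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` gives `[[[1.0, 3.0, 5.0]], [[2.0, 4.0, 6.0]]]`, i.e. each pollutant carries its three-year history along the last axis, which is what lets the convolutions relate both pollutants and years.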
As a further improvement of this technical solution, in S3.3, the convolutional neural network has 3 convolutional layers and 2 fully connected layers: the first convolutional layer uses a convolution kernel with parameters 3 × 2 × 1 × 2, the second uses a kernel with parameters 2 × 1 × 2, and the third uses a kernel with parameters 2 × 1; the first fully connected layer reduces the input feature dimension to half, the second fully connected layer outputs the predicted pollutant discharge percentage (pollutant discharge volume / total industrial output value), and the loss function is MSE.
The convolutional neural network can thus automatically extract both the correlations among pollutants and the relationship among the three years' pollutant discharge percentages.
Specifically, in the testing stage, for the pollutant emission data submitted by a target factory, the pollutant discharge percentages of three consecutive years are used as input to predict the share of the current year's total industrial output value represented by its pollutant discharge volume; this prediction is compared with the reported pollutant emission data, and authenticity is judged against an acceptable error threshold.
The error threshold can be set and adjusted.
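The test-stage comparison can be sketched in Python as follows. The sketch assumes the trained CNN's prediction is already available as a percentage, that the reported discharge volume and total output value are in units that make their ratio meaningful, and that `threshold` is the adjustable error threshold; all names are hypothetical.

```python
def discharge_percentage(discharge: float, total_output_value: float) -> float:
    """Pollutant discharge volume as a percentage of total industrial output value."""
    if total_output_value <= 0:
        raise ValueError("total industrial output value must be positive")
    return discharge / total_output_value * 100.0

def emission_is_plausible(predicted_pct: float, reported_discharge: float,
                          total_output_value: float, threshold: float) -> bool:
    """Accept the report if |predicted % - reported %| is within the threshold."""
    reported_pct = discharge_percentage(reported_discharge, total_output_value)
    return abs(predicted_pct - reported_pct) <= threshold
```

Comparing percentages rather than raw volumes corresponds to the first option in S3.4; the second option (comparing volumes) would instead multiply the predicted percentage by the verified total output value.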
Another object of the invention is to provide an operating system for the automatic analysis and discrimination method based on environmental protection big data.
The invention further provides a device for running this operating system, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements any step of the automatic analysis and discrimination method based on environmental protection big data when executing the computer program.
The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any step of the above automatic analysis and discrimination method based on environmental protection big data.
Compared with the prior art, the beneficial effects of the invention are as follows. The basic flow of the automatic analysis and discrimination method is designed on the basis of deep learning networks and big data processing algorithms, and the authenticity of the industrial output value data and of the pollutant emission data is judged in sequence. The correlations among total industrial output value data of consecutive years are extracted by a neural network, a model that accounts for special external influences is established from the changes in total industrial output value of similar factories, and a data screening network, a weak classifier and a strong classifier are trained in turn; on this basis the authenticity of the total industrial output value reported for the current year is judged, so that the total industrial output value data can be predicted and judged. In addition, a pollutant emission prediction model based on a convolutional neural network is trained, so that pollutant emission data can also be predicted and judged.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is an overall process flow diagram of the present invention;
FIG. 3 is a flow chart of a partial method of the present invention;
FIG. 4 is a partial flow block diagram of the present invention;
FIG. 5 is a flow chart of a partial method of the present invention;
FIG. 6 is a partial flow block diagram of the present invention;
FIG. 7 is a flow chart of a partial method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example 1
As shown in fig. 1 to 7, the present embodiment aims to provide an automatic analysis and discrimination method based on environmental big data, which includes the following steps:
S1, designing the automatic analysis and discrimination method, and organizing the industrial output value and pollutant emission data;
S2, predicting and judging the industrial output value data;
and S3, predicting and judging the pollutant emission data.
In this embodiment, in S1, the automatic analysis and identification method includes the following steps:
S1.1, removing obviously erroneous data from the original environmental protection big data database, and partitioning the data and performing initial calculations according to data source and factory type;
S1.2, training a weak classifier and a data screening network on the classified data and the initial calculation results, training a strong classifier on the screened data, performing weighted fusion of the errors between the outputs of the weak and strong classifiers and the actually reported data, and outputting a confidence according to a threshold;
S1.3, based on the judgment result of S1.2, removing the data whose industrial output value is judged false, training a neural network prediction model on the remaining real data, predicting the current year's pollutant emission ratio from the related pollutant emission data of three consecutive years, and outputting a confidence according to a threshold;
and S1.4, combining the judgment results of S1.2 and S1.3 and outputting the integrated judgment result.
In S1.1, factories are classified by type, for example steel plants form one category and textile plants another, because factories of the same type share the same external influences: if steel demand falls, all steel plants reduce their total output value accordingly, while the textile industry is unaffected.
On this basis, the year-on-year percentage increases/decreases of industrial output value over three consecutive years, calculated from verified real data of past years, are used as the input of the neural network; the percentage increase/decrease of the industrial output value reported for the current year is used as the target output (label); and a network is trained separately on the data of each plant type.
In S1.2, the neural network is briefly trained on these inputs and labels and then used as the data screening network. Because some of the data used as labels may have been falsely reported, the network is trained only briefly (to avoid overfitting the false data); a batch of real data is then screened out according to the error between the network output and the reported value, and a neural network with the same structure as the data screening network is retrained on the screened data as the strong classifier. Meanwhile, a network trained on the original data (with more training iterations than the data screening network) serves as the weak classifier, which guards against discrimination errors of the strong classifier caused by the training data removed during screening.
In S1.2, the data are then fed to both the strong classifier and the weak classifier, the error of each classifier's output relative to the percentage increase/decrease of the total industrial output value reported this year is calculated, the two errors are fused by weighting, and finally the authenticity of the total industrial output value reported this year is judged against an acceptable error threshold.
Specifically, in S1.1, an information entropy algorithm is used to remove obviously erroneous data, with the following formula:
$$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$$
where i = 1, 2, 3, …, n; X_i denotes the i-th state (n states in total); P(X_i) is the probability of the i-th state occurring; and H(X) is the amount of information required to remove the uncertainty, measured in bits.
In this embodiment, in S2, the method for predicting and determining the industrial output value data includes the following steps:
S2.1, classifying the data according to the reported plant types, calculating the percentage increases/decreases of industrial output value over three consecutive years, and rejecting obviously erroneous data according to rules to complete data preprocessing;
S2.2, using the data screening network to mine the recent development trend of factories from the big data, and rejecting data that deviate from the overall development trend;
S2.3, after the lower-confidence data are eliminated in S2.2, training a strong classifier on the remaining data set, the strong classifier having the same network structure and loss function as the data screening network;
S2.4, training a weak classifier on the data classified in S2.1, the weak classifier having the same network structure, loss function, and data inputs and outputs as the data screening network of S2.2, the only difference being that it is trained for more iterations than the data screening network;
and S2.5, on the basis of S2.3 and S2.4, weighting the differences between each classifier's predicted value and the reported value to obtain the final difference, and judging the authenticity of the total industrial output value data according to an error threshold.
In S2.4, the weak classifier is trained so that features of the data discarded during screening, which are missing from the strong classifier's training set, are not lost.
Further, in S2.2, the data screening network consists of 3 fully connected layers with an input dimension of 3 × 1 and an output dimension of 1 × 1. The percentage increases/decreases of industrial output value over three consecutive years of verified real data are used as the input features of the network, and the percentage increase/decrease of the total industrial output value reported this year is used as the output label. The network is briefly pre-trained separately on the data of each plant type with MSE as the loss function; the pre-trained network is the data screening network, and data with a large error between the network output and the reported value (i.e., low-confidence data) are eliminated.
Specifically, in S2.2, the computational expression of the MSE function is as follows:
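$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the reported (label) value, $\hat{y}_i$ is the network output, and n is the number of samples.
Specifically, in S2.5, the final difference is calculated by weighting the differences of the two classifiers as follows:
$$\mathrm{Error}_{\mathrm{Total}} = \lambda\,\mathrm{Error}_{\mathrm{Strong}} + (1-\lambda)\,\mathrm{Error}_{\mathrm{Weak}}$$
where Error_Strong and Error_Weak are the errors of the strong and weak classifiers relative to the reported value, and λ is set to 0.2.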
In this embodiment, in S3, the method for predicting and determining pollutant discharge data includes the following steps:
s3.1, classifying the data types according to the reported pollutant types, and eliminating obviously wrong data according to rules to finish the data preprocessing, wherein the percentage of pollutant discharge volume/total industrial output value is the percentage of pollutant discharge volume;
s3.2, according to different categories, taking the percentage of pollutant discharge volume/industrial total output value in the previous three years as an input feature, taking the result in the fourth year as the output of a neural network, and constructing a complete feature;
s3.3, automatically extracting the correlation between the percentage of pollutant discharge volume/total industrial output value and the pollutants for three years through a convolutional neural network;
and S3.4, judging whether the data is real or not according to an error threshold value by calculating the error between the percentage of the predicted pollutant discharge volume/industrial total output value and the percentage of the real pollutant discharge volume/industrial total output value or calculating the error between the predicted pollutant discharge volume and the real pollutant discharge volume.
It should be noted that, in the determination of the authenticity of the reported pollutant emission, the pollutant emission is generally affected by the environmental protection facilities of the factory and the total industrial value, so that the determination of the authenticity of the data of the pollutant emission through the neural network is performed on the premise of determining the authenticity of the data of the total industrial value.
Specifically, pollutant emission data are simply sorted according to the types of reported factories, and data division of the part is different from division of industrial total output value judging data, but is divided according to the types of detected pollutants.
Further, in S3.2, a feature of dimension n × 1 × 3 is constructed, where n is the number of pollutant species and 1 denotes a single column; each n × 1 slice is the feature of one year, and the features of the three years are stacked together as the input to the neural network.
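One way to read the n × 1 × 3 layout is sketched below, assuming that the three yearly n × 1 features are stacked along a third axis rather than summed element-wise (summing would collapse the input back to n × 1). The nested-list representation, the helper names, and the pollutant labels `COD` and `SO2` are illustrative assumptions.

```python
from typing import Dict, List

Tensor3 = List[List[List[float]]]  # indexed as [pollutant][column][year]


def build_input(pct: Dict[str, List[float]], pollutants: List[str]) -> Tensor3:
    """Stack three yearly n x 1 features into one n x 1 x 3 input.

    Assumes pct[p] holds the discharge percentages of pollutant p for three
    consecutive years, oldest first."""
    x: Tensor3 = []
    for p in pollutants:
        years = pct[p]
        if len(years) != 3:
            raise ValueError(f"{p}: expected 3 yearly values, got {len(years)}")
        x.append([list(years)])  # one column whose depth holds the 3 years
    return x


def year_slice(x: Tensor3, k: int) -> List[List[float]]:
    """Recover the n x 1 feature of year k (0, 1 or 2) from the stacked input."""
    return [[row[0][k]] for row in x]


# Hypothetical percentages for two pollutants over three consecutive years.
x = build_input({"COD": [3.0, 3.2, 3.1], "SO2": [1.5, 1.4, 1.6]}, ["COD", "SO2"])
```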
Further, in S3.3, the convolutional neural network has 3 convolutional layers and 2 fully connected layers. The first convolutional layer uses a kernel with parameters 3 × 2 × 1 × 2, the second a kernel with parameters 2 × 1 × 2, and the third a kernel with parameters 2 × 1; the first fully connected layer reduces the input feature dimension to half, the second fully connected layer outputs the predicted pollutant discharge volume as a percentage of the total industrial output value, and the loss function is MSE.
The convolutional neural network can thus automatically extract both the correlations among the pollutants and the relationship between their discharge percentages across the three years.
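The text does not say how the kernel parameters are ordered or whether padding is used, so the sketch below only traces tensor shapes under one assumed reading: the n × 1 × 3 input is treated as a single-channel n × 3 map (pollutants × years), "3 × 2 × 1 × 2" is read TensorFlow-style as kernel height 3, width 2, 1 input channel and 2 output channels, the second and third layers are assumed to emit one channel each, and every convolution uses stride 1 without padding. Weights and activations are not modelled, and `trace_shapes` is a hypothetical helper.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]

# Assumed reading of the kernel parameters: (name, height, width, out_channels);
# input channels are implied by the previous layer; stride 1, no padding.
CONV_LAYERS = [("conv1", 3, 2, 2), ("conv2", 2, 1, 1), ("conv3", 2, 1, 1)]


def conv_valid(shape: Shape, kh: int, kw: int, out_ch: int) -> Shape:
    """Output (height, width, channels) of a stride-1, unpadded convolution."""
    h, w, _ = shape
    if kh > h or kw > w:
        raise ValueError(f"kernel {kh}x{kw} does not fit a {h}x{w} input")
    return (h - kh + 1, w - kw + 1, out_ch)


def trace_shapes(n_pollutants: int) -> List[Tuple[str, Shape]]:
    """Follow one sample through the 3 convolutional and 2 fully connected layers."""
    shape: Shape = (n_pollutants, 3, 1)  # pollutants x years x 1 channel
    trace = [("input", shape)]
    for name, kh, kw, out_ch in CONV_LAYERS:
        shape = conv_valid(shape, kh, kw, out_ch)
        trace.append((name, shape))
    flat = shape[0] * shape[1] * shape[2]
    trace.append(("flatten", (flat,)))
    trace.append(("fc1", (flat // 2,)))  # first FC layer halves the dimension
    trace.append(("fc2", (1,)))  # predicted discharge percentage
    return trace


for layer, dims in trace_shapes(6):
    print(layer, dims)
```

Under this reading, the unpadded stack needs at least five pollutant species; `trace_shapes(4)` raises `ValueError` at the third convolution. A padded convolution or a different reading of the kernel parameters would change that bound.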
Specifically, in the testing stage, for the pollutant emission data submitted by a target factory, the pollutant discharge percentages of three consecutive years are used as input to predict the percentage of the total industrial output value accounted for by the current year's industrial pollutant discharge volume; the prediction is compared with the reported pollutant emission data, and authenticity is judged against an acceptable error threshold.
The error threshold can be set and adjusted as needed.
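The test-stage decision can be sketched as follows. The patent only requires a configurable error threshold; measuring the error as the relative deviation of the reported value from the prediction, and the function names used here, are assumptions. `predicted_volume` covers the second option in S3.4, converting a predicted percentage back into a discharge volume using the (already verified) total industrial output value.

```python
def predicted_volume(pct: float, output_value: float) -> float:
    """Convert a predicted discharge percentage into a discharge volume (second option in S3.4)."""
    return pct * output_value / 100.0


def is_authentic(predicted: float, reported: float, threshold: float) -> bool:
    """Accept the report if its relative error against the prediction is within the threshold.

    Relative error is an assumption; the text only requires a configurable threshold."""
    if predicted == 0:
        return reported == 0
    return abs(reported - predicted) / abs(predicted) <= threshold


# Percentage comparison, then volume comparison, with a hypothetical 10 % threshold.
ok_pct = is_authentic(4.0, 4.2, threshold=0.1)
ok_volume = is_authentic(predicted_volume(3.5, 400.0), 15.0, threshold=0.1)
```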
This embodiment also provides a system for running the automatic analysis and discrimination method based on environmental protection big data.
This embodiment also provides a device for running the system of the automatic analysis and discrimination method based on environmental protection big data, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor.
The processor comprises one or more processing cores and is connected to the memory through a bus; the memory stores program instructions, and the processor implements the automatic analysis and discrimination method based on environmental protection big data when executing the program instructions in the memory.
Optionally, the memory may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
In addition, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the automatic analysis and discrimination method based on environmental protection big data.
Optionally, the invention further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the above automatic analysis and discrimination method based on environmental protection big data.
Those skilled in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disk.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. Those skilled in the art will understand that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate preferred embodiments of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. An automatic analysis and discrimination method based on environmental protection big data, characterized by comprising the following steps:
S1, designing an automatic analysis and identification method, and organizing industrial output value and pollutant emission data;
S2, predicting and judging industrial output value data;
S3, predicting and judging pollutant emission data.
2. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, wherein in S1 the automatic analysis and identification method comprises the following steps:
S1.1, removing obviously erroneous data from the original environmental protection big data database, and partitioning the data and performing initial calculations according to data source and factory type;
S1.2, training a weak classifier and a data screening network on the partitioned data and the initial calculation results, training a strong classifier on the screened data, performing weighted fusion of the errors between the weak and strong classifiers' results and the actually reported data, and outputting a confidence according to a threshold;
S1.3, based on the judgment result of S1.2, removing data whose industrial output value is judged false, training a neural network prediction model on the remaining real data, predicting the current year's pollutant emission ratio from the related pollutant emission data of three consecutive years, and outputting a confidence according to a threshold;
S1.4, combining the judgment results of S1.2 and S1.3 and outputting an integrated judgment result.
3. The automatic analysis and discrimination method based on environmental protection big data according to claim 2, wherein in S1.1 obviously erroneous data are removed using an information entropy algorithm, calculated as follows:
$H(X) = -\sum_{i=1}^{n} P(X_i)\log_2 P(X_i)$;
where $i = 1, 2, 3, \ldots, n$; $X_i$ denotes the $i$-th state ($n$ states in total); $P(X_i)$ is the probability that the $i$-th state occurs; and $H(X)$ is the amount of information, in bits, required to remove the uncertainty.
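A minimal computation of H(X) is shown below, assuming each P(X_i) is estimated from observed frequencies. The claim does not specify what the states X_i are or how the entropy value is turned into a rejection rule, so the sketch stops at computing H(X) in bits; `entropy_bits` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_bits(states: Iterable[Hashable]) -> float:
    """H(X) = -sum_i P(X_i) * log2 P(X_i), with P(X_i) taken as observed frequencies."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


two_equal_states = entropy_bits(["low", "high"])  # 1.0 bit
four_equal_states = entropy_bits(["a", "b", "c", "d"])  # 2.0 bits
```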
4. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, wherein in S2 the prediction and judgment of industrial output value data comprises the following steps:
S2.1, classifying the data according to the reported plant types, calculating the year-over-year percentage increase/decrease of the industrial output value for three consecutive years, and removing obviously erroneous data according to rules to complete preprocessing;
S2.2, using a data screening network to mine the plants' development trend in recent years from the big data, and removing data that deviate from the overall development trend;
S2.3, after the low-confidence data are removed in S2.2, training a strong classifier on the remaining data set, the network structure and loss function of the strong classifier being the same as those of the data screening network;
S2.4, training a weak classifier on the data classified in S2.1, the network structure, loss function, and data inputs and outputs of the weak classifier being the same as those of the data screening network, except that it is trained for more iterations than the data screening network;
S2.5, on the basis of S2.3 and S2.4, weighting the differences between each classifier's prediction and the real value to calculate a final difference, and judging the authenticity of the total industrial output value data according to an error threshold.
5. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, wherein in S2.2 the data screening network consists of 3 fully connected layers with an input dimension of 3 × 1 and an output of 1 × 1; the year-over-year percentage increase/decrease of the industrial output value of verified real data over three consecutive years is used as the input feature of the neural network, and the percentage increase/decrease of the total industrial output value reported for the current year is used as the output label; simple pre-training is performed separately on data of different plant types with MSE as the loss function; the pre-trained network is the data screening network, and data with low confidence, as determined by the error between the network output and the real value, are removed.
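The data preparation and screening step of claim 5 can be sketched as below. The three fully connected layers and their pre-training are not reproduced: `mean` stands in for the pre-trained data screening network purely for illustration, and `max_error` is a hypothetical name for the screening threshold. Five consecutive annual output values yield the 3 × 1 input (three verified year-over-year changes) and the 1 × 1 label (the change reported for the current year).

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (three verified yearly changes, reported change)


def pct_changes(values: Sequence[float]) -> List[float]:
    """Year-over-year percentage increase/decrease of industrial output value."""
    return [(b - a) * 100.0 / a for a, b in zip(values, values[1:])]


def screening_sample(values: Sequence[float]) -> Sample:
    """Five consecutive annual output values -> 3 x 1 input and 1 x 1 label."""
    if len(values) != 5:
        raise ValueError("expected five consecutive annual output values")
    changes = pct_changes(values)
    return changes[:3], changes[3]


def screen(samples: List[Sample], predict: Callable[[List[float]], float],
           max_error: float) -> List[Sample]:
    """Keep samples whose label lies within max_error of the network's prediction."""
    return [s for s in samples if abs(predict(s[0]) - s[1]) <= max_error]


# Hypothetical output values; the second plant reports an implausible final jump.
samples = [screening_sample([100, 120, 150, 180, 225]),
           screening_sample([100, 120, 150, 180, 400])]
kept = screen(samples, predict=mean, max_error=5.0)  # placeholder predictor
```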
6. The automatic analysis and discrimination method based on environmental protection big data according to claim 5, wherein in S2.2 the MSE function is calculated as follows:
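$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$, where $m$ is the number of samples, $y_i$ is the real value, and $\hat{y}_i$ is the predicted value.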
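Assuming the standard definition of mean squared error (the average of squared differences between real and predicted values), the loss can be computed as follows; `mse` is an illustrative name.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between real values and network predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # (0 + 0 + 4) / 3
```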
7. The automatic analysis and discrimination method based on environmental protection big data according to claim 4, wherein in S2.5 the final difference is calculated by weighting the differences from the real value, as follows:
$\mathrm{Error}_{\mathrm{Total}} = \lambda\,\mathrm{Error}_{\mathrm{Strong}} + (1-\lambda)\,\mathrm{Error}_{\mathrm{Weak}}$
where λ is set to 0.2.
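The weighting of claim 7 can be sketched as below, assuming each classifier's error is the absolute difference between its prediction and the reported value and that the result is compared against a user-chosen threshold; the function names and the threshold values in the usage are hypothetical. Note that with λ = 0.2, as written, the weak classifier's error carries weight 0.8.

```python
def total_error(strong_pred: float, weak_pred: float, reported: float,
                lam: float = 0.2) -> float:
    """Error_Total = lam * Error_Strong + (1 - lam) * Error_Weak."""
    error_strong = abs(strong_pred - reported)
    error_weak = abs(weak_pred - reported)
    return lam * error_strong + (1 - lam) * error_weak


def output_value_is_real(strong_pred: float, weak_pred: float, reported: float,
                         threshold: float, lam: float = 0.2) -> bool:
    """Judge the reported total industrial output value against an error threshold."""
    return total_error(strong_pred, weak_pred, reported, lam) <= threshold


err = total_error(10.0, 11.0, 12.0)  # 0.2 * 2.0 + 0.8 * 1.0, about 1.2
```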
8. The automatic analysis and discrimination method based on environmental protection big data according to claim 1, wherein in S3 the prediction and judgment of pollutant discharge data comprises the following steps:
S3.1, classifying the data according to the reported pollutant types and removing obviously erroneous data according to rules to complete preprocessing, where the "pollutant discharge percentage" denotes the pollutant discharge volume as a percentage of the total industrial output value;
S3.2, for each category, taking the pollutant discharge percentages of the previous three years as the input feature and the percentage of the fourth year as the output of the neural network, thereby constructing a complete training sample;
S3.3, automatically extracting, through a convolutional neural network, the correlations among the pollutants and the relationship between their discharge percentages across the three years;
S3.4, judging whether the data are real according to an error threshold, by calculating either the error between the predicted and the actual pollutant discharge percentages or the error between the predicted and the actual pollutant discharge volumes.
9. The automatic analysis and discrimination method based on environmental protection big data according to claim 8, wherein in S3.2 a feature of dimension n × 1 × 3 is constructed, where n is the number of pollutant species and 1 denotes a single column; each n × 1 slice is the feature of one year, and the features of the three years are stacked together as the input to the neural network.
10. The automatic analysis and discrimination method based on environmental protection big data according to claim 8, wherein in S3.3 the convolutional neural network has 3 convolutional layers and 2 fully connected layers; the first convolutional layer uses a kernel with parameters 3 × 2 × 1 × 2, the second a kernel with parameters 2 × 1 × 2, and the third a kernel with parameters 2 × 1; the first fully connected layer reduces the input feature dimension to half, the second fully connected layer outputs the predicted pollutant discharge volume as a percentage of the total industrial output value, and the loss function is MSE.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110516775.4A CN113159448A (en) 2021-05-12 2021-05-12 Automatic analysis and discrimination method based on environmental protection big data

Publications (1)

Publication Number Publication Date
CN113159448A (en) 2021-07-23

Family

ID=76874678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110516775.4A Pending CN113159448A (en) 2021-05-12 2021-05-12 Automatic analysis and discrimination method based on environmental protection big data

Country Status (1)

Country Link
CN (1) CN113159448A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117555892A (en) * 2024-01-10 2024-02-13 Jiangsu Provincial Ecological Environment Big Data Co., Ltd. Atmospheric pollutant multimode fusion accounting model post-treatment method

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101009003A (en) * 2007-01-09 2007-08-01 Wuhan University of Technology Water quality index prediction method used for municipal wastewater design
CN107370826A (en) * 2017-08-21 2017-11-21 Beijing Shengshi Bochuang Information Technology Co., Ltd. Method, apparatus and system for constructing environmental protection big data by multi-source fusion
CN109598712A (en) * 2018-11-30 2019-04-09 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, device, server and storage medium for determining the quality of plastic foam lunch boxes
US20190251441A1 (en) * 2018-02-13 2019-08-15 Adobe Systems Incorporated Reducing architectural complexity of convolutional neural networks via channel pruning
CN112633900A (en) * 2020-12-16 2021-04-09 Beijing Guodiantong Network Technology Co., Ltd. Industrial Internet of things data verification method based on machine learning

Non-Patent Citations (1)

Title
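Li Bingwen, Du Mei: "Research on the application of the Gompertz model in predicting industrial wastewater discharge", Water Resources & Hydropower of Northeast China *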

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117555892A (en) * 2024-01-10 2024-02-13 Jiangsu Provincial Ecological Environment Big Data Co., Ltd. Atmospheric pollutant multimode fusion accounting model post-treatment method
CN117555892B (en) * 2024-01-10 2024-04-02 Jiangsu Provincial Ecological Environment Big Data Co., Ltd. Atmospheric pollutant multimode fusion accounting model post-treatment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210723
