CN111783904A - Data anomaly analysis method, device, equipment and medium based on environmental data - Google Patents

Data anomaly analysis method, device, equipment and medium based on environmental data

Info

Publication number
CN111783904A
CN111783904A (application CN202010919414.XA)
Authority
CN
China
Prior art keywords
abnormal
environmental data
decision tree
data
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010919414.XA
Other languages
Chinese (zh)
Other versions
CN111783904B (en)
Inventor
张兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202010919414.XA priority Critical patent/CN111783904B/en
Publication of CN111783904A publication Critical patent/CN111783904A/en
Application granted granted Critical
Publication of CN111783904B publication Critical patent/CN111783904B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/24323 Tree-organised classifiers (under G06F18/00 Pattern recognition; G06F18/24 Classification techniques)
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N20/00 Machine learning
    • G06Q50/26 Government or public services (under G06Q50/00 ICT specially adapted for business processes of specific sectors)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Tourism & Hospitality (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application relates to the field of machine learning in artificial intelligence, and discloses a data anomaly analysis method, device, equipment and medium based on environmental data. The method comprises the following steps: acquiring environmental data to be analyzed and monitoring items; performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set; inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is obtained by decision-tree-based training; and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result indicates the abnormal category of the environmental data to be analyzed. The anomalies in the environmental data to be analyzed are thereby analyzed in a reasoned manner.

Description

Data anomaly analysis method, device, equipment and medium based on environmental data
Technical Field
The present application relates to the field of machine learning in artificial intelligence, and in particular, to a method, an apparatus, a device, and a medium for analyzing data anomalies based on environmental data.
Background
Pollution from enterprises has long been a difficult problem for every city. Governments invest large amounts of manpower and material resources every year to supervise enterprise pollution sources: various environment monitoring devices are installed at enterprises to monitor the waste water and waste gas generated during production, and the monitoring data are uploaded to a supervision platform in real time. A large number of anomalies arise while the monitoring data are collected and uploaded, for example unstable transmission, data loss, data unchanged for a long time, data exceeding the measurement range, abnormal data identification, and the like.
Disclosure of Invention
The application mainly aims to provide a data anomaly analysis method, device, equipment and medium based on environmental data, so as to solve the technical problem that existing supervision platforms cannot reasonably analyze anomalies in the received monitoring data.
In order to achieve the above object, the present application provides a data anomaly analysis method based on environmental data, the method including:
acquiring environmental data to be analyzed and monitoring items;
performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set;
inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is a model obtained based on decision tree training;
and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result is used for expressing the abnormal category of the environmental data to be analyzed.
Further, the step of acquiring the environmental data to be analyzed includes:
acquiring a data message monitored by environment monitoring equipment;
and analyzing according to the data message to obtain the environmental data to be analyzed.
Further, before the step of inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is obtained by decision-tree-based training, the method further includes:
obtaining a training sample set, the training sample set comprising a plurality of environmental data training samples, the environmental data training samples comprising: at least one abnormal characteristic sample value and an abnormal classification calibration value;
carrying out recursive division by adopting a CART algorithm (Classification and Regression Tree algorithm) according to the plurality of environmental data training samples to establish a CART decision tree;
carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub decision trees to be trained, wherein each sub decision tree to be trained corresponds to a target constant;
obtaining a verification sample set, wherein the verification sample set comprises a plurality of environment data verification samples;
and determining the abnormal classification model according to the plurality of environmental data verification samples and the plurality of sub-decision trees to be trained.
Further, the step of establishing the CART decision tree by performing recursive partitioning by using a CART algorithm according to the plurality of environmental data training samples includes:
selecting an independent variable Xi and determining a value Vi according to the independent variable Xi, and dividing the n-dimensional space into two parts, wherein all points of one part satisfy Xi ≤ Vi and all points of the other part satisfy Xi > Vi; for non-continuous variables, each abnormal feature takes only two values: yes or no;
and reselecting one abnormal feature from the two parts to continue the division, using the Gini index as the division standard, stopping tree building when a stopping condition is met, and using the built binary tree as the CART decision tree, wherein the stopping condition is: the number of environmental data training samples at a leaf node is 1, or the samples belong to the same abnormal class.
Further, the step of reselecting an abnormal feature from the two parts to continue the division, using the Gini index as the division standard, stopping tree building when the stopping condition is met, and using the built binary tree as the CART decision tree includes:
taking each part of the two parts as a node to be divided;
taking each abnormal feature of the nodes to be divided as an abnormal feature to be divided;
performing a Gini index calculation according to the value of the abnormal feature to be divided and all splitting points corresponding to the value of the abnormal feature to be divided to obtain a plurality of splitting Gini indexes;
determining an optimal splitting point according to the plurality of splitting Gini indexes, and taking the splitting Gini index corresponding to the optimal splitting point as the optimal splitting Gini index;
determining the optimal abnormal feature according to the optimal splitting Gini indexes of all the abnormal features to be divided;
generating two sub-nodes from the node to be divided according to the optimal abnormal feature;
dividing all the environmental data training samples of the node to be divided into the two child nodes according to the optimal abnormal feature and the optimal splitting point of the optimal abnormal feature;
and when the stopping condition is met, taking the established binary tree as the CART decision tree, otherwise, taking the two child nodes as the two parts, and executing the step of taking each part of the two parts as a node to be divided.
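The split-selection loop above (Gini calculation per feature, optimal split, two child nodes) can be sketched for the yes/no abnormal features the text describes. This is a minimal illustration under assumed function and feature names, not the patent's implementation:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(samples, feature):
    """Weighted Gini index of splitting `samples` on a yes/no feature.

    samples -- list of (features: dict, label) pairs
    """
    yes = [y for f, y in samples if f[feature] == "yes"]
    no = [y for f, y in samples if f[feature] == "no"]
    n = len(samples)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

def best_feature(samples, features):
    """The optimal abnormal feature: the one with the smallest split Gini index."""
    return min(features, key=lambda f: split_gini(samples, f))

# Hypothetical training samples (feature names and labels are illustrative):
samples = [
    ({"data_lost": "yes", "over_range": "no"}, "equipment_failure"),
    ({"data_lost": "yes", "over_range": "no"}, "equipment_failure"),
    ({"data_lost": "no", "over_range": "yes"}, "data_exception"),
    ({"data_lost": "no", "over_range": "no"}, "data_exception"),
]
best = best_feature(samples, ["data_lost", "over_range"])  # "data_lost" splits purely
```

Splitting on the chosen feature and recursing on each child node until the stopping condition reproduces the binary tree construction described above.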
Further, the step of performing random pruning and constant determination on the CART decision tree to obtain a plurality of sub decision trees to be trained, where each sub decision tree to be trained corresponds to one target constant includes:
randomly pruning the CART decision tree to obtain a plurality of sub decision trees to be trained;
determining a constant of the sub-decision tree to be trained to obtain a target constant corresponding to the sub-decision tree to be trained;
pruning an internal node t of the sub-decision tree to be trained, wherein the loss function of t regarded as a single-node tree is
Cα(t) = C(t) + α,
where C(t) is the prediction error of node t on the training samples and α is a variable;
the loss function of the pruned subtree Tt with t as its root node is
Cα(Tt) = C(Tt) + α|Tt|,
where C(Tt) is the prediction error of Tt on the training sample set and |Tt| is the number of leaf nodes of Tt;
when the loss function of t as a single-node tree equals the loss function of the subtree Tt rooted at t, that is, C(t) + α = C(Tt) + α|Tt|, the value of the variable α is taken as the target constant corresponding to the sub-decision tree to be trained.
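Setting the two loss functions equal, C(t) + α = C(Tt) + α|Tt|, solves to α = (C(t) − C(Tt)) / (|Tt| − 1), the standard cost-complexity pruning constant. A minimal sketch of this computation (function name assumed):

```python
def pruning_alpha(c_node: float, c_subtree: float, n_leaves: int) -> float:
    """Solve C(t) + alpha = C(T_t) + alpha * |T_t| for alpha.

    c_node    -- prediction error C(t) of t collapsed to a single leaf
    c_subtree -- prediction error C(T_t) of the subtree rooted at t
    n_leaves  -- number of leaf nodes |T_t| of that subtree (must be > 1)
    """
    return (c_node - c_subtree) / (n_leaves - 1)

# A subtree with 3 leaves and error 0.05, versus error 0.20 when collapsed:
alpha = pruning_alpha(0.20, 0.05, 3)  # (0.20 - 0.05) / (3 - 1) = 0.075
```

Intuitively, a small α means pruning the subtree away costs little extra error per removed leaf, so the subtree is a good pruning candidate.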
Further, the step of performing random pruning and constant determination on the CART decision tree to obtain a plurality of sub decision trees to be trained, where each sub decision tree to be trained corresponds to one target constant, further includes:
carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained and a cut environmental data training sample set, wherein each sub-decision tree to be trained corresponds to a target constant; and
the obtaining a set of validation samples includes:
and removing the cut environmental data training sample set from the training sample set, and taking the remaining part as the verification sample set.
The application also provides a data anomaly analysis device based on environmental data, the device includes:
the abnormal feature extraction module is used for acquiring environmental data to be analyzed and monitoring items, and extracting abnormal features of the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set;
and the anomaly classification module is used for inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is obtained by decision-tree-based training, and for acquiring the abnormal classification result output by the abnormal classification model, the result indicating the abnormal category of the environmental data to be analyzed.
The present application also proposes a computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of any of the above-mentioned methods when executing the computer program.
The present application also proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of the above.
According to the data anomaly analysis method, device, equipment and medium based on environmental data, a target abnormal feature set is obtained by feature extraction from the environmental data to be analyzed, the target abnormal feature set is input into the abnormal classification model for abnormal classification, and the abnormal classification result output by the model is obtained. The anomaly in the environmental data to be analyzed is clearly indicated by the abnormal classification result, and because the abnormal classification model is obtained by decision-tree-based training, the accuracy of the abnormal classification result is improved, so that the anomalies in the environmental data to be analyzed are reasonably analyzed.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for analyzing data anomalies based on environmental data according to an embodiment of the present application;
FIG. 2 is a block diagram schematically illustrating a structure of an apparatus for analyzing data anomalies based on environmental data according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for analyzing data anomalies based on environmental data, where the method includes:
s1: acquiring environmental data to be analyzed and monitoring items;
s2: performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set;
s3: inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is a model obtained based on decision tree training;
s4: and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result is used for expressing the abnormal category of the environmental data to be analyzed.
The anomaly categories of the environmental data include, but are not limited to: equipment failure class, data exception class and data superstandard class.
The device failure class refers to environmental data anomalies caused by device failure, where device failure includes but is not limited to environment monitoring device failure, network failure, and environmental data receiving device failure. The abnormal features of the device failure class include, but are not limited to: unstable transmission and data loss. Unstable transmission refers to instability in the speed at which environmental data is transmitted from the environment monitoring device to the environmental data receiving device. Data loss refers to partial or complete loss of the environmental data.
The data exception class is an environmental data anomaly that results from the data itself not meeting expected requirements. The abnormal features of the data exception class include, but are not limited to: data unchanged for a long time, data exceeding the range, and abnormal data identification. Data unchanged for a long time means that the value of the same type of data has not changed for longer than a preset duration. Data exceeding the range means that the environmental data exceeds the measurement range; for example, the pH value lies in the range 0-14, so a value outside 0-14 is out of range. Abnormal data identification means that the value of an identification bit in the environmental data is not within a preset identification range.
The data superstandard class is an environmental data anomaly resulting from a value exceeding an expected standard. The abnormal features of the data superstandard class include, but are not limited to: conventional pollutants slightly exceeding the standard, conventional pollutants severely exceeding the standard, and heavy metals or similar pollutants exceeding the standard. Slightly exceeding the standard means that the conventional-pollutant value in the environmental data exceeds the slight-over-standard threshold but is lower than the severe-over-standard threshold. Severely exceeding the standard means that the conventional-pollutant value is greater than or equal to the severe-over-standard threshold. Heavy metals and similar pollutants exceeding the standard means that the heavy-metal or similar-pollutant value exceeds its over-standard threshold.
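The slight/severe over-standard sub-categories described above amount to a two-threshold check. The sketch below illustrates that logic only; the threshold values and names are hypothetical placeholders, not values from the patent:

```python
# Assumed placeholder thresholds for a conventional pollutant reading:
SLIGHT_LIMIT = 1.0  # slight over-standard threshold (hypothetical)
SEVERE_LIMIT = 2.0  # severe over-standard threshold (hypothetical)

def pollutant_status(value: float) -> str:
    """Map a conventional-pollutant reading onto the categories in the text:
    >= severe threshold -> severe; between the thresholds -> slight."""
    if value >= SEVERE_LIMIT:
        return "severely over standard"
    if value > SLIGHT_LIMIT:
        return "slightly over standard"
    return "within standard"

pollutant_status(1.5)  # falls between the two thresholds
```

Note the boundary handling follows the text: severe over-standard is "greater than or equal to" its threshold, while slight over-standard strictly exceeds the lower one.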
For step S1, the environmental data to be analyzed may be obtained from a database, or may be sent by the environmental monitoring device in real time; and acquiring the monitoring items, wherein the monitoring items can be acquired from a database.
The environmental data to be analyzed is environmental data which needs to be subjected to abnormal classification. The environmental data is obtained from the environment monitored by the environmental monitoring device. The environmental data monitored by the environmental monitoring equipment can be sent to the environmental data receiving equipment by the environmental monitoring equipment in a data message mode.
The monitoring items include, but are not limited to, monitoring items for water and monitoring items for atmosphere.
For step S2, all the abnormal features of the environmental data to be analyzed are extracted, and at least one target abnormal feature is obtained. That is, the number of target anomaly features in the target anomaly feature set includes, but is not limited to, one, two, three, four, five. The target anomaly feature is also an anomaly feature, and the target anomaly feature comprises: exception feature name, value of exception feature. For example, if the name of the abnormal feature is long-term unchanged, the value of the abnormal feature includes: with or without change, for data that is not changed for a long time, "changed" may also be expressed as "no" and "unchanged" may also be expressed as "yes", which are not specifically limited by the examples herein.
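A target abnormal feature as described here pairs a feature name with a yes/no value. A minimal representation, with hypothetical feature names, might encode the set for the classifier as follows:

```python
# Hypothetical target abnormal feature set (names and values are illustrative):
target_feature_set = {
    "unchanged_long_term": "yes",  # data unchanged beyond the preset duration
    "data_lost": "no",
    "over_range": "no",
}

def as_binary_vector(features: dict, order: list) -> list:
    """Encode yes/no feature values as 1/0 in a fixed feature order."""
    return [1 if features[name] == "yes" else 0 for name in order]

vec = as_binary_vector(
    target_feature_set, ["unchanged_long_term", "data_lost", "over_range"]
)
# vec == [1, 0, 0]
```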
For the decision-tree-based abnormal classification model, a training method may be selected from the prior art. Alternatively, a training sample set may be obtained, a CART decision tree established according to it, the CART decision tree randomly pruned to generate a plurality of sub-decision trees to be trained, the sub-decision trees verified with a verification sample set, and the abnormal classification model determined according to the verification result. Because the abnormal classification model is obtained by decision-tree-based training, it has strong generalization capability.
Decision trees are a very common classification method and a form of supervised learning. In supervised learning, a set of samples is given, each sample having a set of attributes and a predetermined class; a classifier is then learned from these samples that can assign the correct class to newly appearing objects.
For step S3, all the target abnormal feature sets are input into an abnormal classification model for abnormal classification, and the abnormal classification model outputs an abnormal classification result. The anomaly classification result is an anomaly class of the environmental data.
For step S4, obtaining an anomaly classification result output by the anomaly classification model, and taking the anomaly classification result as a result of anomaly classification of the environmental data to be analyzed.
The abnormal classification result comprises the name of the abnormal category and the value of the abnormal category. For example, the names of the abnormal categories include the device failure class, the data exception class, and the data superstandard class, which is not specifically limited by the examples herein.
According to the method, a target abnormal feature set is obtained by feature extraction from the environmental data to be analyzed, the target abnormal feature set is input into the abnormal classification model for abnormal classification, and the abnormal classification result output by the model is obtained. The abnormal classification result clearly shows the anomaly in the environmental data to be analyzed, and because the abnormal classification model is obtained by decision-tree-based training, the accuracy of the result is improved, so that the anomalies in the environmental data to be analyzed are reasonably analyzed.
In an embodiment, the step of obtaining the environmental data to be analyzed includes:
s11: acquiring a data message monitored by environment monitoring equipment;
s12: and analyzing according to the data message to obtain the environmental data to be analyzed.
For step S11, the environment monitoring device communicates with the environmental data receiving device through the 212 communication protocol (likely the Chinese HJ 212 data transmission standard for pollutant online monitoring), packages the monitored environmental data in a data message, and sends the data message to the environmental data receiving device based on that protocol. It is understood that the environment monitoring device and the environmental data receiving device may also communicate through other protocols, such as TCP (Transmission Control Protocol), which is not specifically limited by the examples herein.
In step S12, the data message is parsed to obtain the environmental data monitored by the environment monitoring device; that is, the environmental data to be analyzed is the environmental data carried in the message. For example, in the data message "##0307ST=32, CN=2011, QN=20200714141100011, PW=123456, MN=WWSZ0003060095, CP=&&DataTime=20200714141100, 060-Rtd=1.150, 060-Flag=N, 011-Rtd=9.1, 011-Flag=N, B01-Rtd=0.275, B01-Flag=N, B01TOTAL-Rtd=8462.000, B01TOTAL-Flag=N, 101-Rtd=0.0140, 101-Flag=N, 029-Rtd=0.0070, 029-Flag=N, 028-Rtd=0.0471, 028-Flag=N, 001-Rtd=7.379, 001-Flag=N&&C301", the data between the two "&&" markers is the environmental data monitored by the environment monitoring device, and parsing the message yields the environmental data to be analyzed: "DataTime=20200714141100, 060-Rtd=1.150, 060-Flag=N, 011-Rtd=9.1, 011-Flag=N, B01-Rtd=0.275, B01-Flag=N, B01TOTAL-Rtd=8462.000, B01TOTAL-Flag=N, 101-Rtd=0.0140, 101-Flag=N, 029-Rtd=0.0070, 029-Flag=N, 028-Rtd=0.0471, 028-Flag=N, 001-Rtd=7.379, 001-Flag=N", which is not specifically limited by way of example.
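The parsing step can be sketched as extracting the key/value fields between the two "&&" markers of such a message. This is an illustrative simplification with an assumed function name, not the patent's parser:

```python
def parse_message(message: str) -> dict:
    """Return the key/value pairs found between the two '&&' markers."""
    start = message.index("&&") + 2
    end = message.index("&&", start)
    payload = message[start:end]  # e.g. "DataTime=...,060-Rtd=1.150,..."
    fields = {}
    for item in payload.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

# Shortened example message in the same style as the one shown above:
msg = ("##0307ST=32,CN=2011,QN=20200714141100011,PW=123456,MN=WWSZ0003060095,"
       "CP=&&DataTime=20200714141100,060-Rtd=1.150,060-Flag=N,"
       "011-Rtd=9.1,011-Flag=N&&C301")
data = parse_message(msg)
# data["060-Rtd"] == "1.150", data["011-Flag"] == "N"
```

The "NNN-Rtd" values (real-time readings) and "NNN-Flag" markers extracted this way form the raw material for the abnormal feature extraction of step S2.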
In an embodiment, before the step of inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is obtained by decision-tree-based training, the method further includes:
s31: obtaining a training sample set, the training sample set comprising a plurality of environmental data training samples, the environmental data training samples comprising: at least one abnormal characteristic sample value and an abnormal classification calibration value;
s32: performing recursive division by adopting a CART algorithm according to the plurality of environmental data training samples to establish a CART decision tree;
s33: carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub decision trees to be trained, wherein each sub decision tree to be trained corresponds to a target constant;
s34: obtaining a verification sample set, wherein the verification sample set comprises a plurality of environment data verification samples;
s35: and determining the abnormal classification model according to the plurality of environmental data verification samples and the plurality of sub-decision trees to be trained.
For step S31, performing abnormal feature extraction on each historical environmental data to obtain an abnormal feature sample value corresponding to the historical environmental data; and calibrating each historical environment data to obtain an abnormal classification calibration value corresponding to the historical environment data.
The abnormal feature sample value comprises the name and the sample value of the abnormal feature. The abnormal classification calibration value comprises an abnormal class name and a calibration result.
It will be appreciated that there may be multiple abnormal classification calibration values. For example, when the abnormal categories of the environmental data are the three classes of device failure, data exception and data superstandard, there are three abnormal classification calibration values: the device failure class calibration value, the data exception class calibration value and the data superstandard class calibration value, which is not specifically limited by this example.
The CART algorithm is a classification and regression tree algorithm and a binary recursive partitioning technique: the current sample set is divided into two sub-sample sets, and each generated non-leaf node has two branches, so the decision tree generated by the CART algorithm is a binary tree with a concise structure.
For step S32, the method specifically includes: and performing recursive division by adopting a CART algorithm according to all the environmental data training samples to establish a CART decision tree. That is, the CART decision tree is a binary tree with a compact structure.
For step S33, the method specifically includes: randomly pruning the CART decision tree, and taking the residual part of the CART decision tree after pruning as a sub-decision tree to be trained; and determining a constant for each sub-decision tree to be trained to obtain a target constant corresponding to the sub-decision tree to be trained.
With respect to step S34, it is understood that the verification sample set may be generated according to historical environmental data, or may be generated according to the training sample set. Like a training sample, each environmental data verification sample comprises: at least one abnormal feature sample value and an abnormal classification calibration value.
For step S35, the method specifically includes: performing a Gini index calculation on the plurality of sub-decision trees to be trained according to the plurality of environmental data verification samples to obtain a plurality of subtree Gini indexes; and determining an optimal sub-decision tree according to the plurality of subtree Gini indexes, and taking the optimal sub-decision tree and its corresponding target constant as the abnormal classification model.
That is, the plurality of environmental data verification samples are input into each sub-decision tree to be trained for Gini index calculation, yielding one subtree Gini index per subtree.
The minimum value is then found among the subtree Gini indexes, and the sub-decision tree to be trained corresponding to that minimum subtree Gini index is taken as the optimal sub-decision tree.
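Selecting the optimal sub-decision tree can be sketched as scoring each candidate on the verification samples and keeping the minimum. In this sketch the misclassification rate stands in for the subtree Gini index, and the (predictor, target constant) representation of a subtree is an assumption, not the patent's data structure:

```python
def validation_score(predict, validation):
    """Misclassification rate of `predict` over (features, label) pairs;
    used here as a stand-in for the subtree Gini index in the text."""
    wrong = sum(1 for features, label in validation if predict(features) != label)
    return wrong / len(validation)

def select_best_subtree(subtrees, validation):
    """Each candidate is a (predict_fn, target_constant) pair; return the
    pair whose predictor scores lowest on the verification samples."""
    return min(subtrees, key=lambda t: validation_score(t[0], validation))

# Two hypothetical pruned sub-decision trees represented as predict functions:
t1 = (lambda f: "equipment_failure" if f["data_lost"] == "yes" else "data_exception", 0.05)
t2 = (lambda f: "data_exception", 0.10)
validation = [
    ({"data_lost": "yes"}, "equipment_failure"),
    ({"data_lost": "no"}, "data_exception"),
]
best_tree, alpha = select_best_subtree([t1, t2], validation)  # t1 scores 0.0 and wins
```

The selected subtree together with its target constant then serves as the abnormal classification model.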
It can be understood that when new abnormal categories are added, the abnormal classification model needs to be retrained through steps S31 to S35, and the retrained abnormal classification model is used for abnormal classification.
Through steps S31 to S35, the abnormal classification model is obtained by decision-tree-based training, giving the trained model strong generalization capability; pruning prevents the decision tree from being divided too finely, and thus prevents the abnormal classification model from overfitting noisy data; and obtaining a plurality of sub-decision trees to be trained by random pruning and then determining the optimal sub-decision tree among them improves the accuracy of the optimal sub-decision tree found, and hence the accuracy of the abnormal classification model.
In an embodiment, the step of building a CART decision tree by performing recursive partitioning with a CART algorithm according to the plurality of environmental data training samples includes:
s321: selecting an independent variable Xi and determining a value Vi according to Xi, thereby dividing the n-dimensional space into two parts: all points of one part satisfy Xi ≤ Vi, and all points of the other part satisfy Xi > Vi. For discrete variables, an abnormal feature takes only two values, which are: yes or no;
s322: reselecting one abnormal feature in each of the two parts and continuing to divide, using the Gini index as the division criterion, and stopping building the tree when a stopping condition is met; the built binary tree is used as the CART decision tree. The stopping condition is: the number of environmental data training samples in a leaf node is 1, or all samples in the node belong to the same abnormal category.
Through steps S321 and S322, the plurality of environmental data training samples are divided among the leaf nodes of the binary tree. Using the Gini index as the division criterion makes each division an optimal one, improving the accuracy of the CART decision tree.
For step S322, specifically: in each of the two parts, the Gini index is used to select the optimal abnormal feature and the optimal split point; that part is then divided according to them, and the division of all nodes is completed recursively, with tree building stopping when the stopping condition is met.
Wherein the number of samples of the environmental data training samples of the leaf node is 1 or the abnormal class belongs to the same class, including: the number of the environmental data training samples of the leaf node is 1, or the environmental data training samples of the leaf node belong to the same abnormal category.
In an embodiment, the step of reselecting one abnormal feature from the two parts to continue partitioning, using the Gini index as the partitioning criterion, stopping building the tree when a stopping condition is met, and using the built binary tree as the CART decision tree includes:
s3221: taking each part of the two parts as a node to be divided;
s3222: taking each abnormal feature of the nodes to be divided as an abnormal feature to be divided;
s3223: performing a Gini index calculation according to the value of the abnormal feature to be divided and all splitting points corresponding to the value of the abnormal feature to be divided to obtain a plurality of splitting Gini indexes;
s3224: determining an optimal split point according to the plurality of splitting Gini indexes, and taking the splitting Gini index corresponding to the optimal split point as the optimal splitting Gini index;
s3225: determining the optimal abnormal feature according to the optimal splitting Gini indexes of all the abnormal features to be divided;
s3226: generating two sub-nodes from the node to be divided according to the optimal abnormal feature;
s3227: dividing all the environment data verification samples of the node to be divided into the two child nodes according to the optimal abnormal feature and the optimal split point of the optimal abnormal feature;
s3228: and when the stopping condition is met, taking the established binary tree as the CART decision tree, otherwise, taking the two child nodes as the two parts, and executing the step of taking each part of the two parts as a node to be divided.
Through steps S3221 to S3228, it is realized that the CART decision tree is constructed by the CART algorithm, so that the obtained CART decision tree is a binary tree with the finest division.
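The recursion of steps S3221 to S3228 can be sketched for purely binary (yes/no) abnormal features as follows. The dictionary-based tree representation and all names are illustrative assumptions; continuous features and per-split-point search are omitted for brevity.

```python
from collections import Counter

def gini(labels):
    # Gini index of a list of abnormal-classification calibration values.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_feature(samples, features):
    # samples: list of (feature_dict, label); pick the feature whose yes/no
    # division gives the minimal weighted Gini index after the split.
    best = None
    for f in features:
        yes = [lab for feat, lab in samples if feat[f] == 1]
        no = [lab for feat, lab in samples if feat[f] == 0]
        if not yes or not no:
            continue  # feature cannot split this node
        n = len(samples)
        g = len(yes) / n * gini(yes) + len(no) / n * gini(no)
        if best is None or g < best[1]:
            best = (f, g)
    return best

def build(samples, features):
    labels = [lab for _, lab in samples]
    # Stopping condition: one sample left, or all samples share a category.
    if len(samples) == 1 or len(set(labels)) == 1 or not features:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    choice = best_feature(samples, features)
    if choice is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, _ = choice
    rest = [x for x in features if x != f]
    return {
        "feature": f,
        "yes": build([s for s in samples if s[0][f] == 1], rest),
        "no": build([s for s in samples if s[0][f] == 0], rest),
    }
```

Each recursive call corresponds to one pass through S3221–S3228 on a node to be divided.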
For step S3221, specifically, the method includes: and taking each of the two parts as a node to be divided, namely, the number of the obtained nodes to be divided is two.
For step S3222, each abnormal feature in the node to be divided is an abnormal feature to be divided. For example, if the node to be divided contains 3 abnormal features, there are 3 abnormal features to be divided; this example is not limiting.
For step S3223, specifically: for an abnormal feature to be divided that is a continuous variable, each split point is the midpoint of a pair of adjacent feature values. Assuming one abnormal feature to be divided takes m distinct continuous values over the m environmental data training samples, there are m − 1 split points, each being the mean of two adjacent values. The candidate divisions of each abnormal feature to be divided are ranked by impurity reduction, and the split point with the largest impurity reduction is taken as the optimal split point. Impurity is measured by the Gini index; the impurity reduction is defined as the impurity before division minus the sum, over the nodes after division, of each node's impurity multiplied by that node's sample proportion.
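For instance, the m − 1 candidate split points of a continuous feature can be generated as midpoints of adjacent sorted values; this small helper is an illustrative assumption:

```python
def candidate_split_points(values):
    # m distinct continuous values give m - 1 split points, each the mean
    # of two adjacent values after sorting (duplicates add no split point).
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
```

Each returned midpoint would then be scored by its impurity reduction to find the optimal split point.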
The Gini index of a node A to be divided is calculated as:
Gini(A) = 1 − Σᵢ₌₁^C pᵢ²
where pᵢ is the probability of belonging to the i-th abnormal category and C is the number of abnormal categories. Gini(A) = 0 when all samples belong to the same category; Gini(A) is largest when all categories occur in the node with equal probability, in which case Gini(A) = (C − 1)/C.
Assuming that a node A to be divided is divided into B and C, where B contains a proportion p of the samples in A and C contains a proportion q, with p + q = 1, the impurity reduction is calculated as: Gini(A) − (p × Gini(B) + q × Gini(C)).
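A minimal sketch of these two formulas, assuming the class probabilities and child Gini indexes are given directly:

```python
def gini(probabilities):
    # Gini(A) = 1 - sum of p_i squared over the C abnormal categories.
    return 1.0 - sum(p * p for p in probabilities)

def impurity_reduction(gini_a, p, gini_b, q, gini_c):
    # Gini(A) - (p * Gini(B) + q * Gini(C)), with p + q = 1.
    return gini_a - (p * gini_b + q * gini_c)
```

Note that for C equally probable categories, `gini` returns (C − 1)/C, the maximum stated above.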
For example, Table 1 shows the 6 environmental data training samples of a node A to be divided, where the abnormal features include: data unchanged for a long time, data exceeds the range, and data identification abnormal; the abnormal classification calibration value is a data-abnormality-type calibration value.
Table 1: environmental data training sample data table (6 samples)
According to the environmental data training samples in Table 1, the samples are divided by the abnormal feature "data unchanged for a long time", as shown in Table 2, and the Gini index after division is calculated as follows:
Table 2: statistics of the environmental data training samples divided by the abnormal feature "data unchanged for a long time"
Gini(t1) = 1 − (2/2)² − (0/2)² = 0, the Gini index of the subset in which the data is unchanged for a long time;
Gini(t2) = 1 − (2/4)² − (2/4)² = 0.5, the Gini index of the subset in which the data is not unchanged for a long time;
Gini = (2/6) × 0 + (4/6) × 0.5 = 0.333, the weighted Gini index of the division.
According to the environmental data training samples in Table 1, the samples are divided by the abnormal feature "data exceeds the range", as shown in Table 3, and the Gini index after division is calculated as follows:
Table 3: statistics of the environmental data training samples divided by the abnormal feature "data exceeds the range"
Gini(t1) = 1 − (1/3)² − (2/3)² = 0.444, the Gini index of the subset in which the data exceeds the range;
Gini(t2) = 1 − (3/3)² − (0/3)² = 0, the Gini index of the subset in which the data does not exceed the range;
Gini = (3/6) × 0.444 + (3/6) × 0 = 0.222, the weighted Gini index of the division.
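The two divisions above can be checked numerically. The per-subset class counts used here are those implied by the calculations in the text (the tables themselves appear only as images in the publication), so they are an inference, not a reproduction of Table 1:

```python
def gini_from_counts(counts):
    # Gini index of a node given its per-category sample counts.
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def weighted_gini_after_split(children):
    # children: per-child-node category count tuples; returns the
    # sample-proportion-weighted Gini index after the division.
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini_from_counts(c) for c in children)

# "data unchanged for a long time": t1 -> (2, 0), t2 -> (2, 2)
g_unchanged = weighted_gini_after_split([(2, 0), (2, 2)])  # ~0.333
# "data exceeds the range": t1 -> (1, 2), t2 -> (3, 0)
g_overrange = weighted_gini_after_split([(1, 2), (3, 0)])  # ~0.222
```

The over-range feature yields the smaller weighted Gini index (0.222 < 0.333), so it would be preferred for this division.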
For step S3224, specifically: the minimum value is found among the plurality of splitting Gini indexes, the split point corresponding to that minimum is taken as the optimal split point, and its splitting Gini index is taken as the optimal splitting Gini index.
Step S3225 specifically includes: finding the minimum value among the optimal splitting Gini indexes of all the abnormal features to be divided, and taking the abnormal feature to be divided corresponding to the found optimal splitting Gini index as the optimal abnormal feature.
For step S3228, when the stop condition is satisfied, taking the established binary tree as the CART decision tree; when the stop condition is not satisfied, steps S3221 to S3228 are repeatedly performed, thereby realizing recursive iteration.
In an embodiment, the step of performing random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained, where each sub-decision tree to be trained corresponds to a target constant includes:
s331: randomly pruning the CART decision tree to obtain a plurality of sub decision trees to be trained;
s332: determining a constant of the sub-decision tree to be trained to obtain a target constant corresponding to the sub-decision tree to be trained;
For any internal node t of the sub-decision tree to be trained, the loss function of the single-node tree consisting of t alone is
Cα(t) = C(t) + α
where C(t) is the prediction error of node t on the training sample set and α is a variable (a complexity penalty);
the loss function of the subtree Tt rooted at t (the subtree that would be pruned) is
Cα(Tt) = C(Tt) + α|Tt|
where C(Tt) is the prediction error of Tt on the training sample set and |Tt| is the number of leaf nodes of Tt;
when the loss function of the single-node tree t equals the loss function of the subtree Tt rooted at t, the value of the variable α is taken as the target constant corresponding to the sub-decision tree to be trained.
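Setting the two loss functions equal and solving for α gives α = (C(t) − C(Tt)) / (|Tt| − 1); a one-line sketch (the argument names are assumptions):

```python
def target_constant(c_node, c_subtree, num_leaves):
    # Solve C(t) + alpha = C(T_t) + alpha * |T_t| for alpha.
    return (c_node - c_subtree) / (num_leaves - 1)
```

For example, a node with prediction error 0.5 whose 4-leaf subtree has prediction error 0.2 yields α = 0.1.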
Pruning prevents the decision tree from being divided too finely, which would cause the abnormal classification model to overfit noise data. The plurality of sub-decision trees to be trained are obtained by random pruning, and the optimal sub-decision tree is then determined from among them, which improves the accuracy of the chosen optimal sub-decision tree and therefore the accuracy of the abnormal classification model.
When α = 0, and while α remains sufficiently small, the inequality
Cα(Tt) < Cα(t)
holds. As α increases, there exists a value of α at which
Cα(Tt) = Cα(t),
that is, C(Tt) + α|Tt| = C(t) + α, from which it can be derived that
α = (C(t) − C(Tt)) / (|Tt| − 1).
At this α, the subtree Tt rooted at t has the same loss function value as the single-node tree t, but t has fewer nodes; therefore Tt can be pruned and replaced by the single node t without affecting the accuracy of the sub-decision tree to be trained obtained after pruning.
In an embodiment, the step of performing random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained, where each sub-decision tree to be trained corresponds to a target constant further includes:
carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained and a pruned-off environmental data training sample set, wherein each sub-decision tree to be trained corresponds to a target constant; and,
the obtaining a set of validation samples includes:
the pruned-off environmental data training samples are removed from the training sample set, and the remaining part is used as the verification sample set.
Removing the pruned-off environmental data training samples from the training sample set and using the remainder as the verification sample set helps improve the accuracy of verification performed with the verification sample set.
Wherein the pruned-off environmental data training sample set consists of the environmental data training samples in all pruned-off leaf nodes.
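A sketch of forming the verification sample set by set difference; identifying samples by their index is an assumption made here, since different samples may share identical feature values:

```python
def verification_set(training_samples, pruned_indices):
    # Keep every training sample whose index is not in a pruned-off leaf.
    pruned = set(pruned_indices)
    return [s for i, s in enumerate(training_samples) if i not in pruned]
```

The samples that sat in pruned-off leaf nodes are excluded, and the remainder serves as the verification sample set.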
In one embodiment, after the step of obtaining the anomaly classification result output by the anomaly classification model, the method further includes: and carrying out proportion calculation according to the abnormal classification result and the abnormal classification type to obtain the proportion corresponding to each abnormal type.
If the proportion of the equipment fault category is large, the monitoring equipment is frequently abnormal, and the monitoring equipment or the data transmission channel needs to be inspected and corrected; if the proportion of the data abnormality category is large, the monitoring data or the monitoring equipment is abnormal, and the monitoring equipment or the monitoring data needs to be inspected and corrected; if the proportion of the data-exceeding-standard category is large, the production pollution discharge of the enterprise exceeds the standard, and the enterprise needs to be supervised and rectified.
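The proportion calculation can be sketched with a simple counter over the classification results; the category names used here are placeholders, not the method's fixed vocabulary:

```python
from collections import Counter

def category_proportions(results):
    # Fraction of classification results falling in each abnormal category.
    counts = Counter(results)
    n = len(results)
    return {category: count / n for category, count in counts.items()}
```

A downstream rule can then compare each proportion against a threshold to decide which entity (equipment, data channel, or enterprise) to inspect.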
As shown in fig. 2, in one embodiment, an apparatus for analyzing data abnormality based on environmental data is provided, the apparatus comprising:
the abnormal feature extraction module 10 is configured to acquire environmental data to be analyzed and monitoring items, and perform abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set;
and an anomaly classification module 20, configured to input the target anomaly feature set into an anomaly classification model for performing anomaly classification, where the anomaly classification model is a model obtained based on decision tree training, and obtains an anomaly classification result output by the anomaly classification model, and the anomaly classification result is used to express an anomaly category of the environmental data to be analyzed.
According to the method, abnormal features are extracted from the environmental data to be analyzed to obtain the target abnormal feature set, and the target abnormal feature set is input into the abnormal classification model for abnormal classification to obtain the abnormal classification result output by the model. The abnormal classification result intuitively shows the abnormality of the environmental data to be analyzed, and because the abnormal classification model is obtained through decision-tree-based training, the accuracy of the abnormal classification result is improved, so that the abnormality of the environmental data to be analyzed is analyzed reasonably.
In one embodiment, the abnormal feature extraction module further comprises: an environment data acquisition submodule;
the environment data acquisition submodule is used for acquiring data messages monitored by the environment monitoring equipment and analyzing the data messages to obtain the environment data to be analyzed.
In one embodiment, the apparatus further comprises:
a training sample determination module, configured to obtain a training sample set, where the training sample set includes a plurality of environmental data training samples, and the environmental data training samples include: at least one abnormal characteristic sample value and an abnormal classification calibration value;
the training module is used for performing recursive division with the CART algorithm according to the plurality of environmental data training samples to establish a CART decision tree; performing random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained, each corresponding to a target constant; obtaining a verification sample set comprising a plurality of environmental data verification samples; and determining the abnormal classification model according to the plurality of environmental data verification samples and the plurality of sub-decision trees to be trained.
In one embodiment, the training module comprises: a CART decision tree construction submodule;
the CART decision tree construction submodule is used for selecting an independent variable Xi, determining a value Vi according to Xi, and dividing the n-dimensional space into two parts, where all points of one part satisfy Xi ≤ Vi and all points of the other part satisfy Xi > Vi (for discrete variables, an abnormal feature takes only two values: yes or no); reselecting one abnormal feature in each of the two parts and continuing to divide, using the Gini index as the division criterion; and stopping building the tree when a stopping condition is met, the built binary tree being used as the CART decision tree, the stopping condition being: the number of environmental data training samples in a leaf node is 1, or all samples in the node belong to the same abnormal category.
In one embodiment, the CART decision tree construction sub-module comprises: an optimal abnormal feature determining unit and a CART decision tree determining unit;
the optimal abnormal feature determining unit is configured to take each of the two parts as a node to be divided, take each abnormal feature of the node to be divided as an abnormal feature to be divided, perform a Gini index calculation according to the value of the abnormal feature to be divided and all split points corresponding to that value to obtain a plurality of splitting Gini indexes, determine an optimal split point according to the plurality of splitting Gini indexes, take the splitting Gini index corresponding to the optimal split point as the optimal splitting Gini index, and determine the optimal abnormal feature according to the optimal splitting Gini indexes of all the abnormal features to be divided;
the CART decision tree determining unit is configured to generate two child nodes from the node to be partitioned according to the optimal abnormal feature, partition all the environment data verification samples of the node to be partitioned into the two child nodes according to the optimal abnormal feature and the optimal split point of the optimal abnormal feature, use the established binary tree as the CART decision tree when the stopping condition is met, and use the two child nodes as the two parts if the stopping condition is not met, and execute the step of using each of the two parts as one node to be partitioned.
In one embodiment, the training module comprises: determining a sub-module of the sub-decision tree to be trained;
the sub-decision tree to be trained determining sub-module is used for randomly pruning the CART decision tree to obtain a plurality of sub-decision trees to be trained, and determining constants of the sub-decision trees to be trained to obtain target constants corresponding to the sub-decision trees to be trained;
for any internal node t of the sub-decision tree to be trained, the loss function of the single-node tree consisting of t alone is
Cα(t) = C(t) + α
where C(t) is the prediction error of node t on the training sample set and α is a variable;
the loss function of the subtree Tt rooted at t is
Cα(Tt) = C(Tt) + α|Tt|
where C(Tt) is the prediction error of Tt on the training sample set and |Tt| is the number of leaf nodes of Tt;
when the loss function of the single-node tree t equals the loss function of the subtree Tt rooted at t, the value of the variable α is taken as the target constant corresponding to the sub-decision tree to be trained.
In one embodiment, the training module further comprises: a verification sample set determination submodule;
the random pruning and constant determination of the CART decision tree to obtain a plurality of sub-decision trees to be trained, wherein each sub-decision tree to be trained corresponds to a target constant, further comprises: carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained and a pruned-off environmental data training sample set, wherein each sub-decision tree to be trained corresponds to a target constant; and,
the verification sample set determining submodule is used for removing the pruned-off environmental data training samples from the training sample set and using the remaining part as the verification sample set.
Referring to fig. 3, a computer device, which may be a server and whose internal structure may be as shown in fig. 3, is also provided in an embodiment of the present application. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data involved in the data anomaly analysis method based on environmental data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data anomaly analysis method based on environmental data, comprising the following steps: acquiring environmental data to be analyzed and monitoring items; performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set; inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is a model obtained based on decision tree training; and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result is used for expressing the abnormal category of the environmental data to be analyzed.
According to the method, abnormal features are extracted from the environmental data to be analyzed to obtain the target abnormal feature set, and the target abnormal feature set is input into the abnormal classification model for abnormal classification to obtain the abnormal classification result output by the model. The abnormal classification result intuitively shows the abnormality of the environmental data to be analyzed, and because the abnormal classification model is obtained through decision-tree-based training, the accuracy of the abnormal classification result is improved, so that the abnormality of the environmental data to be analyzed is analyzed reasonably.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing a method for data anomaly analysis based on environmental data, including the steps of: acquiring environmental data to be analyzed and monitoring items; performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set; inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is a model obtained based on decision tree training; and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result is used for expressing the abnormal category of the environmental data to be analyzed.
According to the method, abnormal features are extracted from the environmental data to be analyzed to obtain the target abnormal feature set, and the target abnormal feature set is input into the abnormal classification model for abnormal classification to obtain the abnormal classification result output by the model. The abnormal classification result intuitively shows the abnormality of the environmental data to be analyzed, and because the abnormal classification model is obtained through decision-tree-based training, the accuracy of the abnormal classification result is improved, so that the abnormality of the environmental data to be analyzed is analyzed reasonably.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for analyzing data anomalies based on environmental data, the method comprising:
acquiring environmental data to be analyzed and monitoring items;
performing abnormal feature extraction on the environmental data to be analyzed based on the monitoring items to obtain a target abnormal feature set;
inputting the target abnormal feature set into an abnormal classification model for abnormal classification, wherein the abnormal classification model is a model obtained based on decision tree training;
and acquiring an abnormal classification result output by the abnormal classification model, wherein the abnormal classification result is used for expressing the abnormal category of the environmental data to be analyzed.
2. The method for analyzing data abnormality based on environmental data according to claim 1, wherein the step of obtaining the environmental data to be analyzed includes:
acquiring a data message monitored by environment monitoring equipment;
and analyzing according to the data message to obtain the environmental data to be analyzed.
3. The method for analyzing data abnormality based on environmental data according to claim 1, wherein the step of inputting the target abnormality feature set into an abnormality classification model for abnormality classification, wherein the abnormality classification model is obtained based on decision tree training and further comprises:
obtaining a training sample set, the training sample set comprising a plurality of environmental data training samples, the environmental data training samples comprising: at least one abnormal characteristic sample value and an abnormal classification calibration value;
performing recursive division by adopting a CART algorithm according to the plurality of environmental data training samples to establish a CART decision tree;
carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub decision trees to be trained, wherein each sub decision tree to be trained corresponds to a target constant;
obtaining a verification sample set, wherein the verification sample set comprises a plurality of environment data verification samples;
and determining the abnormal classification model according to the plurality of environmental data verification samples and the plurality of sub-decision trees to be trained.
4. The method of claim 3, wherein the step of building the CART decision tree by recursive partitioning using the CART algorithm according to the plurality of environmental data training samples comprises:
selecting an independent variable Xi and determining a value Vi according to Xi, thereby dividing the n-dimensional space into two parts: all points of one part satisfy Xi ≤ Vi, and all points of the other part satisfy Xi > Vi; for discrete variables, an abnormal feature takes only two values, which are: yes or no;
and reselecting one abnormal feature in each of the two parts to continue dividing, using the Gini index as the division criterion, stopping building the tree when a stopping condition is met, and using the built binary tree as the CART decision tree, wherein the stopping condition is: the number of environmental data training samples in a leaf node is 1, or all samples in the node belong to the same abnormal category.
5. The method of claim 4, wherein the step of reselecting an abnormal feature from the two parts to continue dividing, using the Gini index as the division criterion, stopping building the tree when a stopping condition is met, and using the built binary tree as the CART decision tree comprises:
taking each part of the two parts as a node to be divided;
taking each abnormal feature of the nodes to be divided as an abnormal feature to be divided;
performing a Gini index calculation according to the value of the abnormal feature to be divided and all splitting points corresponding to the value of the abnormal feature to be divided to obtain a plurality of splitting Gini indexes;
determining an optimal split point according to the plurality of splitting Gini indexes, and taking the splitting Gini index corresponding to the optimal split point as the optimal splitting Gini index;
determining the optimal abnormal feature according to the optimal splitting Gini indexes of all the abnormal features to be divided;
generating two sub-nodes from the node to be divided according to the optimal abnormal feature;
dividing all the environment data verification samples of the node to be divided into the two child nodes according to the optimal abnormal feature and the optimal split point of the optimal abnormal feature;
and when the stopping condition is met, taking the established binary tree as the CART decision tree, otherwise, taking the two child nodes as the two parts, and executing the step of taking each part of the two parts as a node to be divided.
6. The method of claim 3, wherein the step of randomly pruning and constant-determining the CART decision tree to obtain a plurality of sub-decision trees to be trained, each sub-decision tree to be trained corresponding to a target constant comprises:
randomly pruning the CART decision tree to obtain a plurality of sub decision trees to be trained;
determining a constant of the sub-decision tree to be trained to obtain a target constant corresponding to the sub-decision tree to be trained;
pruning an internal node t of the sub-decision tree to be trained, and taking the loss function of t as a single-node tree as

C_α(t) = C(t) + α

where C(t) is the prediction error of node t on the training sample set and α is a variable;

taking the loss function of the subtree T_t rooted at t as

C_α(T_t) = C(T_t) + α|T_t|

where C(T_t) is the prediction error of the subtree T_t on the training sample set and |T_t| is the number of leaf nodes of T_t;

and when the loss function of t as a single-node tree equals the loss function of the subtree T_t, that is, when α = (C(t) − C(T_t)) / (|T_t| − 1), taking the value of the variable α as the target constant corresponding to the sub-decision tree to be trained.
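Setting the two loss functions in claim 6 equal, C(t) + α = C(T_t) + α|T_t|, and solving for α gives the target constant directly. A minimal sketch of that calculation, with illustrative names not drawn from the patent:

```python
def target_constant(c_node, c_subtree, n_leaves):
    """Solve C(t) + a == C(T_t) + a * n_leaves for a: the value of the
    variable at which collapsing the subtree T_t (with `n_leaves` leaf
    nodes and prediction error `c_subtree`) into the single node t
    (prediction error `c_node`) leaves the loss function unchanged."""
    if n_leaves <= 1:
        raise ValueError("subtree must have more than one leaf node")
    return (c_node - c_subtree) / (n_leaves - 1)
```

For example, a subtree with 4 leaves whose error is 0.2, replacing a node whose error would be 0.5, gives a target constant of 0.1.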
7. The method of claim 3, wherein the step of randomly pruning the CART decision tree and determining constants to obtain a plurality of sub-decision trees to be trained, each sub-decision tree to be trained corresponding to a target constant, further comprises:
carrying out random pruning and constant determination on the CART decision tree to obtain a plurality of sub-decision trees to be trained and a set of pruned-off environmental data training samples, wherein each sub-decision tree to be trained corresponds to a target constant; and,
the obtaining a set of validation samples includes:
and removing the set of pruned-off environmental data training samples from the training sample set, and taking the remaining portion as the verification sample set.
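The verification-set construction of claim 7 is a set difference: whatever training samples were pruned off during tree cutting are withheld, and the remainder becomes the verification set. A sketch under the assumption that samples are hashable (the function name is illustrative):

```python
def make_verification_set(training_samples, pruned_off_samples):
    """Remove the pruned-off samples from the full training sample set;
    the remaining portion serves as the verification sample set.
    Preserves the original ordering of the surviving samples."""
    pruned = set(pruned_off_samples)
    return [s for s in training_samples if s not in pruned]
```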
8. An apparatus for analyzing data abnormality based on environmental data, the apparatus comprising:
the abnormal feature extraction module is used for acquiring environmental data to be analyzed and monitoring projects, and extracting abnormal features of the environmental data to be analyzed based on the monitoring projects to obtain a target abnormal feature set;
and the anomaly classification module is used for inputting the target anomaly feature set into an anomaly classification model for anomaly classification, wherein the anomaly classification model is a model obtained based on decision tree training, and for obtaining an anomaly classification result output by the anomaly classification model, the anomaly classification result being used for expressing the anomaly category of the environmental data to be analyzed.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010919414.XA 2020-09-04 2020-09-04 Data anomaly analysis method, device, equipment and medium based on environmental data Active CN111783904B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010919414.XA CN111783904B (en) 2020-09-04 2020-09-04 Data anomaly analysis method, device, equipment and medium based on environmental data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010919414.XA CN111783904B (en) 2020-09-04 2020-09-04 Data anomaly analysis method, device, equipment and medium based on environmental data

Publications (2)

Publication Number Publication Date
CN111783904A true CN111783904A (en) 2020-10-16
CN111783904B CN111783904B (en) 2020-12-04

Family

ID=72762144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010919414.XA Active CN111783904B (en) 2020-09-04 2020-09-04 Data anomaly analysis method, device, equipment and medium based on environmental data

Country Status (1)

Country Link
CN (1) CN111783904B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170061322A1 (en) * 2015-08-31 2017-03-02 International Business Machines Corporation Automatic generation of training data for anomaly detection using other user's data samples
CN106872657A (en) * 2017-01-05 2017-06-20 河海大学 A kind of multivariable water quality parameter time series data accident detection method
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 A kind of method for detecting abnormality and device
CN108830765A (en) * 2018-04-18 2018-11-16 中国地质大学(武汉) A kind of checking method and system of pollution entering the water monitoring data
CN108921440A (en) * 2018-07-11 2018-11-30 平安科技(深圳)有限公司 Pollutant method for monitoring abnormality, system, computer equipment and storage medium
CN109657616A (en) * 2018-12-19 2019-04-19 四川立维空间信息技术有限公司 A kind of remote sensing image land cover pattern automatic classification method
CN109948669A (en) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 A kind of abnormal deviation data examination method and device
CN110413682A (en) * 2019-08-09 2019-11-05 云南电网有限责任公司 A kind of the classification methods of exhibiting and system of data
CN110569867A (en) * 2019-07-15 2019-12-13 山东电工电气集团有限公司 Decision tree algorithm-based power transmission line fault reason distinguishing method, medium and equipment
CN110766059A (en) * 2019-10-14 2020-02-07 四川西部能源股份有限公司郫县水电厂 Transformer fault prediction method, device and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIE LIU ET AL: "An integrated data-driven framework for surface water quality anomaly detection and early warning", JOURNAL OF CLEANER PRODUCTION *
LI CANGBAI ET AL: "Support vector machine, random forest and artificial neural network machine learning", ACTA GEOSCIENTICA SINICA (地球学报) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112344990A (en) * 2020-10-21 2021-02-09 平安国际智慧城市科技股份有限公司 Environmental anomaly monitoring method, device, equipment and storage medium
CN112344990B (en) * 2020-10-21 2024-05-31 平安国际智慧城市科技股份有限公司 Environment anomaly monitoring method, device, equipment and storage medium
CN112487033A (en) * 2020-11-30 2021-03-12 国网山东省电力公司电力科学研究院 Service visualization method and system for data flow and network topology construction
CN113205274A (en) * 2021-05-21 2021-08-03 华设设计集团股份有限公司 Quantitative ranking method for construction quality
CN113344417A (en) * 2021-06-23 2021-09-03 武汉虹信技术服务有限责任公司 Method, system, computer equipment and readable medium for checking houses of individual workshops in residential building
CN113435517B (en) * 2021-06-29 2023-06-02 平安科技(深圳)有限公司 Abnormal data point output method, device, computer equipment and storage medium
CN113435517A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 Abnormal data point output method and device, computer equipment and storage medium
CN114066438A (en) * 2021-11-15 2022-02-18 平安证券股份有限公司 Model-based monitoring data display method, device, equipment and storage medium
CN114227378A (en) * 2021-11-17 2022-03-25 航天科工深圳(集团)有限公司 Clamp state detection method and device, terminal and storage medium
CN114118306A (en) * 2022-01-26 2022-03-01 北京普利莱基因技术有限公司 Method and device for analyzing SDS (sodium dodecyl sulfate) gel electrophoresis experimental data and SDS gel reagent
CN114118306B (en) * 2022-01-26 2022-04-01 北京普利莱基因技术有限公司 Method and device for analyzing SDS (sodium dodecyl sulfate) gel electrophoresis experimental data and SDS gel reagent
CN116910667A (en) * 2023-09-08 2023-10-20 中国铁塔股份有限公司吉林省分公司 Communication tower abnormal state analysis method and system based on decision algorithm
CN116910667B (en) * 2023-09-08 2023-11-21 中国铁塔股份有限公司吉林省分公司 Communication tower abnormal state analysis method and system based on decision algorithm
CN117434227A (en) * 2023-12-20 2024-01-23 河北金隅鼎鑫水泥有限公司 Method and system for monitoring waste gas components of cement manufacturing plant
CN117434227B (en) * 2023-12-20 2024-04-30 河北金隅鼎鑫水泥有限公司 Method and system for monitoring waste gas components of cement manufacturing plant

Also Published As

Publication number Publication date
CN111783904B (en) 2020-12-04

Similar Documents

Publication Publication Date Title
CN111783904B (en) Data anomaly analysis method, device, equipment and medium based on environmental data
CN110609759B (en) Fault root cause analysis method and device
US11301759B2 (en) Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge
Gokhale et al. Regression tree modeling for the prediction of software quality
CN111885059B (en) Method for detecting and positioning abnormal industrial network flow
CN111309539A (en) Abnormity monitoring method and device and electronic equipment
CN114124482B (en) Access flow anomaly detection method and equipment based on LOF and isolated forest
CN111666276A (en) Method for eliminating abnormal data by applying isolated forest algorithm in power load prediction
CN113093985B (en) Sensor data link abnormity detection method and device and computer equipment
CN110188196B (en) Random forest based text increment dimension reduction method
CN115511398A (en) Welding quality intelligent detection method and system based on time sensitive network
CN113284004A (en) Power data diagnosis treatment method based on isolated forest algorithm
CN115622867A (en) Industrial control system safety event early warning classification method and system
CN111738530B (en) River water quality prediction method, device and computer readable storage medium
CN113489606A (en) Network application identification method and device based on graph neural network
CN115422263B (en) Multifunctional universal fault analysis method and system for electric power field
CN116907764A (en) Method, device, equipment and storage medium for detecting air tightness of desulfurization equipment
CN116126807A (en) Log analysis method and related device
CN116030955A (en) Medical equipment state monitoring method and related device based on Internet of things
CN113726756A (en) Web abnormal traffic detection method, device, equipment and storage medium
WO2020133470A1 (en) Chat corpus cleaning method and apparatus, computer device, and storage medium
CN112069037A (en) Method and device for detecting no threshold value of cloud platform
CN113676457B (en) Streaming type multilayer security detection method and system based on state machine
CN116094955B (en) Operation and maintenance fault chain labeling system and method based on self-evolution network knowledge base
Viademonte et al. Discovering knowledge from meteorological databases: a meteorological aviation forecast study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Data anomaly analysis method, device, equipment and medium based on environmental data

Effective date of registration: 20210429

Granted publication date: 20201204

Pledgee: Shenzhen hi tech investment small loan Co.,Ltd.

Pledgor: Ping An International Smart City Technology Co.,Ltd.

Registration number: Y2021980003211

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230206

Granted publication date: 20201204

Pledgee: Shenzhen hi tech investment small loan Co.,Ltd.

Pledgor: Ping An International Smart City Technology Co.,Ltd.

Registration number: Y2021980003211
