CN114418006A - Abnormal data detection method and device - Google Patents

Info

Publication number
CN114418006A
Authority
CN
China
Prior art keywords
data
support degree
support
power data
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210071382.1A
Other languages
Chinese (zh)
Inventor
梅发茂
黄浩
付佳佳
马腾腾
刘晓燕
鲍远义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority to CN202210071382.1A
Publication of CN114418006A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23211 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with adaptive number of clusters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method and a device for detecting abnormal data, wherein the detection method comprises the following steps: obtaining a power data set, the power data set comprising a plurality of power data; calculating the support degree between the electric power data and generating a support degree matrix; clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters; and comparing the sub-clusters to detect abnormal data to obtain an abnormal detection result of the power data. According to the method and the device, the sum of the support degrees of each row of data in the support degree matrix is used as a sub-cluster center, the cluster center does not need to be manually established in advance, and the threshold value and the density definition do not need to be determined in advance. According to the method and the device, the clustering weight can be adaptively adjusted according to the support sum calculated by the individual model, the requirement of the clustering process on the data structure is reduced, so that the method and the device can be suitable for more different data, and the data adaptability of the abnormal data detection method is improved.

Description

Abnormal data detection method and device
Technical Field
The application relates to the field of power grid safety, in particular to a method and a device for detecting abnormal data.
Background
A power grid system collects power observation data through a monitoring and data acquisition system in order to perform state estimation. However, during long-term operation of the power grid system, the power observation data may be tampered with by false data injection attacks, which biases the state estimation result of the power system and seriously harms the interests of the power grid.
At present, abnormal data is mainly detected either by a single physical-space security protection technique or by an information-space anomaly detection technique such as a distance clustering algorithm or a density-distance clustering algorithm. However, the traditional distance clustering algorithm requires the cluster centers to be specified manually in advance, and the traditional density-distance clustering algorithm requires the threshold and the density definition to be determined in advance, so that hierarchical clustering places excessive demands on the shape and structure of the data. Current abnormal data detection therefore suffers from poor data adaptability.
Disclosure of Invention
The application provides a method and a device for detecting abnormal data, and aims to solve the problem that the existing abnormal data detection method is poor in data adaptability.
In order to solve the foregoing technical problem, in a first aspect, an embodiment of the present application provides a method for detecting abnormal data, including:
obtaining a power data set, the power data set comprising a plurality of power data;
calculating the support degree between the electric power data and generating a support degree matrix;
clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters;
and comparing the sub-clusters to detect abnormal data to obtain an abnormal detection result of the power data.
In the embodiment, the incidence relation between the data is presented by calculating the support degree between the electric power data and generating a support degree matrix; and clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters, comparing the sub-clusters, and detecting abnormal data to obtain an abnormal detection result of the power data. Different from the traditional distance clustering algorithm and the traditional density distance clustering algorithm, the support sum of each row of data in the support matrix is used as a sub-cluster center, the cluster center does not need to be manually established in advance, and the threshold and the density definition do not need to be determined in advance. According to the method and the device, the clustering weight can be adaptively adjusted according to the support sum calculated by the individual model, the requirement of the clustering process on the data structure is reduced, so that the method and the device can be suitable for more different data, and the data adaptability of the abnormal data detection method is improved.
In one embodiment, the acquiring the power data set includes:
acquiring electric power observation data, wherein the electric power observation data comprises numerical data and text data;
and carrying out numerical value normalization on the numerical data and the text data to obtain a plurality of electric power data, wherein the electric power data form the electric power data set.
The embodiment uniformly converts the electric power observation data into the electric power data through numerical value normalization to realize the standardization of the electric power observation data, thereby facilitating the subsequent data operation and improving the data operation efficiency.
In one embodiment, the calculating the support degree between the power data and generating a support degree matrix includes:
calculating Euclidean distances between the power data based on the power data;
calculating the support degree between the electric power data by using a preset support degree formula based on the Euclidean distance between the electric power data;
generating the support matrix according to the support among all the electric power data;
wherein, the support formula is as follows:
$\mathrm{Sup} = \left(1 - d_N(a_i, a_j)\right)^k,\quad k > 0,\quad a_i = (x_1, x_2, \ldots, x_i),\quad a_j = (x_1, x_2, \ldots, x_j);$
Figure BDA0003482204960000021
where $\mathrm{Sup}$ is the support degree, $a_i$ is the $i$-th power data, $a_j$ is the $j$-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the $i$-th data vector of power data $a_i$, and $x_j$ is the $j$-th data vector of power data $a_j$.
In the embodiment, the euclidean distance between the power data is calculated to obtain the similarity between the power data, the support degree is determined based on the euclidean distance, and all the support degrees are constructed into the support degree matrix, so that the incidence relation between the power data is obtained, the support degree matrix with the incidence relation of the power data can be utilized for clustering, and the data adaptability is improved.
In an embodiment, the clustering the support degrees in the support degree matrix according to a sum of the support degrees of each row of data in the support degree matrix to obtain a plurality of sub-clusters includes:
step one, calculating the support degree sum of each row of data in the support degree matrix, and adding the support degree sum to the corresponding row of the support degree matrix;
step two, determining the maximum support degree sum among the support degree sums, and determining the maximum support degree in each column of data;
step three, taking the column data corresponding to the target maximum support degrees as one sub-cluster, wherein a target maximum support degree is a maximum support degree located in the same row as the maximum support degree sum;
step four, deleting the column data corresponding to the target maximum support degrees from the support degree matrix to obtain a new support degree matrix;
and repeating step one to step four until a preset iteration stop condition is reached, to obtain a plurality of sub-clusters.
In this embodiment, the support degree sum of each row of data in the support degree matrix is calculated and used as the sub-cluster center point of the current iteration. The maximum support degree in each column is then compared with the support degree sums: if a column's maximum support degree lies in the same row as the maximum support degree sum, that maximum support degree is regarded as being near the sub-cluster center point. All maximum support degrees near the sub-cluster center point of the current iteration are therefore taken as one sub-cluster, and the iteration is repeated to obtain a plurality of sub-clusters. Compared with the traditional distance clustering algorithm and the traditional density-distance clustering algorithm, the cluster centers do not need to be specified manually in advance, and the threshold and the density definition do not need to be determined in advance, which reduces the demands the clustering process places on the data structure and improves its data adaptability.
In an embodiment, before clustering the support degrees in the support degree matrix according to a sum of the support degrees of each row of data in the support degree matrix to obtain a plurality of sub-clusters, the method further includes:
and carrying out lower triangular numeralization processing on the support matrix to obtain the support matrix which is an upper triangular reciprocal value.
In this embodiment, the support matrix is subjected to lower triangular numerical processing, so that an iteration result of each iteration process is more easily solved during clustering, and the iteration efficiency of the clustering process is improved.
In a second aspect, an embodiment of the present application provides an apparatus for detecting abnormal data, including:
an acquisition module to acquire a power data set, the power data set including a plurality of power data;
the calculation module is used for calculating the support degree between the electric power data and generating a support degree matrix;
the clustering module is used for clustering the support degree in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters;
and the detection module is used for comparing the plurality of sub-clusters and detecting abnormal data to obtain an abnormal detection result of the power data.
In one embodiment, the obtaining module includes:
an acquisition unit, used for acquiring electric power observation data, wherein the electric power observation data comprises numerical data and text data;
and the normalization unit is used for carrying out numerical value normalization on the numerical data and the text data to obtain a plurality of electric power data, and the plurality of electric power data form the electric power data set.
In one embodiment, the calculation module includes:
a first calculation unit configured to calculate a euclidean distance between the power data based on the power data;
the second calculation unit is used for calculating the support degree between the electric power data by using a preset support degree formula based on the Euclidean distance between the electric power data;
the generating unit is used for generating the support degree matrix according to the support degree among all the power data;
wherein, the support formula is as follows:
$\mathrm{Sup} = \left(1 - d_N(a_i, a_j)\right)^k,\quad k > 0,\quad a_i = (x_1, x_2, \ldots, x_i),\quad a_j = (x_1, x_2, \ldots, x_j);$
Figure BDA0003482204960000051
where $\mathrm{Sup}$ is the support degree, $a_i$ is the $i$-th power data, $a_j$ is the $j$-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the $i$-th data vector of power data $a_i$, and $x_j$ is the $j$-th data vector of power data $a_j$.
In one embodiment, the clustering module includes:
the iteration unit is used for iteratively executing the third calculation unit, the determination unit, the serving unit and the deletion unit until a preset iteration stop condition is reached to obtain a plurality of sub-clusters;
the third calculating unit is configured to calculate a sum of support degrees of each row of data in the support degree matrix, and add the sum of support degrees to a corresponding row of the support degree matrix;
the determining unit is used for determining the maximum support degree sum in the plurality of support degree sums and determining the maximum support degree in each column of data;
the serving unit is configured to serve column data corresponding to a target maximum support degree as a same sub-cluster to obtain a sub-cluster, where the target maximum support degree is a maximum support degree in a same row as a sum of the maximum support degrees;
and the deleting unit is used for deleting the column data corresponding to the target maximum support degree in the support degree matrix to obtain a new support degree matrix.
In one embodiment, the detection apparatus further comprises:
and the processing module is used for carrying out lower triangular numerical processing on the support matrix to obtain the support matrix which is an upper triangular inverse value.
For the beneficial effects of the second aspect, refer to the relevant description of the first aspect; they are not repeated here.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting abnormal data according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an abnormal data detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the Background, abnormal data is mainly detected either by a single physical-space security protection technique or by an information-space anomaly detection technique such as a distance clustering algorithm or a density-distance clustering algorithm. However, the traditional distance clustering algorithm requires the cluster centers to be specified manually in advance, and the traditional density-distance clustering algorithm requires the threshold and the density definition to be determined in advance, so that hierarchical clustering places excessive demands on the shape and structure of the data, and data adaptability is poor.
Therefore, the embodiment of the application provides a method and a device for detecting abnormal data, wherein the method for detecting abnormal data is characterized in that the incidence relation between data is presented by calculating the support degree between the electric power data and generating a support degree matrix; and clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters, comparing the sub-clusters, and detecting abnormal data to obtain an abnormal detection result of the power data. Different from the traditional distance clustering algorithm and the traditional density distance clustering algorithm, the support sum of each row of data in the support matrix is used as a sub-cluster center, the cluster center does not need to be manually established in advance, and the threshold and the density definition do not need to be determined in advance. According to the method and the device, the clustering weight can be adaptively adjusted according to the support sum calculated by the individual model, the requirement of the clustering process on the data structure is reduced, so that the method and the device can be suitable for more different data, and the data adaptability of the abnormal data detection method is improved.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a method for detecting abnormal data according to an embodiment of the present application. The method for detecting abnormal data described in the embodiments of the present application may be applied to computer devices, including but not limited to smart phones, tablet computers, desktop computers, supercomputers, personal digital assistants, physical servers, and cloud servers. The method for detecting abnormal data in the embodiment of the application includes steps S101 to S104, which are detailed as follows:
step S101, acquiring a power data set, wherein the power data set comprises a plurality of power data.
In this step, the power data is operation data collected while the power grid equipment is operating, and includes, but is not limited to, numerical data and text data. Optionally, the numerical data and the text data are heterogeneous data, and their data structures can be unified through a standardization operation.
In one embodiment, power observation data is obtained, wherein the power observation data comprises numerical data and text data; and carrying out numerical value normalization on the numerical data and the text data to obtain a plurality of electric power data, wherein the electric power data form the electric power data set.
Optionally, numerical data is processed by the normalization formula
$d_n = \dfrac{x - \bar{x}}{\delta}$
where $d_n$ is the power data obtained after normalizing the numerical data, $x$ is the numerical data, $\bar{x}$ is the mean of all numerical data of the same type, and $\delta$ is the standard deviation.
Optionally, text data is processed by the TF-IDF algorithm
Figure BDA0003482204960000073
where $d_m$ is the power data obtained after normalizing the text data, $n_{ij}$ is the number of times the word $i$ appears in the text $j$, and $\sum n_{ij}$ is the total number of occurrences of all words in the text $j$.
The power data is $D_i = \{d_1, d_2, \ldots, d_n, \ldots, d_m\}$, that is, $D_i$ includes all $d_n$ and all $d_m$. This embodiment uniformly converts the electric power observation data into the electric power data through numerical value normalization to realize the standardization of the electric power observation data, thereby facilitating the subsequent data operation and improving the data operation efficiency.
It is understood that the normalization process of the power observation data may be executed by the computer device implementing the detection method, or the obtained power data may be migrated to the computer device after being executed by other computing devices.
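The normalization in step S101 can be sketched as follows. This is a minimal illustration, not the applicant's implementation. It assumes the numeric normalization is the usual standard score (value minus mean, divided by standard deviation), which matches the symbols defined above. For text data it computes only the term-frequency ratio that the text describes, since the TF-IDF formula image is not reproduced. The function names `zscore` and `term_frequency`, the use of the population standard deviation, and the sample inputs are all hypothetical.

```python
from collections import Counter
import statistics


def zscore(values):
    """Standardize numeric readings of one type: (x - mean) / std."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the source does not say which.
    std = statistics.pstdev(values)
    if std == 0:
        # All readings identical: return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def term_frequency(tokens):
    """For one text j, return n_ij / (total word occurrences in j) per word i."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Hypothetical sample inputs: three voltage readings and one tokenized log line.
readings = [10.0, 12.0, 14.0]
log_tokens = ["breaker", "open", "breaker", "alarm"]

print(zscore(readings))
print(term_frequency(log_tokens))
```

After this step both kinds of observation are plain numbers, so they can be combined into the vectors used by the support degree calculation.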
And step S102, calculating the support degree among the electric power data and generating a support degree matrix.
In this step, the support degree is used to characterize the correlation between the power data. The similarity between the power data is calculated, the support degree between the power data is calculated from the similarity, and all the support degrees are assembled into a support degree matrix. Optionally, the similarity may be Euclidean distance similarity, cosine similarity, or the like.
In an embodiment, step S102 specifically includes: calculating Euclidean distances between the power data based on the power data; calculating the support degree between the electric power data by using a preset support degree formula based on the Euclidean distance between the electric power data; generating the support matrix according to the support among all the electric power data; wherein, the support formula is as follows:
$\mathrm{Sup}(a_i, a_j) = \left(1 - d_N(a_i, a_j)\right)^k,\quad k > 0,\quad a_i = (x_1, x_2, \ldots, x_i),\quad a_j = (x_1, x_2, \ldots, x_j);$
Figure BDA0003482204960000081
where $\mathrm{Sup}(a_i, a_j)$ is the support degree between $a_i$ and $a_j$, $a_i$ is the $i$-th power data, $a_j$ is the $j$-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the $i$-th data vector of power data $a_i$, and $x_j$ is the $j$-th data vector of power data $a_j$.
Based on the support between all the power data, a support matrix is generated, examples of which are as follows:
Figure BDA0003482204960000082
in the embodiment, the euclidean distance between the power data is calculated to obtain the similarity between the power data, the support degree is determined based on the euclidean distance, and all the support degrees are constructed into the support degree matrix, so that the incidence relation between the power data is obtained, the support degree matrix with the incidence relation of the power data can be utilized for clustering, and the data adaptability is improved.
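As a rough sketch of step S102, the code below builds a support degree matrix from pairwise Euclidean distances using Sup = (1 - d_N)^k. The definition of the normalized distance d_N is contained in a formula image that is not reproduced, so this sketch assumes d_N is the Euclidean distance divided by the largest pairwise distance in the data set. That normalization, the function name `support_matrix`, and the sample points are assumptions for illustration only.

```python
import math


def support_matrix(data, k=1.0):
    """Return the n x n matrix of Sup(a_i, a_j) = (1 - d_N(a_i, a_j)) ** k."""
    n = len(data)
    # Plain Euclidean distance between every pair of power data vectors.
    dist = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    # Assumption: normalize by the largest pairwise distance so d_N is in [0, 1].
    d_max = max(max(row) for row in dist)
    scale = d_max if d_max > 0 else 1.0
    return [[(1.0 - d / scale) ** k for d in row] for row in dist]


# Hypothetical power data: three 2-dimensional vectors on a line.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in support_matrix(points, k=2.0):
    print(row)
```

With this normalization, identical data get support 1 and the most distant pair gets support 0. A larger k makes the support fall off faster as distance grows.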
And step S103, clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters.
In this step, the support matrix includes rows and columns, the sum of all the support degrees in each row of data is calculated to obtain a support degree sum, the support degree sum is used as a sub-cluster center point of the current iteration, then the maximum support degree in each column of support degrees is compared with the support degree sum, if the maximum support degree and the support degree sum are in the same row, the maximum support degree is indicated to be near the sub-cluster center point, therefore, all the maximum support degrees near the sub-cluster center point of the current iteration are used as a sub-cluster, and the iteration process is repeated to obtain a plurality of sub-clusters. Compared with the traditional distance clustering algorithm and the traditional density distance clustering algorithm, the method has the advantages that the clustering center does not need to be established manually in advance, the threshold value and the density definition do not need to be determined in advance, the requirement of the clustering process on the data shape structure is reduced, and the data adaptability of the clustering process is improved.
In an embodiment, the step S103 specifically includes:
step one, calculating the support degree sum of each row of data in the support degree matrix, and adding the support degree sum to the corresponding row of the support degree matrix;
step two, determining the maximum support degree sum among the support degree sums, and determining the maximum support degree in each column of data;
step three, taking the column data corresponding to the target maximum support degrees as one sub-cluster, wherein a target maximum support degree is a maximum support degree located in the same row as the maximum support degree sum;
step four, deleting the column data corresponding to the target maximum support degrees from the support degree matrix to obtain a new support degree matrix;
and repeating step one to step four until a preset iteration stop condition is reached, to obtain a plurality of sub-clusters.
In this embodiment, the Sum of each row of the obtained support degree matrix is computed and appended to that row, as exemplified below:
Figure BDA0003482204960000091
The maximum support degree sum among the support degree sums is determined, and the maximum support degree in each column of data is determined. For example, suppose the maximum support degree sum is Sum2, the maximum support degree in the first column is $\mathrm{Sup}(a_2, a_1)$, the maximum support degree in the second column is $\mathrm{Sup}(a_n, a_2)$, and the maximum support degree in the $n$-th column is $\mathrm{Sup}(a_1, a_n)$. The maximum support degree of each column is compared with the maximum support degree sum, and all target maximum support degrees that lie in the same row as the maximum support degree sum are taken as one sub-cluster. In this example, $\mathrm{Sup}(a_2, a_1)$ is in the same row as the maximum support degree sum Sum2, so the column data corresponding to $\mathrm{Sup}(a_2, a_1)$ is taken as a sub-cluster and deleted from the support degree matrix to obtain a new support degree matrix, which is exemplified as follows:
Figure BDA0003482204960000092
The first step is then repeated with the new support degree matrix, iterating until a preset iteration stop condition is reached. The iteration stop condition may be that all data in the support degree matrix has been deleted, or that the clustering is stopped manually.
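The iteration of steps one to four can be sketched as below. This is one possible reading of the procedure, not a definitive implementation. Three points are assumptions not stated in the source: the diagonal self-support is ignored (the worked example above picks off-diagonal entries as column maxima), only columns are deleted while all rows are kept, and the loop ends by grouping the leftover columns if no column maximum lies in the row with the maximum sum. The function name `cluster_by_support_sum` and the example matrix are hypothetical.

```python
def cluster_by_support_sum(sup):
    """Group column indices of a square support degree matrix into sub-clusters."""
    n = len(sup)
    # Assumption: ignore self-support so the diagonal never wins a column maximum.
    work = [[0.0 if i == j else sup[i][j] for j in range(n)] for i in range(n)]
    remaining = list(range(n))  # original indices of columns not yet deleted
    clusters = []
    while remaining:  # stop condition: every column has been deleted
        # Step one: support degree sum of each row over the remaining columns.
        row_sums = [sum(work[i][c] for c in remaining) for i in range(n)]
        # Step two: row of the maximum sum, and the row of each column's maximum.
        best_row = max(range(n), key=row_sums.__getitem__)
        col_max_row = {
            c: max(range(n), key=lambda i, c=c: work[i][c]) for c in remaining
        }
        # Step three: columns whose maximum lies in that row form one sub-cluster.
        members = [c for c in remaining if col_max_row[c] == best_row]
        if not members:
            # Assumption: avoid an endless loop by grouping whatever is left.
            clusters.append(remaining)
            break
        clusters.append(members)
        # Step four: delete those columns to obtain the new matrix.
        remaining = [c for c in remaining if c not in members]
    return clusters


# Hypothetical 4 x 4 support degree matrix.
example = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.15],
    [0.10, 0.20, 0.15, 1.00],
]
print(cluster_by_support_sum(example))
```

Every column ends up in exactly one sub-cluster, and no cluster count, threshold, or density definition is supplied by the caller.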
And step S104, comparing the plurality of sub-clusters, and detecting abnormal data to obtain an abnormal detection result of the power data.
In this step, abnormal data is picked out by comparing the sub-clusters in the clustering result, and the device that generated the corresponding source data is identified as an abnormal device.
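The text does not state the rule used to compare sub-clusters. Purely as an illustration, the sketch below assumes one hypothetical rule: members of sub-clusters much smaller than the largest sub-cluster are flagged as abnormal. The function name `flag_small_clusters`, the `ratio` parameter, and its default value are assumptions and are not part of the described method.

```python
def flag_small_clusters(clusters, ratio=0.5):
    """Return sorted indices belonging to sub-clusters smaller than ratio * largest.

    Hypothetical comparison rule; the source only says sub-clusters are compared.
    """
    if not clusters:
        return []
    largest = max(len(c) for c in clusters)
    return sorted(i for c in clusters if len(c) < ratio * largest for i in c)


# Hypothetical clustering result: one large sub-cluster and one singleton.
print(flag_small_clusters([[0, 2, 3], [1]]))
```

In practice the flagged indices would be mapped back to the devices that produced the corresponding source data.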
In an embodiment, on the basis of the embodiment shown in fig. 1, before the step S103, the method further includes: and carrying out lower triangular numeralization processing on the support matrix to obtain the support matrix which is an upper triangular reciprocal value.
In this embodiment, the support matrix is subjected to lower triangular numerical processing, so that an iteration result of each iteration process is more easily solved during clustering, and the iteration efficiency of the clustering process is improved.
To carry out the abnormal data detection method of the above method embodiment and achieve the corresponding functions and technical effects, an abnormal data detection apparatus is provided. Referring to fig. 2, fig. 2 is a structural block diagram of an abnormal data detection apparatus according to an embodiment of the present application. For convenience of explanation, only the parts related to the present embodiment are shown. The apparatus for detecting abnormal data provided in the embodiment of the present application includes:
an obtaining module 201, configured to obtain a power data set, where the power data set includes a plurality of power data;
a calculating module 202, configured to calculate a support degree between the power data, and generate a support degree matrix;
the clustering module 203 is configured to cluster the support degrees in the support degree matrix according to a support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters;
the detection module 204 is configured to compare the multiple sub-clusters, detect abnormal data, and obtain an abnormal detection result of the power data.
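The four modules can be pictured as a simple pipeline in which each module is a callable that feeds the next. The sketch below shows only this wiring. The class name `AbnormalDataDetector`, its field names, and the stub lambdas standing in for modules 201 to 204 are hypothetical placeholders, not the apparatus itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AbnormalDataDetector:
    """Chains four callables that stand in for modules 201 to 204."""

    acquire: Callable[[], List[Sequence[float]]]                           # 201
    compute_support: Callable[[List[Sequence[float]]], List[List[float]]]  # 202
    cluster: Callable[[List[List[float]]], List[List[int]]]                # 203
    detect: Callable[[List[List[int]]], List[int]]                         # 204

    def run(self) -> List[int]:
        data = self.acquire()
        support = self.compute_support(data)
        clusters = self.cluster(support)
        return self.detect(clusters)


# Stub modules with fixed outputs, used only to show the data flow.
detector = AbnormalDataDetector(
    acquire=lambda: [[0.0], [0.1], [5.0]],
    compute_support=lambda d: [[1.0] * len(d) for _ in d],
    cluster=lambda s: [[0, 1], [2]],
    detect=lambda cs: min(cs, key=len),
)
print(detector.run())
```

Keeping the modules as separate callables mirrors the unit structure described for the apparatus and lets each stage be replaced independently.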
In an embodiment, the obtaining module 201 includes:
an acquisition unit, used for acquiring electric power observation data, wherein the electric power observation data comprises numerical data and text data;
and the normalization unit is used for carrying out numerical value normalization on the numerical data and the text data to obtain a plurality of electric power data, and the plurality of electric power data form the electric power data set.
In one embodiment, the calculation module 202 includes:
a first calculation unit, configured to calculate the Euclidean distance between the power data based on the power data;
a second calculation unit, configured to calculate the support degree between the power data using a preset support degree formula, based on the Euclidean distance between the power data;
and a generating unit, configured to generate the support degree matrix according to the support degrees among all the power data;
where the support degree formula is:
$$\mathrm{Sup} = \bigl(1 - d_N(a_i, a_j)\bigr)^{k}, \quad k > 0, \quad a_i = (x_1, x_2, \ldots, x_i), \quad a_j = (x_1, x_2, \ldots, x_j);$$
[Equation image BDA0003482204960000111]
where Sup is the support degree, $a_i$ is the i-th power data, $a_j$ is the j-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the i-th data vector of power data $a_i$, and $x_j$ is the j-th data vector of power data $a_j$.
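As a minimal sketch of the calculation module, the code below builds a support degree matrix from pairwise Euclidean distances. The normalization of the distance is given only in the equation image, so dividing by the largest pairwise distance is an assumption here, as are the function name `support_matrix`, the default `k = 2`, and the reading of k as an exponent.

```python
import numpy as np

def support_matrix(data, k=2.0):
    """Return the matrix of Sup = (1 - d_N(a_i, a_j)) ** k for all pairs of rows.

    Assumption: d_N is the Euclidean distance divided by the largest pairwise
    distance, so that it lies in [0, 1]; the source defines d_N in an image.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    data = np.asarray(data, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, n_features).
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d_max = dist.max()
    d_norm = dist / d_max if d_max > 0 else np.zeros_like(dist)
    return (1.0 - d_norm) ** k

# Three power data items; items 0 and 2 are the farthest apart.
sup = support_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]], k=2.0)
```

Under this normalization, identical items have support 1 and the two most distant items have support 0; a larger k penalizes distance more sharply.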
In an embodiment, the clustering module 203 includes:
an iteration unit, configured to iteratively execute the third calculation unit, the determining unit, the assigning unit, and the deleting unit until a preset iteration stop condition is reached, to obtain the plurality of sub-clusters;
the third calculation unit, configured to calculate the support degree sum of each row of data in the support degree matrix and add the support degree sum to the corresponding row of the support degree matrix;
the determining unit, configured to determine the maximum support degree sum among the plurality of support degree sums and to determine the maximum support degree in each column of data;
the assigning unit, configured to take the column data corresponding to a target maximum support degree as one sub-cluster, where the target maximum support degree is a maximum support degree located in the same row as the maximum support degree sum;
and the deleting unit, configured to delete, from the support degree matrix, the column data corresponding to the target maximum support degree to obtain a new support degree matrix.
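The iteration carried out by these units can be sketched as follows. This is one literal reading of the steps, not a confirmed implementation: ignoring the diagonal (self-support), stopping when no columns remain, and collecting leftover columns into a final group when the top row claims none are all assumptions, and `cluster_by_support` is a hypothetical name.

```python
import numpy as np

def cluster_by_support(sup):
    """Group column indices of a square support matrix into sub-clusters."""
    sup = np.asarray(sup, dtype=float).copy()
    # Assumption: self-support is ignored; otherwise each column's maximum
    # would trivially be its own diagonal entry.
    np.fill_diagonal(sup, -np.inf)
    remaining = list(range(sup.shape[1]))  # original indices of surviving columns
    clusters = []
    while remaining:
        sub = sup[:, remaining]
        # Support degree sum of each row over the surviving columns.
        row_sums = np.where(np.isfinite(sub), sub, 0.0).sum(axis=1)
        best_row = int(np.argmax(row_sums))
        # Row holding the maximum support degree of each surviving column.
        col_argmax = np.argmax(sub, axis=0)
        # Columns whose maximum lies in the row with the largest sum.
        chosen = [c for c, r in zip(remaining, col_argmax) if r == best_row]
        if not chosen:
            # Assumed stop condition: nothing selected, keep the rest together.
            clusters.append(list(remaining))
            break
        clusters.append(chosen)
        # Delete the selected columns to obtain the new support matrix.
        remaining = [c for c in remaining if c not in chosen]
    return clusters

example = [
    [1.0, 0.9, 0.7, 0.1],
    [0.9, 1.0, 0.8, 0.2],
    [0.7, 0.8, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
sub_clusters = cluster_by_support(example)
```

Every column ends up in exactly one sub-cluster, because each pass either removes at least one column or stops. How the resulting sub-clusters are compared to flag abnormal data is left to the detection module and is not sketched here.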
In one embodiment, the detection apparatus further comprises:
a processing module, configured to perform lower-triangular numerical processing on the support degree matrix to obtain a support degree matrix holding upper-triangular reciprocal values.
The abnormal data detection apparatus described above can implement the abnormal data detection method of the above method embodiment. The alternatives in the method embodiment also apply to this embodiment; for the remaining details, refer to the method embodiment, which is not repeated here.
The functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented as software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied as a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It is noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments above explain the objects, technical solutions, and advantages of the present application in further detail. It should be understood that they are only examples of the present application and are not intended to limit its scope. Any modifications, equivalents, and improvements made by those skilled in the art within the spirit and principle of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A method for detecting anomalous data, comprising:
obtaining a power data set, the power data set comprising a plurality of power data;
calculating the support degree between the electric power data and generating a support degree matrix;
clustering the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters;
and comparing the sub-clusters to detect abnormal data to obtain an abnormal detection result of the power data.
2. The detection method of claim 1, wherein the obtaining a power data set comprises:
acquiring electric power observation data, wherein the electric power observation data comprises numerical data and text data;
and carrying out numerical value normalization on the numerical data and the text data to obtain a plurality of electric power data, wherein the electric power data form the electric power data set.
3. The detection method of claim 1, wherein the calculating a support between the power data and generating a support matrix comprises:
calculating Euclidean distances between the power data based on the power data;
calculating the support degree between the electric power data by using a preset support degree formula based on the Euclidean distance between the electric power data;
generating the support matrix according to the support among all the electric power data;
wherein, the support formula is as follows:
$$\mathrm{Sup} = \bigl(1 - d_N(a_i, a_j)\bigr)^{k}, \quad k > 0, \quad a_i = (x_1, x_2, \ldots, x_i), \quad a_j = (x_1, x_2, \ldots, x_j);$$
[Equation image FDA0003482204950000011]
wherein Sup is the support degree, $a_i$ is the i-th power data, $a_j$ is the j-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the i-th data vector of power data $a_i$, and $x_j$ is the j-th data vector of power data $a_j$.
4. The method according to claim 1, wherein the clustering the support degrees in the support degree matrix according to the sum of the support degrees of each row of data in the support degree matrix to obtain a plurality of sub-clusters comprises:
step one, calculating the support degree sum of each row of data in the support degree matrix, and adding the support degree sum to the corresponding row of the support degree matrix;
step two, determining the maximum support degree sum among the plurality of support degree sums, and determining the maximum support degree in each column of data;
step three, taking the column data corresponding to a target maximum support degree as one sub-cluster, wherein the target maximum support degree is a maximum support degree located in the same row as the maximum support degree sum;
step four, deleting the target maximum support degree from the support degree matrix to obtain a new support degree matrix;
and repeating step one to step four until a preset iteration stop condition is reached, to obtain the plurality of sub-clusters.
5. The method according to claim 1, wherein before clustering the support degrees in the support degree matrix according to the sum of the support degrees of each row of data in the support degree matrix to obtain a plurality of sub-clusters, the method further comprises:
performing lower-triangular numerical processing on the support degree matrix to obtain a support degree matrix holding upper-triangular reciprocal values.
6. An apparatus for detecting abnormal data, comprising:
an obtaining module, configured to obtain a power data set, the power data set comprising a plurality of power data;
a calculation module, configured to calculate the support degree between the power data and generate a support degree matrix;
a clustering module, configured to cluster the support degrees in the support degree matrix according to the support degree sum of each row of data in the support degree matrix to obtain a plurality of sub-clusters;
and a detection module, configured to compare the plurality of sub-clusters and detect abnormal data to obtain an abnormality detection result for the power data.
7. The detection apparatus of claim 6, wherein the obtaining module comprises:
an acquisition unit, configured to acquire power observation data, wherein the power observation data comprises numerical data and text data;
and a normalization unit, configured to perform numerical normalization on the numerical data and the text data to obtain a plurality of power data, wherein the plurality of power data form the power data set.
8. The detection apparatus of claim 6, wherein the calculation module comprises:
a first calculation unit, configured to calculate the Euclidean distance between the power data based on the power data;
a second calculation unit, configured to calculate the support degree between the power data using a preset support degree formula, based on the Euclidean distance between the power data;
and a generating unit, configured to generate the support degree matrix according to the support degrees among all the power data;
wherein the support degree formula is:
$$\mathrm{Sup} = \bigl(1 - d_N(a_i, a_j)\bigr)^{k}, \quad k > 0, \quad a_i = (x_1, x_2, \ldots, x_i), \quad a_j = (x_1, x_2, \ldots, x_j);$$
[Equation image FDA0003482204950000031]
wherein Sup is the support degree, $a_i$ is the i-th power data, $a_j$ is the j-th power data, $d_N(a_i, a_j)$ is the normalized Euclidean distance between power data $a_i$ and power data $a_j$, $x_i$ is the i-th data vector of power data $a_i$, and $x_j$ is the j-th data vector of power data $a_j$.
9. The detection apparatus of claim 6, wherein the clustering module comprises:
an iteration unit, configured to iteratively execute the third calculation unit, the determining unit, the assigning unit, and the deleting unit until a preset iteration stop condition is reached, to obtain the plurality of sub-clusters;
the third calculation unit, configured to calculate the support degree sum of each row of data in the support degree matrix and add the support degree sum to the corresponding row of the support degree matrix;
the determining unit, configured to determine the maximum support degree sum among the plurality of support degree sums and to determine the maximum support degree in each column of data;
the assigning unit, configured to take the column data corresponding to a target maximum support degree as one sub-cluster, wherein the target maximum support degree is a maximum support degree located in the same row as the maximum support degree sum;
and the deleting unit, configured to delete, from the support degree matrix, the column data corresponding to the target maximum support degree to obtain a new support degree matrix.
10. The detection apparatus of claim 6, further comprising:
a processing module, configured to perform lower-triangular numerical processing on the support degree matrix to obtain a support degree matrix holding upper-triangular reciprocal values.
CN202210071382.1A 2022-01-21 2022-01-21 Abnormal data detection method and device Pending CN114418006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210071382.1A CN114418006A (en) 2022-01-21 2022-01-21 Abnormal data detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210071382.1A CN114418006A (en) 2022-01-21 2022-01-21 Abnormal data detection method and device

Publications (1)

Publication Number Publication Date
CN114418006A true CN114418006A (en) 2022-04-29

Family

ID=81274572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210071382.1A Pending CN114418006A (en) 2022-01-21 2022-01-21 Abnormal data detection method and device

Country Status (1)

Country Link
CN (1) CN114418006A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742178A (en) * 2022-06-10 2022-07-12 航天亮丽电气有限责任公司 Method for non-invasive pressure plate state monitoring through MEMS six-axis sensor
CN114742178B (en) * 2022-06-10 2022-11-08 航天亮丽电气有限责任公司 Method for non-invasive pressure plate state monitoring through MEMS six-axis sensor
CN116881746A (en) * 2023-09-08 2023-10-13 国网江苏省电力有限公司常州供电分公司 Identification method and identification device for abnormal data in electric power system
CN116881746B (en) * 2023-09-08 2023-11-14 国网江苏省电力有限公司常州供电分公司 Identification method and identification device for abnormal data in electric power system

Similar Documents

Publication Publication Date Title
CN108055281B (en) Account abnormity detection method, device, server and storage medium
CN114418006A (en) Abnormal data detection method and device
CN109815487B (en) Text quality inspection method, electronic device, computer equipment and storage medium
CN110826648A (en) Method for realizing fault detection by utilizing time sequence clustering algorithm
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN111125658B (en) Method, apparatus, server and storage medium for identifying fraudulent user
CN111814910B (en) Abnormality detection method, abnormality detection device, electronic device, and storage medium
CN111612038A (en) Abnormal user detection method and device, storage medium and electronic equipment
CN116405299A (en) Alarm based on network security
CN113986864A (en) Log data processing method and device, electronic equipment and storage medium
CN109271957B (en) Face gender identification method and device
CN112069498A (en) SQL injection detection model construction method and detection method
CN112468658A (en) Voice quality detection method and device, computer equipment and storage medium
CN116167010A (en) Rapid identification method for abnormal events of power system with intelligent transfer learning capability
CN108875050B (en) Text-oriented digital evidence-obtaining analysis method and device and computer readable medium
CN112100617B (en) Abnormal SQL detection method and device
CN117370548A (en) User behavior risk identification method, device, electronic equipment and medium
CN116545733A (en) Power grid intrusion detection method and system
CN116319033A (en) Network intrusion attack detection method, device, equipment and storage medium
CN116561737A (en) Password validity detection method based on user behavior base line and related equipment thereof
CN115170153B (en) Work order processing method and device based on multidimensional attribute and storage medium
CN114298245A (en) Anomaly detection method and device, storage medium and computer equipment
CN112860648A (en) Intelligent analysis method based on log platform
CN112989295A (en) User identification method and device
CN116992447B (en) Malicious file detection method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination