CN110083475B - Abnormal data detection method and device - Google Patents

Info

Publication number
CN110083475B
Authority
CN
China
Prior art keywords
data object
cluster
data
data objects
local density
Prior art date
Legal status
Active
Application number
CN201910327595.4A
Other languages
Chinese (zh)
Other versions
CN110083475A (en)
Inventor
孙尚勇
Current Assignee
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd
Priority to CN201910327595.4A
Publication of CN110083475A
Application granted
Publication of CN110083475B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The embodiments of the present application provide a method and a device for detecting abnormal data, relating to the field of computer technology. The method includes: determining, by a preset density clustering algorithm, the local density of each data object in a data object set; for each data object in the set, if its local density is greater than a preset local density threshold and no data object with a greater local density exists in the region centered on it with a preset truncation distance as the radius, creating a cluster with that data object as the cluster center; for each created cluster, determining the core data objects it contains and updating the cluster according to them; and taking the data objects that do not belong to any cluster as abnormal data objects. The method and device improve the accuracy of abnormal data detection.

Description

Abnormal data detection method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for detecting abnormal data.
Background
Currently, the operating state of a device is usually determined by monitoring its operation indexes. Specifically, abnormal data is identified in each operation index, and possible problems with the device are then analyzed on the basis of that abnormal data. Operation indexes include service indexes and device indexes. Service indexes reflect the scale and quality of the service provided by the device, such as web page response time, web page access volume, and number of connection errors; device indexes reflect the device status, such as Central Processing Unit (CPU) utilization, memory utilization, disk Input/Output (I/O) rate, and network card throughput.
In the related art, the density peak clustering algorithm is one of the algorithms commonly used to determine abnormal data. Its processing procedure is as follows. A data object set is acquired for an operation index; the set contains multiple data objects, each being a value of the operation index collected according to a preset sampling period. The data objects in the set are then visited in arbitrary order, and for each one the number of data objects within a preset density radius centered on it (i.e., its local density) is determined. If the local density is not less than a preset density threshold, the data object is determined to be a core data object. That core data object is then taken as a cluster center, and the data objects within the preset density radius of it form a cluster. For each core data object contained in the cluster, the data objects within the preset density radius of that core data object (also called directly density-reachable data objects) are added to the cluster, until the cluster stops growing. This processing yields at least one cluster, and the data objects in the set that do not belong to any cluster are taken as abnormal data objects.
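The related-art procedure described above follows the familiar DBSCAN pattern and can be sketched as below. This is a minimal, hypothetical sketch: it assumes precomputed pairwise distances `d`, a density radius `eps`, a density threshold `min_pts`, and a local density that does not count the object itself. It is included only to make the weakness visible: whichever qualifying data object is visited first seeds the cluster, whether or not a denser neighbour exists.

```python
from collections import deque
from typing import List, Optional, Sequence, Set, Tuple


def prior_art_clusters(d: Sequence[Sequence[float]], eps: float,
                       min_pts: int) -> Tuple[List[Set[int]], List[int]]:
    """Related-art clustering in which the first core object visited seeds a cluster.

    d[i][j] is the distance between data objects i and j (hypothetical input).
    Returns (clusters, abnormal_indices).
    """
    n = len(d)
    # Neighbours within the density radius (the object itself is excluded).
    neighbours = [[j for j in range(n) if j != i and d[i][j] <= eps]
                  for i in range(n)]
    density = [len(nb) for nb in neighbours]
    label: List[Optional[int]] = [None] * n
    clusters: List[Set[int]] = []
    for i in range(n):  # visit order alone decides which core object seeds a cluster
        if label[i] is not None or density[i] < min_pts:
            continue
        cid = len(clusters)
        members = {i}
        label[i] = cid
        queue = deque([i])
        while queue:
            p = queue.popleft()
            if density[p] < min_pts:
                continue  # non-core members do not extend the cluster
            for q in neighbours[p]:  # directly density-reachable objects
                if label[q] is None:
                    label[q] = cid
                    members.add(q)
                    queue.append(q)
        clusters.append(members)
    abnormal = [i for i in range(n) if label[i] is None]
    return clusters, abnormal
```

For example, with one-dimensional values `[0, 1, 2, 3, 4, 20]`, `eps = 1.5` and `min_pts = 2`, the value 20 is the only object left outside every cluster.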
In the above technical solution, the first data object found whose local density is not less than the preset density threshold is used as the cluster center. However, another data object within its density radius may have a greater local density, that is, the chosen data object may not be the true cluster center. Because the choice of cluster centers directly affects the accuracy of the clustering result, the accuracy of abnormal data detection is low.
Disclosure of Invention
The embodiments of the present application aim to provide a method and an apparatus for detecting abnormal data that improve the accuracy of abnormal data detection. The specific technical solutions are as follows:
In a first aspect, a method for detecting abnormal data is provided, the method including:
determining local densities corresponding to the data objects in a data object set by adopting a preset density clustering algorithm, wherein the data object set comprises a plurality of data objects, and one data object consists of a plurality of operation indexes of one target device acquired at the same historical sampling time point;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold value, and no data object with the local density greater than the local density of the data object exists in an area with the data object as a center and a preset truncation distance as a radius, creating a cluster with the data object as a clustering center;
for each created cluster, determining a core data object contained in the cluster, and updating the cluster according to the core data object contained in the cluster;
taking the data objects that do not belong to any cluster as abnormal data objects.
Optionally, determining the local densities of the data objects in the data object set by using the preset density clustering algorithm includes:
for each data object in the data object set, determining the distance between the data object and each other data object;
taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density of the data object.
Optionally, creating a cluster with the data object as the cluster center includes:
assigning to the cluster the data objects within the range centered on the data object with the preset truncation distance as the radius.
Optionally, determining, for each created cluster, the core data objects contained in the cluster and updating the cluster according to them includes:
for each created cluster, determining, among the data objects contained in the cluster, those whose local density is greater than a preset core local density threshold as core data objects;
for each determined core data object, assigning to the cluster the data objects within the range centered on the core data object with the preset truncation distance as the radius, and continuing to determine core data objects among the data objects newly added to the cluster, so as to keep updating the cluster until the data objects it contains no longer change.
Optionally, the method further includes:
calculating the product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, where the shrinkage factor is less than 1.
In a second aspect, an apparatus for detecting abnormal data is provided, the apparatus including a determining module, a creating module, and an updating module;
the determining module is used for determining the local density corresponding to each data object in a data object set by adopting a preset density clustering algorithm, wherein the data object set comprises a plurality of data objects, and one data object is composed of a plurality of operation indexes of one target device acquired at the same historical sampling time point;
the creating module is configured to, for each data object in the data object set, create a cluster by using the data object as a clustering center if a local density corresponding to the data object is greater than a preset local density threshold, and no data object having a local density greater than the local density of the data object exists in an area with the data object as a center and a preset truncation distance as a radius;
the updating module is used for determining the core data object contained in each created cluster and updating the cluster according to the core data object contained in the cluster;
the determining module is further configured to use a data object that does not belong to any cluster as an abnormal data object.
Optionally, the determining module is specifically configured to:
for each data object in the set of data objects, determining a distance between the data object and the other data objects;
take the number of data objects whose distance from the data object is less than the preset truncation distance as the local density of the data object.
Optionally, the creating module is specifically configured to:
assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as the radius.
Optionally, the update module is specifically configured to:
for each created cluster, determine, among the data objects contained in the cluster, those whose local density is greater than a preset core local density threshold as core data objects;
for each determined core data object, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as the radius, and continue to determine core data objects among the data objects newly added to the cluster, so as to keep updating the cluster until the data objects it contains no longer change.
Optionally, the apparatus further comprises: a calculation module;
the calculation module is configured to calculate a product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, where a value of the shrinkage factor is smaller than 1.
In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any method of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the method steps of any method of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform any method of the first aspect.
With the method and apparatus for detecting abnormal data provided in the embodiments of the present application, a preset density clustering algorithm is used to determine the local density of each data object in a data object set, where the set contains multiple data objects and each data object consists of multiple operation indexes of one target device collected at the same historical sampling time point. For each data object in the set, if its local density is greater than a preset local density threshold and no data object with a greater local density exists in the region centered on it with a preset truncation distance as the radius, a cluster is created with that data object as the cluster center. For each created cluster, the core data objects it contains are determined and the cluster is updated according to them, and the data objects that do not belong to any cluster are taken as abnormal data objects. In this way, the data object with the largest local density within the truncation-distance radius is chosen as the cluster center, which makes the cluster centers more accurate and thereby improves the accuracy of abnormal data detection.
Of course, it is not necessary for any product or method of the present application to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of an operation and maintenance system according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for detecting abnormal data according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus for detecting abnormal data according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiments of the present application provide a method for detecting abnormal data that can be applied to an operation and maintenance system, specifically to an operation and maintenance server or a service server in that system. The embodiments take the operation and maintenance server as an example; the service server case is similar. Fig. 1 is an architecture diagram of an operation and maintenance system according to an embodiment of the present application. As shown in Fig. 1, the system includes an operation and maintenance server and multiple service servers. The operation and maintenance server is connected to each service server and collects the operation indexes of each service server according to a preset sampling period.
The method for detecting abnormal data provided in the embodiments of the present application is described in detail below with reference to specific embodiments. As shown in Fig. 2, the specific steps are as follows:
Step 201: determine the local density of each data object in the data object set by using a preset density clustering algorithm.
The data object set comprises a plurality of data objects, and one data object is composed of a plurality of operation indexes of one target device collected at the same historical sampling time point.
In implementation, the operation and maintenance server may collect the operation indexes of the target device according to a preset sampling period. The operation indexes may include service indexes and device indexes. Service indexes reflect the scale and quality of the service provided by the device, such as web page response time, web page access volume, and number of connection errors. Device indexes reflect the device status, such as CPU utilization, memory utilization, disk I/O, and network card throughput. The operation indexes may also include other types of indexes, which are not limited in the embodiments of the present application. Table 1 lists the operation indexes of the target device collected by the operation and maintenance server at different sampling times.
Table 1: Operation indexes of the target device at different sampling times (table image not reproduced)
After acquiring the operation indexes of the target device, the operation and maintenance server may combine the operation indexes collected at the same sampling time point into one data object (the $i$-th data object may be denoted as $subTra_i$), and combine multiple data objects into a data object set (which may be denoted as $T = \{subTra_1, subTra_2, \ldots, subTra_j, \ldots, subTra_i, \ldots, subTra_n\}$).
The operation and maintenance server may acquire the data objects of the target device collected within a preset historical period and form them into the data object set of the target device for subsequent processing.
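The grouping of per-time-point indexes into data objects can be sketched as follows. This is a minimal sketch under assumptions: the metric names (`cpu_util`, `mem_util`, `web_resp_ms`), the dictionary input format, and the sample values are all hypothetical, since the contents of Table 1 are not reproduced.

```python
from typing import Dict, List, Sequence

# Hypothetical metric names; the actual indexes in Table 1 are not reproduced.
METRICS = ("cpu_util", "mem_util", "web_resp_ms")


def build_data_object_set(samples: Sequence[Dict[str, float]],
                          metrics: Sequence[str] = METRICS) -> List[List[float]]:
    """Combine the indexes collected at each sampling time point into one
    data object subTra_i; the list of all data objects is the set T."""
    return [[float(sample[m]) for m in metrics] for sample in samples]


# Made-up values for two historical sampling time points.
samples = [
    {"cpu_util": 0.42, "mem_util": 0.61, "web_resp_ms": 120.0},  # time t1
    {"cpu_util": 0.45, "mem_util": 0.60, "web_resp_ms": 118.0},  # time t2
]
T = build_data_object_set(samples)
```

The patent does not say whether indexes are normalized. Because Euclidean distance is scale-sensitive, an index measured in milliseconds would dominate one measured as a fraction, so some form of scaling is likely needed in practice.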
After obtaining the data object set of the target device, the operation and maintenance server may further determine the local density of each data object (the local density of the $i$-th data object may be denoted as $\delta_i$).
Optionally, the operation and maintenance server determines the local density of each data object as follows: for each data object in the data object set, determine the distance between it and each other data object, and take the number of data objects whose distance from it is less than a preset truncation distance as its local density.
In implementation, the operation and maintenance server stores the truncation distance (which may be denoted as $d_c$) in advance; it may be set by a technician based on experience. For each data object in the data object set, the operation and maintenance server may calculate the distance between that data object and each other data object (the distance between the $i$-th and $j$-th data objects may be denoted as $d_{ij}$). The distance may be a Euclidean distance, in which case the operation and maintenance server calculates $d_{ij}$ with a preset Euclidean distance formula.
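A pairwise Euclidean distance matrix is one straightforward way to obtain every $d_{ij}$. This is a minimal sketch, assuming each data object is a list of numeric index values; `distance_matrix` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def distance_matrix(objects: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return d where d[i][j] is the Euclidean distance d_ij between
    data objects i and j. Requires Python 3.8+ for math.dist."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # the matrix is symmetric, so compute each pair once
            d[i][j] = d[j][i] = math.dist(objects[i], objects[j])
    return d
```

The full matrix costs O(n²) time and memory; for long histories a spatial index would be cheaper, but the patent does not discuss this.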
The operation and maintenance server may then determine the data objects whose distance from the data object is less than the preset truncation distance, count them, and use the count as the local density of the data object. The local density of each data object is determined by formulas (1) and (2).
$$\delta_i = \sum_{j \neq i} \chi(d_{ij} - d_c) \qquad (1)$$
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & x \geq 0 \end{cases} \qquad (2)$$
where $\delta_i$ is the local density of the $i$-th data object, $d_{ij}$ is the distance from the $i$-th data object to the $j$-th data object, and $d_c$ is the truncation distance.
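Formulas (1) and (2) translate directly into code. A minimal sketch, assuming `d` is a full pairwise distance matrix (for example, of Euclidean distances) and `d_c` is the truncation distance; the function names are hypothetical.

```python
from typing import List, Sequence


def chi(x: float) -> int:
    """Formula (2): 1 when x < 0 (i.e. d_ij < d_c), otherwise 0."""
    return 1 if x < 0 else 0


def local_densities(d: Sequence[Sequence[float]], d_c: float) -> List[int]:
    """Formula (1): delta_i = sum over j != i of chi(d_ij - d_c)."""
    n = len(d)
    return [sum(chi(d[i][j] - d_c) for j in range(n) if j != i)
            for i in range(n)]
```

For the one-dimensional values `[0, 1, 2, 10]` with `d_c = 1.5`, the local densities are `[1, 2, 1, 0]`: the value 1 has two neighbours closer than 1.5, and the value 10 has none.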
Step 202: for each data object in the data object set, if its local density is greater than a preset local density threshold and no data object with a greater local density exists in the area centered on it with a preset truncation distance as the radius, create a cluster with that data object as the cluster center.
In implementation, the operation and maintenance server may also store a local density threshold in advance, which may be set by a technician based on experience. After obtaining the local density of each data object in the data object set, the operation and maintenance server determines whether the local density of the data object is greater than or equal to the preset local density threshold. If it is, the server further determines the data objects in the area centered on the data object with the preset truncation distance as the radius, and checks, based on their local densities, whether any of them has a greater local density than the data object. If none does, the data object has the largest local density in the area and is used as a cluster center to create a cluster. Conversely, if the local density of the data object is less than the preset local density threshold, or some data object in the area has a greater local density, the data object is not used as a cluster center.
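The center test of step 202 can be sketched as below. This is a minimal sketch under stated assumptions: it uses the strict "greater than" comparison from the claims (the embodiment also describes "greater than or equal to"), and it treats "within the area" as distance at most `d_c`. The function name is hypothetical.

```python
from typing import List, Sequence


def select_cluster_centers(d: Sequence[Sequence[float]], delta: Sequence[int],
                           d_c: float, density_threshold: float) -> List[int]:
    """Step 202: a data object becomes a cluster center when its local density
    exceeds the threshold and no object within d_c of it is denser."""
    n = len(delta)
    centers = []
    for i in range(n):
        if delta[i] <= density_threshold:  # strict ">" as in the claims (assumption)
            continue
        region = [j for j in range(n) if j != i and d[i][j] <= d_c]
        if all(delta[j] <= delta[i] for j in region):  # i is the local density peak
            centers.append(i)
    return centers
```

With values `[0, 1, 2, 10]`, densities `[1, 2, 1, 0]` and `d_c = 1.5`, only object 1 is chosen, even at a threshold of 0 where object 0 would also qualify on density alone; this is the difference from the related art. Neighbours with exactly equal densities would both be accepted, since the text only excludes strictly denser neighbours.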
Optionally, the operation and maintenance server creates a cluster with the data object as the cluster center as follows: the data objects within the range centered on the data object with the preset truncation distance as the radius are assigned to the cluster.
In implementation, after determining that a data object is a cluster center, the operation and maintenance server assigns to the cluster the data objects within the range centered on it with the preset truncation distance as the radius, that is, the data objects in the data object set whose distance from the cluster center is less than or equal to the preset truncation distance.
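Creating the initial cluster is a single range query around the center. A minimal sketch, assuming a precomputed distance matrix `d`; the helper name is hypothetical.

```python
from typing import Sequence, Set


def create_cluster(center: int, d: Sequence[Sequence[float]], d_c: float) -> Set[int]:
    """Initial cluster: every data object whose distance from the center is
    at most the truncation distance d_c (the center itself has distance 0)."""
    return {j for j in range(len(d)) if d[center][j] <= d_c}
```

For values `[0, 1, 2, 10]` and `d_c = 1.5`, the cluster around object 1 is `{0, 1, 2}`.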
Step 203: for each created cluster, determine the core data objects contained in the cluster, and update the cluster according to them.
In implementation, for each created cluster, the operation and maintenance server may further determine whether any of the data objects contained in the cluster (other than the cluster center) is a core data object. If so, the operation and maintenance server may update the cluster according to the core data objects it contains.
Optionally, for each created cluster, the operation and maintenance server determines the core data objects contained in the cluster and updates the cluster according to them as follows:
Step 1: for each created cluster, among the data objects contained in the cluster, determine those whose local density is greater than a preset core local density threshold as core data objects.
In implementation, the operation and maintenance server may store a core local density threshold in advance. The core local density threshold may be set by a technician based on experience, or the operation and maintenance server may calculate it as the product of the local density threshold and a preset shrinkage factor. The shrinkage factor is less than 1 and, empirically, may range from 0.8 to 0.9. For example, if the local density threshold is 10 and the preset shrinkage factor is 0.8, the core local density threshold is 10 × 0.8 = 8.
For each created cluster, the operation and maintenance server may determine, in the data objects included in the cluster, whether the local density corresponding to each data object is greater than or equal to the core local density threshold. If the local density corresponding to a certain data object is greater than or equal to the core local density threshold, the operation and maintenance server may determine that the data object is a core data object.
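The core threshold and the core-object test of step 1 can be sketched as follows. This is a minimal sketch: the lower bound of 0 on the shrinkage factor and the strict ">" comparison (taken from the claims) are assumptions, and the helper names are hypothetical.

```python
from typing import Iterable, Sequence, Set


def core_density_threshold(density_threshold: float, shrinkage: float = 0.8) -> float:
    """Core local density threshold = local density threshold x shrinkage factor."""
    if not 0 < shrinkage < 1:  # the text requires < 1; > 0 is an assumption
        raise ValueError("shrinkage factor must be between 0 and 1")
    return density_threshold * shrinkage


def core_objects(cluster: Iterable[int], delta: Sequence[int],
                 core_threshold: float) -> Set[int]:
    """Step 1 of step 203: members whose local density exceeds the core threshold."""
    return {j for j in cluster if delta[j] > core_threshold}
```

`core_density_threshold(10, 0.8)` reproduces the worked example, giving 8. Because the core threshold is lower than the center threshold, members slightly less dense than the center can still extend the cluster.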
Step 2: for each determined core data object, assign to the cluster the data objects within the circle centered on the core data object with the preset truncation distance as the radius (the radius may also be another empirical value), and continue to determine core data objects among the data objects newly added to the cluster, so as to keep updating the cluster until the data objects it contains no longer change.
In implementation, after determining the core data objects contained in the cluster, for each determined core data object the operation and maintenance server may assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as the radius (these data objects are said to be directly density-reachable from the core data object), that is, the data objects in the data object set whose distance from the core data object is less than or equal to the preset truncation distance, to obtain an updated cluster. For the updated cluster, the operation and maintenance server then determines whether each newly added data object is a core data object. If a newly added data object is a core data object, the server assigns to the cluster the data objects within the preset truncation distance of it, and so on, until the data objects contained in the cluster no longer change.
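The iterative update in step 2 is a breadth-first expansion through core data objects. A minimal sketch, assuming a precomputed distance matrix and local densities, the strict ">" core test, and `d_c` as the expansion radius; the helper name is hypothetical.

```python
from collections import deque
from typing import Iterable, Sequence, Set


def expand_cluster(cluster: Iterable[int], d: Sequence[Sequence[float]],
                   delta: Sequence[int], d_c: float,
                   core_threshold: float) -> Set[int]:
    """Step 2 of step 203: grow the cluster from its core data objects until
    its membership stops changing."""
    members = set(cluster)
    # Start from the core data objects already in the cluster.
    queue = deque(j for j in members if delta[j] > core_threshold)
    while queue:  # an empty queue means no member can add anything new
        core = queue.popleft()
        for j in range(len(d)):
            # Objects within d_c of a core object are directly density-reachable.
            if d[core][j] <= d_c and j not in members:
                members.add(j)
                if delta[j] > core_threshold:  # a new core object keeps the growth going
                    queue.append(j)
    return members
```

Each object enters the queue at most once, because it is queued only at the moment it joins `members`. The patent does not say how overlapping clusters are handled; for anomaly detection this matters little, since only membership in some cluster is tested.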
Step 204: take the data objects that do not belong to any cluster as abnormal data objects.
In implementation, if a data object in the data object set does not belong to any cluster, the operation and maintenance server may determine that it is an abnormal data object, that is, its data was collected while the target device was in an abnormal state. The operation and maintenance server may output the abnormal data object together with the identifier of the target device, so that operation and maintenance personnel learn that the target device is abnormal.
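Step 204 and the output to operation and maintenance personnel can be sketched as below. The record format (`device`, `index`, `values`) and the device identifier are hypothetical; the patent only states that the abnormal data object and the target device identifier are output.

```python
from typing import Dict, Iterable, List, Sequence, Set


def abnormal_indices(n: int, clusters: Iterable[Set[int]]) -> List[int]:
    """Step 204: data objects that belong to no cluster are abnormal."""
    clustered = set().union(*clusters)
    return [i for i in range(n) if i not in clustered]


def abnormal_report(device_id: str, objects: Sequence[Sequence[float]],
                    clusters: Iterable[Set[int]]) -> List[Dict[str, object]]:
    """Hypothetical output record: the target device identifier plus each
    abnormal data object, for operation and maintenance personnel."""
    return [{"device": device_id, "index": i, "values": list(objects[i])}
            for i in abnormal_indices(len(objects), clusters)]
```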
In this embodiment, the local density of each data object in the data object set is first determined with a preset density clustering algorithm, where each data object consists of multiple operation indexes of one target device collected at the same historical sampling time point. A data object becomes a cluster center only if its local density is greater than the preset local density threshold and no data object within the preset truncation distance of it has a greater local density. Each cluster is then updated through the core data objects it contains, and the data objects that do not belong to any cluster are taken as abnormal data objects. Because each cluster center is the data object with the largest local density within the truncation-distance radius, the cluster centers are more accurate, and so is the abnormal data detection.
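Steps 201 to 204 fit together as one short routine. This is a minimal end-to-end sketch under the same assumptions as the individual steps: Euclidean distance, strict ">" comparisons as in the claims, a region boundary of "at most `d_c`", and a default shrinkage factor of 0.8. Dropping exact-duplicate clusters produced by tied density peaks is an added assumption, since the patent does not address ties. `detect_abnormal` is a hypothetical name.

```python
import math
from collections import deque
from typing import List, Sequence, Set, Tuple


def detect_abnormal(objects: Sequence[Sequence[float]], d_c: float,
                    density_threshold: float, shrinkage: float = 0.8
                    ) -> Tuple[List[Set[int]], List[int]]:
    """Sketch of steps 201-204. Returns (clusters, abnormal_indices)."""
    n = len(objects)
    d = [[math.dist(a, b) for b in objects] for a in objects]  # Euclidean d_ij
    # Step 201: local density = number of other objects closer than d_c.
    delta = [sum(1 for j in range(n) if j != i and d[i][j] < d_c) for i in range(n)]
    core_threshold = density_threshold * shrinkage
    clusters: List[Set[int]] = []
    for i in range(n):
        # Step 202: i must exceed the threshold and be the density peak of its region.
        if delta[i] <= density_threshold:
            continue
        region = [j for j in range(n) if d[i][j] <= d_c]
        if any(delta[j] > delta[i] for j in region):
            continue
        members = set(region)
        # Step 203: grow the cluster through core data objects until it stops changing.
        queue = deque(j for j in members if delta[j] > core_threshold)
        while queue:
            core = queue.popleft()
            for j in range(n):
                if d[core][j] <= d_c and j not in members:
                    members.add(j)
                    if delta[j] > core_threshold:
                        queue.append(j)
        if members not in clusters:  # tied peaks can yield identical clusters (assumption)
            clusters.append(members)
    # Step 204: objects outside every cluster are abnormal.
    clustered = set().union(*clusters)
    return clusters, [i for i in range(n) if i not in clustered]
```

For one-dimensional data objects `[0], [1], [2], [3], [4], [20]` with `d_c = 1.5` and a local density threshold of 1, the first five objects form one cluster and object 5 (the value 20) is reported as abnormal.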
Based on the same technical concept, an embodiment of the present application further provides an apparatus for detecting abnormal data. As shown in Fig. 3, the apparatus includes a determining module 310, a creating module 320, and an updating module 330;
the determining module 310 is configured to determine, by using a preset density clustering algorithm, local densities corresponding to data objects in a data object set, where the data object set includes a plurality of data objects, and each data object is formed by a plurality of operation indexes of one target device collected at the same historical sampling time point;
a creating module 320, configured to, for each data object in the data object set, if a local density corresponding to the data object is greater than a preset local density threshold, and a data object whose local density is greater than the local density of the data object does not exist in an area with the data object as a center and a preset truncation distance as a radius, create a cluster with the data object as a clustering center;
an updating module 330, configured to determine, for each created cluster, a core data object included in the cluster, and update the cluster according to the core data object included in the cluster;
the determining module 310 is further configured to use a data object that does not belong to any cluster as an abnormal data object.
Optionally, the determining module 310 is specifically configured to:
for each data object in the data object set, determine the distance between the data object and each of the other data objects; and
take, as the local density corresponding to the data object, the number of other data objects whose distance from it is smaller than the preset truncation distance.
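The local-density rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not name a distance metric, so Euclidean distance (`math.dist`, Python 3.8+) is assumed, and `local_densities` and `dc` (the truncation distance) are hypothetical names.

```python
import math
from typing import List, Sequence


def local_densities(points: Sequence[Sequence[float]], dc: float) -> List[int]:
    """Local density of each data object: the number of *other* data objects
    whose distance to it is strictly smaller than the truncation distance dc.

    Euclidean distance is an assumption; the source does not name a metric.
    """
    n = len(points)
    rho = [0] * n
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is measured once
            if math.dist(points[i], points[j]) < dc:
                rho[i] += 1
                rho[j] += 1
    return rho
```

Comparing every pair costs O(n²) distance computations, which is acceptable for modest numbers of sampling time points; a large data object set would likely call for a spatial index instead.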
Optionally, the creating module 320 is specifically configured to:
assign, to the cluster, the data objects within the region centered on the data object with the preset truncation distance as the radius.
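A sketch of the center-selection and initial-assignment step, assuming Euclidean distance and strict `< dc` region membership to match the density definition; `create_clusters` and its parameters are hypothetical names. The stated condition does not exclude ties, so two neighboring data objects of equal, maximal density can both become centers, and the source does not say how overlapping clusters are reconciled; the sketch simply keeps each cluster as its own index set.

```python
import math
from typing import List, Sequence, Set


def create_clusters(points: Sequence[Sequence[float]], rho: Sequence[int],
                    dc: float, rho_threshold: float) -> List[Set[int]]:
    """Create one cluster (a set of point indices) per qualifying clustering center.

    Index i is a center when rho[i] > rho_threshold and no data object within
    distance dc of i has a strictly larger local density. The new cluster
    initially holds every data object within dc of the center (the center
    itself is included as long as dc > 0).
    """
    n = len(points)
    clusters: List[Set[int]] = []
    for i in range(n):
        if rho[i] <= rho_threshold:
            continue  # not dense enough to be a clustering center
        region = [j for j in range(n) if math.dist(points[i], points[j]) < dc]
        if all(rho[j] <= rho[i] for j in region):
            clusters.append(set(region))
    return clusters
```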
Optionally, the updating module 330 is specifically configured to:
for each created cluster, determine, among the data objects contained in the cluster, those whose local density is greater than a preset core local density threshold as core data objects; and
for each determined core data object, assign, to the cluster, the data objects within the region centered on the core data object with the preset truncation distance as the radius, then determine core data objects among the data objects newly assigned to the cluster, and keep updating the cluster in this way until the data objects contained in the cluster no longer change.
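The iterative update can be read as a breadth-first expansion from core data objects, similar in spirit to how DBSCAN grows a cluster from its core points. The sketch below is one possible reading, assuming Euclidean distance and strict `< dc` membership; `expand_cluster` and `core_threshold` are hypothetical names.

```python
import math
from collections import deque
from typing import Sequence, Set


def expand_cluster(points: Sequence[Sequence[float]], rho: Sequence[int],
                   dc: float, core_threshold: float, cluster: Set[int]) -> Set[int]:
    """Grow a cluster from its core data objects until its membership stops changing.

    A member is a core data object when its local density exceeds
    core_threshold. Every data object within dc of a core object joins the
    cluster; newly joined objects that are themselves core objects are queued
    and expanded in turn.
    """
    members = set(cluster)
    queue = deque(i for i in members if rho[i] > core_threshold)
    while queue:
        core = queue.popleft()
        for j in range(len(points)):
            if j in members or math.dist(points[core], points[j]) >= dc:
                continue
            members.add(j)           # newly assigned to the cluster
            if rho[j] > core_threshold:
                queue.append(j)      # a new core object: expand it later
    return members
```

Each data object is added to `members` at most once, so each core object is expanded at most once and the loop always terminates.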
Optionally, the apparatus further comprises: a calculation module;
the calculation module is configured to calculate the product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, where the value of the shrinkage factor is less than 1.
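Written as a formula, with notation introduced here only for illustration (\rho_{\mathrm{th}} for the local density threshold, \alpha for the shrinkage factor, \rho_{\mathrm{core}} for the core local density threshold) and illustrative values \rho_{\mathrm{th}} = 10 and \alpha = 0.8:

```latex
\begin{aligned}
\rho_{\mathrm{core}} &= \alpha \, \rho_{\mathrm{th}}, \qquad \alpha < 1 \\
\rho_{\mathrm{core}} &= 0.8 \times 10 = 8 \quad (\rho_{\mathrm{th}} = 10,\ \alpha = 0.8)
\end{aligned}
```

Assuming a positive shrinkage factor (the source only requires it to be less than 1), the core threshold is lower than the center threshold. A data object that is not dense enough to be a clustering center can therefore still act as a core data object and extend the cluster into somewhat sparser surroundings.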
Likewise, with this apparatus, a data object is used as a clustering center only when no data object within the region centered on it with the truncation distance as the radius has a higher local density. The clustering centers determined in this way are more accurate, which improves the accuracy of abnormal data detection.
An embodiment of the present application further provides an electronic device. As shown in Fig. 4, the electronic device includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404;
a memory 403 for storing a computer program;
the processor 401, when executing the program stored in the memory 403, implements the following steps:
determining local densities corresponding to the data objects in a data object set by adopting a preset density clustering algorithm, wherein the data object set comprises a plurality of data objects, and one data object is composed of a plurality of operation indexes of one target device acquired at the same historical sampling time point;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold value, and no data object with the local density greater than the local density of the data object exists in an area with the data object as a center and a preset truncation distance as a radius, taking the data object as a clustering center to create a cluster;
for each created cluster, determining a core data object contained in the cluster, and updating the cluster according to the core data object contained in the cluster;
and taking the data objects which do not belong to any cluster as abnormal data objects.
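The four steps above can be combined into a single end-to-end sketch. It is a hedged illustration rather than the claimed implementation: Euclidean distance, strict `< dc` neighborhoods, and the names `detect_abnormal`, `dc`, `rho_threshold`, and `shrink` are assumptions. Because only membership in some cluster decides whether a data object is abnormal, the sketch accumulates the union of all clusters and does not need to resolve overlaps between them.

```python
import math
from collections import deque
from typing import List, Sequence, Set


def detect_abnormal(points: Sequence[Sequence[float]], dc: float,
                    rho_threshold: float, shrink: float) -> List[int]:
    """Return the indices of data objects that belong to no cluster."""
    if dc <= 0:
        raise ValueError("truncation distance dc must be positive")
    if shrink >= 1:
        raise ValueError("shrinkage factor must be less than 1")
    n = len(points)
    # Neighborhood of i: every j with dist(i, j) < dc (i itself included).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) < dc]
            for i in range(n)]
    # Step 1: local density = number of *other* objects closer than dc.
    rho = [len(nb) - 1 for nb in nbrs]
    core_threshold = rho_threshold * shrink
    clustered: Set[int] = set()
    for c in range(n):
        # Step 2: c is a clustering center if it is dense enough and no
        # neighbor has a strictly larger local density.
        if rho[c] <= rho_threshold or any(rho[j] > rho[c] for j in nbrs[c]):
            continue
        members = set(nbrs[c])
        # Step 3: grow the cluster from core objects until it stops changing.
        queue = deque(i for i in members if rho[i] > core_threshold)
        while queue:
            core = queue.popleft()
            for j in nbrs[core]:
                if j not in members:
                    members.add(j)
                    if rho[j] > core_threshold:
                        queue.append(j)
        clustered |= members
    # Step 4: anything outside every cluster is an abnormal data object.
    return [i for i in range(n) if i not in clustered]
```

For example, `detect_abnormal([(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)], dc=1.2, rho_threshold=2, shrink=0.5)` returns `[5]`: the point at the center of the square becomes the clustering center, the four corners join its cluster, and only the distant point is left as abnormal.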
Optionally, determining, by using the preset density clustering algorithm, the local density corresponding to each data object in the data object set includes:
for each data object in the data object set, determining the distance between the data object and each of the other data objects; and
taking, as the local density corresponding to the data object, the number of other data objects whose distance from it is smaller than the preset truncation distance.
Optionally, creating a cluster with the data object as the clustering center includes:
assigning, to the cluster, the data objects within the region centered on the data object with the preset truncation distance as the radius.
Optionally, determining, for each created cluster, the core data objects contained in the cluster and updating the cluster according to them includes:
for each created cluster, determining, among the data objects contained in the cluster, those whose local density is greater than a preset core local density threshold as core data objects; and
for each determined core data object, assigning, to the cluster, the data objects within the region centered on the core data object with the preset truncation distance as the radius, then determining core data objects among the data objects newly assigned to the cluster, and continuing to update the cluster in this way until the data objects contained in the cluster no longer change.
Optionally, the steps further include:
calculating the product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, where the value of the shrinkage factor is less than 1.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one thick line in the figure, which does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include random access memory (RAM) or non-volatile memory (NVM), for example, at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Based on the same technical concept, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for detecting abnormal data described above are implemented.
Based on the same technical concept, embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, causes the computer to perform any one of the above-mentioned abnormal data detection methods.
All or some of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)), among others.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "comprise," "include," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The embodiments in this specification are described in a related manner: for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiment is substantially similar to the method embodiment and is therefore described briefly; for relevant details, refer to the corresponding parts of the description of the method embodiment.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims (6)

1. A method for detecting abnormal data, the method comprising:
determining, by using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, wherein the data object set comprises a plurality of data objects, and each data object consists of a plurality of operation indexes of one target device collected at the same historical sampling time point; wherein determining the local density corresponding to each data object in the data object set by using the preset density clustering algorithm comprises: for each data object in the data object set, determining the distance between the data object and each of the other data objects; and taking, as the local density corresponding to the data object, the number of other data objects whose distance from the data object is smaller than the preset truncation distance;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object with a local density greater than that of the data object exists in the region centered on the data object with a preset truncation distance as the radius, creating a cluster with the data object as the clustering center; wherein creating the cluster with the data object as the clustering center comprises: assigning, to the cluster, the data objects within the region centered on the data object with the preset truncation distance as the radius;
for each created cluster, determining a core data object contained in the cluster, and updating the cluster according to the core data object contained in the cluster;
and taking the data objects which do not belong to any cluster as abnormal data objects.
2. The method according to claim 1, wherein for each created cluster, determining core data objects contained in the cluster, and updating the cluster according to the core data objects contained in the cluster, comprises:
for each created cluster, determining, among the data objects contained in the cluster, data objects whose local density is greater than a preset core local density threshold as core data objects; and
for each determined core data object, assigning, to the cluster, the data objects within the region centered on the core data object with the preset truncation distance as the radius, and continuing to determine core data objects among the data objects newly assigned to the cluster, so as to keep updating the cluster until the data objects contained in the cluster no longer change.
3. The method of claim 2, further comprising:
calculating the product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, wherein the value of the shrinkage factor is less than 1.
4. An apparatus for detecting abnormal data, the apparatus comprising a determining module, a creating module, and an updating module;
the determining module is configured to determine, by using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, wherein the data object set comprises a plurality of data objects, and each data object consists of a plurality of operation indexes of one target device collected at the same historical sampling time point; the determining module is specifically configured to: for each data object in the data object set, determine the distance between the data object and each of the other data objects; and take, as the local density corresponding to the data object, the number of other data objects whose distance from the data object is smaller than the preset truncation distance;
the creating module is configured to, for each data object in the data object set, create a cluster with the data object as the clustering center if the local density corresponding to the data object is greater than a preset local density threshold and no data object with a local density greater than that of the data object exists in the region centered on the data object with a preset truncation distance as the radius; the creating module is specifically configured to: assign, to the cluster, the data objects within the region centered on the data object with the preset truncation distance as the radius;
the updating module is used for determining the core data object contained in each created cluster and updating the cluster according to the core data object contained in the cluster;
the determining module is further configured to use a data object that does not belong to any cluster as an abnormal data object.
5. The apparatus of claim 4, wherein the update module is specifically configured to:
for each created cluster, determine, among the data objects contained in the cluster, data objects whose local density is greater than a preset core local density threshold as core data objects; and
for each determined core data object, assign, to the cluster, the data objects within the region centered on the core data object with the preset truncation distance as the radius, and continue to determine core data objects among the data objects newly assigned to the cluster, so as to keep updating the cluster until the data objects contained in the cluster no longer change.
6. The apparatus of claim 5, further comprising: a calculation module;
the calculation module is configured to calculate a product of the local density threshold and a preset shrinkage factor to obtain the core local density threshold, where a value of the shrinkage factor is smaller than 1.
CN201910327595.4A 2019-04-23 2019-04-23 Abnormal data detection method and device Active CN110083475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327595.4A CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910327595.4A CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Publications (2)

Publication Number Publication Date
CN110083475A CN110083475A (en) 2019-08-02
CN110083475B true CN110083475B (en) 2022-10-25

Family

ID=67416157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327595.4A Active CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Country Status (1)

Country Link
CN (1) CN110083475B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111092757B (en) * 2019-12-06 2021-11-23 Wangsu Science & Technology Co., Ltd. Abnormal data detection method, system and equipment
CN111125362B (en) * 2019-12-23 2023-06-16 Baidu International Technology (Shenzhen) Co., Ltd. Abnormal text determination method and device, electronic equipment and medium
CN112468329B (en) * 2020-11-13 2023-01-06 Suzhou Inspur Intelligent Technology Co., Ltd. Method, device, equipment and readable medium for batch grouping management of servers
CN113343056A (en) * 2021-05-21 2021-09-03 Beijing Gas Group Co., Ltd. Method and device for detecting abnormal gas consumption of user
CN113542060B (en) * 2021-07-07 2023-03-07 University of Electronic Science and Technology of China, Zhongshan Institute Abnormal equipment detection method based on equipment communication data characteristics
CN116882850B (en) * 2023-09-08 2023-12-12 Shandong University of Science and Technology Garden data intelligent management method and system based on big data

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103336781A (en) * 2013-05-29 2013-10-02 Jiangsu University Medical image clustering method
CN104484600A (en) * 2014-11-18 2015-04-01 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Intrusion detection method and device based on improved density clustering
CN105577679A (en) * 2016-01-14 2016-05-11 East China Normal University Method for detecting abnormal traffic based on feature selection and density peak clustering
CN107563400A (en) * 2016-06-30 2018-01-09 China University of Mining and Technology Grid-based density peak clustering method and system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP3995099B2 (en) * 2004-07-27 2007-10-24 Director General, National Institute of Health Sciences Device for dividing high-dimensional data into chunks
EP2078436A4 (en) * 2006-10-30 2014-01-22 Ericsson Telefon Ab L M Extended clustering for improved positioning
CN108537276A (en) * 2018-04-09 2018-09-14 Guangdong University of Technology Cluster center selection method, device and medium

Also Published As

Publication number Publication date
CN110083475A (en) 2019-08-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant