CN110083475A - A kind of detection method and device of abnormal data - Google Patents
- Publication number
- CN110083475A (application CN201910327595.4A)
- Authority
- CN
- China
- Prior art keywords
- data object
- cluster
- data
- local density
- core
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Abstract
An embodiment of the present application provides a method and device for detecting abnormal data, relating to the field of computer technology. The method includes: using a preset density clustering algorithm to determine the local density of each data object in a data object set; for each data object in the set, if its local density is greater than a preset local density threshold and no data object with a greater local density exists in the region centered on that data object with a preset truncation distance as radius, creating a cluster with that data object as the cluster center; for each created cluster, determining the core data objects the cluster contains and updating the cluster according to them; and taking each data object that does not belong to any cluster as an abnormal data object. The application can improve the accuracy of anomaly detection on operating metrics.
Description
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for detecting abnormal data.
Background
At present, the operating state of a device is usually determined by monitoring its operating metrics. Specifically, abnormal data is identified in each operating metric, and the problems the device may have are then analyzed from the abnormal data of each metric. Operating metrics may include service metrics and device metrics. Service metrics reflect the scale and quality of the device, for example web page response time, web page visits, and number of connection errors. Device metrics reflect the state of the device, for example central processing unit (CPU) utilization, memory utilization, disk input/output (I/O), and network interface card throughput.
In the related art, the density peak clustering algorithm is one of the algorithms commonly used to identify abnormal data. It proceeds as follows. A data object set for a given operating metric is obtained; the set contains multiple data objects, each of which is a value of the operating metric collected at a preset sampling period. Then, for a data object picked at random from the set, the number of data objects within a preset density radius centered on that data object is determined (this number is the local density). If the local density is not less than a preset density threshold, the data object is determined to be a core data object. The core data object is then taken as the cluster center, and the data objects within the preset density radius form a cluster. For each core data object in the cluster, the data objects within the preset density radius of that core data object (also called directly density-reachable) are assigned to the cluster, until the cluster stops growing. This processing produces at least one cluster. Finally, the data objects in the set that belong to no cluster are taken as abnormal data objects.
In the above scheme, the first data object found whose local density is not less than the preset density threshold is used as the cluster center. However, a data object with a greater local density may exist within the same range, so the chosen data object may not be the true cluster center. Because the choice of cluster center directly affects the accuracy of the clustering result, the accuracy of abnormal data detection is low.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and device for detecting abnormal data, so as to improve the accuracy of anomaly detection on operating metrics. The specific technical solutions are as follows:
In a first aspect, a method for detecting abnormal data is provided, the method comprising:
using a preset density clustering algorithm to determine the local density of each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point;
for each data object in the data object set, if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object, creating a cluster with the data object as the cluster center;
for each created cluster, determining the core data objects the cluster contains and updating the cluster according to those core data objects;
taking each data object that does not belong to any cluster as an abnormal data object.
Optionally, using the preset density clustering algorithm to determine the local density of each data object in the data object set comprises:
for each data object in the data object set, determining the distance between the data object and every other data object;
taking the number of data objects whose distance to the data object is less than the preset truncation distance as the local density of the data object.
Optionally, creating a cluster with the data object as the cluster center comprises:
assigning to the cluster the data objects within the range centered on the data object with the preset truncation distance as radius.
Optionally, for each created cluster, determining the core data objects the cluster contains and updating the cluster according to those core data objects comprises:
for each created cluster, determining as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
for each core data object so determined, assigning to the cluster the data objects within the range centered on the core data object with the preset truncation distance as radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects in the cluster no longer change.
Optionally, the method further comprises:
calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the contraction factor is less than 1.
In a second aspect, a device for detecting abnormal data is provided, the device comprising a determining module, a creation module, and an update module;
the determining module is configured to use a preset density clustering algorithm to determine the local density of each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point;
the creation module is configured to, for each data object in the data object set, create a cluster with the data object as the cluster center if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object;
the update module is configured to, for each created cluster, determine the core data objects the cluster contains and update the cluster according to those core data objects;
the determining module is further configured to take each data object that does not belong to any cluster as an abnormal data object.
Optionally, the determining module is specifically configured to:
for each data object in the data object set, determine the distance between the data object and every other data object;
take the number of data objects whose distance to the data object is less than the preset truncation distance as the local density of the data object.
Optionally, the creation module is specifically configured to:
assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as radius.
Optionally, the update module is specifically configured to:
for each created cluster, determine as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as radius, and continue to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects in the cluster no longer change.
Optionally, the device further comprises a computing module;
the computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the contraction factor is less than 1.
In a third aspect, an electronic device is provided, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement any of the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, any of the method steps of the first aspect are implemented.
In a fifth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to perform any of the methods of the first aspect.
In the method and device for detecting abnormal data provided by the embodiments of the present application, a preset density clustering algorithm is first used to determine the local density of each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object, a cluster is created with the data object as the cluster center. For each created cluster, the core data objects the cluster contains are determined and the cluster is updated according to them, and each data object that does not belong to any cluster is taken as an abnormal data object. In this way, the data object with the greatest local density in a region whose radius is the truncation distance is chosen as the cluster center, so the cluster centers are determined more accurately, which improves the accuracy of abnormal data detection.
Of course, a product or method implementing the present application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is an architecture diagram of an O&M system provided by an embodiment of the present application;
Fig. 2 is a flowchart of a method for detecting abnormal data provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for detecting abnormal data provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
An embodiment of the present application provides a method for detecting abnormal data that can be applied to an operation and maintenance (O&M) system, specifically to an O&M server or a service server in that system. The embodiment is described using the O&M server as an example; other cases are similar. Fig. 1 is an architecture diagram of the O&M system provided by the embodiment. As shown in Fig. 1, the O&M system includes an O&M server and multiple service servers. The O&M server is connected to each service server and collects the operating metrics of each service server at a preset sampling period.
The method for detecting abnormal data provided by the embodiment is described in detail below with reference to specific implementations. As shown in Fig. 2, the specific steps are as follows:
Step 201: use a preset density clustering algorithm to determine the local density of each data object in the data object set.
The data object set includes multiple data objects, and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point.
In an implementation, the O&M server can collect the operating metrics of the target device at a preset sampling period. Operating metrics may include service metrics and device metrics. Service metrics reflect the scale and quality of the device, for example web page response time, web page visits, and number of connection errors. Device metrics reflect the state of the device, for example CPU utilization, memory utilization, disk I/O, and network interface card throughput. Operating metrics may also include other types of metrics, which the embodiment of the present application does not limit. Table 1 shows the operating metrics of the target device collected by the O&M server at different sampling times.
Table 1
After collecting the operating metrics of the target device, the O&M server can combine the multiple operating metrics of the target device collected at the same sampling time point into one data object (the i-th data object can be denoted subTra_i), and combine multiple data objects into a data object set (which can be denoted T = {subTra_1, subTra_2, …, subTra_j, …, subTra_i, …, subTra_n}).
The O&M server can obtain the multiple data objects of the target device collected within a preset historical duration; these data objects form the data object set of the target device for subsequent processing.
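The grouping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_data_objects`, the `(timestamp, metric, value)` sample layout, the metric names, and the readings are all hypothetical, and skipping a time point that lacks one of the metrics is an assumption the source does not state.

```python
from collections import defaultdict


def build_data_objects(samples, metric_names):
    """Group (timestamp, metric, value) samples into one vector per timestamp.

    Assumption: a sampling time point that lacks any of the requested
    metrics is skipped rather than padded.
    """
    by_time = defaultdict(dict)
    for timestamp, metric, value in samples:
        by_time[timestamp][metric] = value

    objects = []
    for timestamp in sorted(by_time):
        row = by_time[timestamp]
        if all(name in row for name in metric_names):
            # One data object: the metric values in a fixed order.
            objects.append([row[name] for name in metric_names])
    return objects


# Hypothetical readings for one target device.
samples = [
    (0, "cpu", 0.20), (0, "mem", 0.50),
    (1, "cpu", 0.25), (1, "mem", 0.55),
    (2, "cpu", 0.90),  # memory reading missing at this time point
]
data_objects = build_data_objects(samples, ["cpu", "mem"])
```

Each inner list plays the role of one subTra_i, and `data_objects` plays the role of the set T.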
After obtaining the data object set of the target device, the O&M server can further determine the local density of each data object (the local density of the i-th data object can be denoted δ_i).
Optionally, the O&M server determines the local density of each data object as follows: for each data object in the data object set, it determines the distance between that data object and every other data object, and takes the number of data objects whose distance to that data object is less than the preset truncation distance as the local density of that data object.
In an implementation, the truncation distance (which can be denoted d_c) is stored in the O&M server in advance and can be set empirically by a technician. For each data object in the data object set, the O&M server can calculate the distance between that data object and every other data object (the distance between the i-th and j-th data objects can be denoted d_ij). The distance may be the Euclidean distance, in which case the O&M server calculates d_ij with a preset Euclidean distance formula.
The O&M server can then determine the data objects whose distance to the data object is less than the preset truncation distance, count them, and take the count as the local density of the data object. The formulas the O&M server uses to determine the local density of each data object are shown as formula (1) and formula (2).
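Formulas (1) and (2) are not reproduced in this text. A form consistent with the counting rule described above, written here as an assumption using an indicator function χ (a symbol introduced only for this sketch), would be:

```latex
\delta_i = \sum_{j \neq i} \chi\left(d_{ij} - d_c\right) \qquad (1)

\chi(x) =
\begin{cases}
1, & x < 0 \\
0, & x \ge 0
\end{cases} \qquad (2)
```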
Here, δ_i is the local density of the i-th data object, d_ij is the distance from the i-th data object to the j-th data object, and d_c is the truncation distance.
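The counting rule of step 201 can be sketched in a few lines. This is an illustrative sketch, assuming Euclidean distance as the text suggests; the function name `local_density` and the sample points are hypothetical.

```python
import math


def local_density(points, cutoff):
    """For each point, count the other points strictly closer than cutoff."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) < cutoff  # Euclidean d_ij < d_c
        )
        densities.append(count)
    return densities


# Hypothetical two-metric data objects: three close together, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
densities = local_density(points, cutoff=1.5)
```

The pairwise loop is quadratic in the number of data objects, which is acceptable for a sketch; a spatial index would be a natural substitute for large sets.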
Step 202: for each data object in the data object set, if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object, create a cluster with the data object as the cluster center.
In an implementation, a local density threshold can also be stored in the O&M server in advance and can be set empirically by a technician. For each data object in the data object set, after obtaining its local density, the O&M server can judge whether that local density is greater than or equal to the preset local density threshold. If it is, the O&M server can determine the data objects in the region centered on the data object with the preset truncation distance as radius and then, from the local density of each data object in that region, judge whether any of them has a local density greater than that of the data object. If none does, the data object has the greatest local density in the region, and a cluster is created with it as the cluster center. Conversely, if the local density of the data object is less than the preset local density threshold, or a data object with a greater local density exists in the region, the data object is not a cluster center.
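The center-selection rule of step 202 can be sketched as below. This is a hedged illustration: `find_cluster_centers` is a hypothetical name, the comparison with the threshold uses "greater than or equal to" as in the implementation paragraph above (the claim wording says "greater than"), and the densities are hand-computed for the toy points with a truncation distance of 1.5.

```python
import math


def find_cluster_centers(points, densities, cutoff, density_threshold):
    """Return indices of points that qualify as cluster centers."""
    centers = []
    for i, p in enumerate(points):
        if densities[i] < density_threshold:
            continue  # not dense enough to be a center
        # Look for a strictly denser point inside the cutoff-radius region.
        denser_neighbor = any(
            j != i and math.dist(p, q) <= cutoff and densities[j] > densities[i]
            for j, q in enumerate(points)
        )
        if not denser_neighbor:
            centers.append(i)
    return centers


# Hypothetical one-metric data objects and their local densities.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
densities = [1, 2, 2, 1, 0]
centers = find_cluster_centers(points, densities, cutoff=1.5, density_threshold=2)
```

Because the rule only excludes a candidate when a strictly denser neighbor exists, two neighbors with equal density both qualify, as indices 1 and 2 do here. The source does not say how such ties are handled.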
Optionally, the O&M server creates a cluster with the data object as the cluster center as follows: the data objects within the range centered on the data object with the preset truncation distance as radius are assigned to the cluster.
In an implementation, after determining that the data object is a cluster center, the O&M server can assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as radius; that is, the data objects in the data object set whose distance to the cluster center is less than or equal to the preset truncation distance are assigned to the cluster of that cluster center.
Step 203: for each created cluster, determine the core data objects the cluster contains and update the cluster according to those core data objects.
In an implementation, for each created cluster, the O&M server can also judge whether the data objects in the cluster (other than the cluster center) include any core data objects. If they do, the O&M server can update the cluster according to the core data objects the cluster contains.
Optionally, for each created cluster, the O&M server determines the core data objects the cluster contains and updates the cluster according to them as follows:
Step 1: for each created cluster, determine as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold.
In an implementation, the core local density threshold can be stored in the O&M server in advance and can be set empirically by a technician. Alternatively, the O&M server can calculate the product of the local density threshold and a preset contraction factor and use that product as the core local density threshold. The contraction factor is less than 1; empirically, its value can range from 0.8 to 0.9. For example, if the local density threshold is 10 and the preset contraction factor is 0.8, the core local density threshold is 8.
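The arithmetic above is small enough to sketch directly. The function name `core_density_threshold` is hypothetical, and rejecting a factor that is not positive is an assumption; the source only requires the contraction factor to be less than 1.

```python
def core_density_threshold(density_threshold, contraction_factor=0.8):
    """Scale the local density threshold down by the contraction factor."""
    # The source requires a factor below 1; the lower bound is an assumption.
    if not 0 < contraction_factor < 1:
        raise ValueError("contraction factor must be between 0 and 1")
    return density_threshold * contraction_factor


# The worked example from the text: threshold 10, contraction factor 0.8.
core_threshold = core_density_threshold(10, 0.8)
```

Lowering the bar for core data objects relative to cluster centers lets a cluster grow through moderately dense regions without promoting those regions to centers of their own.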
For each created cluster, the O&M server can judge, for each data object in the cluster, whether its local density is greater than or equal to the core local density threshold. If the local density of a data object is greater than or equal to the core local density threshold, the O&M server can determine that the data object is a core data object.
Step 2: for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as radius (the radius may also be another empirical value), and continue to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects in the cluster no longer change.
In an implementation, after determining the core data objects in the cluster, the O&M server can, for each core data object, assign to the cluster the data objects within the range centered on that core data object with the preset truncation distance as radius (these can be called directly density-reachable); that is, the data objects in the data object set whose distance to the core data object is less than or equal to the preset truncation distance are assigned to the cluster of the cluster center, giving the updated cluster. For the updated cluster, the O&M server can further judge whether each newly added data object is a core data object. If a newly added data object is a core data object, the O&M server can assign to the cluster the data objects within the range centered on that core data object with the preset truncation distance as radius, and so on, until the data objects in the cluster no longer change.
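Cluster creation and the iterative update of step 203 can be sketched together as a breadth-first expansion. This is one possible reading rather than the patent's implementation: `expand_cluster` is a hypothetical name, the core test uses "greater than or equal to" as in the implementation paragraphs, and the densities are hand-computed for the toy points.

```python
import math
from collections import deque


def expand_cluster(points, densities, center, cutoff, core_threshold):
    """Grow one cluster from its center through core data objects."""
    members = {center}
    # Objects whose cutoff-radius neighborhood still has to be absorbed;
    # the center goes first, which reproduces the initial cluster creation.
    queue = deque([center])
    while queue:
        i = queue.popleft()
        for j, q in enumerate(points):
            if j in members or math.dist(points[i], q) > cutoff:
                continue
            members.add(j)
            if densities[j] >= core_threshold:
                queue.append(j)  # newly added core object: expand from it too
    return members


# Hypothetical one-metric data objects and their local densities.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
densities = [1, 2, 2, 1, 0]
members = expand_cluster(points, densities, center=1, cutoff=1.5, core_threshold=1.6)
```

The loop ends when no new core data object is queued, which corresponds to the cluster no longer changing. Index 3 joins only because index 2 is a core data object; the isolated index 4 is never reached.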
Step 204: take each data object that does not belong to any cluster as an abnormal data object.
In an implementation, if a data object in the data object set does not belong to any cluster, the O&M server can determine that it is an abnormal data object; the data it contains is the data recorded when the target device was abnormal. The O&M server can output the abnormal data object and the identifier of the target device so that O&M personnel learn that the target device is abnormal.
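Step 204 reduces to a set difference. A minimal sketch, assuming clusters are represented as sets of data object indices (a representation chosen for this example, not stated in the source); `find_anomalies` is a hypothetical name.

```python
def find_anomalies(num_objects, clusters):
    """Return the indices of data objects that belong to no cluster."""
    clustered = set().union(*clusters)  # empty set when there are no clusters
    return [i for i in range(num_objects) if i not in clustered]


# Hypothetical result of clustering five data objects into one cluster.
anomalies = find_anomalies(5, [{0, 1, 2, 3}])
```

Taking the union first means overlapping clusters need no special handling.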
In the embodiment of the present application, a preset density clustering algorithm is first used to determine the local density of each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object, a cluster is created with the data object as the cluster center. For each created cluster, the core data objects the cluster contains are determined and the cluster is updated according to them, and each data object that does not belong to any cluster is taken as an abnormal data object. In this way, the data object with the greatest local density in a region whose radius is the truncation distance is chosen as the cluster center, so the cluster centers are determined more accurately, which improves the accuracy of abnormal data detection.
Based on the same technical idea, an embodiment of the present application further provides a device for detecting abnormal data. As shown in Fig. 3, the device includes a determining module 310, a creation module 320, and an update module 330;
the determining module 310 is configured to use a preset density clustering algorithm to determine the local density of each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point;
the creation module 320 is configured to, for each data object in the data object set, create a cluster with the data object as the cluster center if the local density of the data object is greater than a preset local density threshold and, in the region centered on the data object with a preset truncation distance as radius, no data object has a local density greater than that of the data object;
the update module 330 is configured to, for each created cluster, determine the core data objects the cluster contains and update the cluster according to those core data objects;
the determining module 310 is further configured to take each data object that does not belong to any cluster as an abnormal data object.
Optionally, the determining module 310 is specifically configured to:
for each data object in the data object set, determine the distance between the data object and every other data object;
take the number of data objects whose distance to the data object is less than the preset truncation distance as the local density of the data object.
Optionally, the creation module 320 is specifically configured to:
assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as radius.
Optionally, the update module 330 is specifically configured to:
for each created cluster, determine as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as radius, and continue to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects in the cluster no longer change.
Optionally, the device further includes a computing module;
the computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the contraction factor is less than 1.
In the embodiments of the present application, a preset density clustering algorithm may first be used to determine the local density corresponding to each data object in a data object set. The data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density of the data object is greater than a preset local density threshold, and no data object with a greater local density exists in the region centered on the data object with the preset truncation distance as its radius, a cluster is created with the data object as its cluster center. For each created cluster, the core data objects included in the cluster are determined, and the cluster is updated according to those core data objects. Data objects that do not belong to any cluster are treated as abnormal data objects. In this way, the data object with the greatest local density within a region whose radius is the truncation distance is determined as a cluster center, so the cluster centers are determined more accurately, which improves the accuracy of abnormal data detection.
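The cluster-center condition summarized above can be sketched in Python as follows. This is only an illustrative reading of the text, not the applicant's implementation: the function name `find_cluster_centers`, the precomputed distance matrix `dist`, the sample values, and the choice to treat the region boundary as inclusive (`<=`) are all assumptions made for the example.

```python
def find_cluster_centers(densities, dist, density_threshold, cutoff):
    """Return indices of data objects that qualify as cluster centers.

    An object qualifies if its local density exceeds `density_threshold`
    and no object within `cutoff` of it has a strictly greater density.
    """
    n = len(densities)
    centers = []
    for i in range(n):
        if densities[i] <= density_threshold:
            continue  # not dense enough to be a center
        # Assumption: the region of radius `cutoff` includes its boundary.
        has_denser_neighbor = any(
            j != i and dist[i][j] <= cutoff and densities[j] > densities[i]
            for j in range(n)
        )
        if not has_denser_neighbor:
            centers.append(i)
    return centers


# Hypothetical one-dimensional data objects, for illustration only.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
densities = [1, 2, 1, 1, 1, 0]  # neighbors closer than 1.5, self excluded
centers = find_cluster_centers(densities, dist, density_threshold=0, cutoff=1.5)
```

With these sample values, objects 3 and 4 have equal density and lie within each other's radius, so under a literal reading of the condition both qualify as centers. How such ties are meant to be resolved is not stated in this section.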
An embodiment of the present application further provides an electronic device. As shown in FIG. 4, the device includes a processor 401, a communication interface 402, a memory 403 and a communication bus 404, where the processor 401, the communication interface 402 and the memory 403 communicate with one another through the communication bus 404;
The memory 403 is configured to store a computer program;
The processor 401 is configured to implement the following steps when executing the program stored in the memory 403:
Using a preset density clustering algorithm, determining the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
For each data object in the data object set, if the local density of the data object is greater than a preset local density threshold, and no data object with a local density greater than that of the data object exists in the region centered on the data object with the preset truncation distance as its radius, creating a cluster with the data object as its cluster center;
For each created cluster, determining the core data objects included in the cluster, and updating the cluster according to the core data objects it includes;
Treating the data objects that do not belong to any cluster as abnormal data objects.
Optionally, determining, using the preset density clustering algorithm, the local density corresponding to each data object in the data object set includes:
For each data object in the data object set, determining the distance between the data object and each of the other data objects;
Taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
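The local density computation described above amounts to counting neighbors inside the truncation distance. A minimal Python sketch follows, assuming Euclidean distance between the vectors of operating indicators (this section does not fix the distance metric); the function name `local_densities` and the sample points are hypothetical.

```python
import math


def local_densities(points, cutoff):
    """For each point, count the other points closer than `cutoff`."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            # Strictly "less than" the truncation distance, as in the text.
            if j != i and math.dist(p, q) < cutoff
        )
        densities.append(count)
    return densities


# Hypothetical data objects, each made of two operating indicators.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
densities = local_densities(samples, cutoff=0.5)
```

The pairwise loop is quadratic in the number of data objects, which is acceptable for a sketch; a spatial index would be a natural substitute for larger data sets.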
Optionally, creating a cluster with the data object as its cluster center includes:
Assigning to the cluster the data objects located within the range that is centered on the data object and has the preset truncation distance as its radius.
Optionally, determining, for each created cluster, the core data objects included in the cluster, and updating the cluster according to the core data objects it includes, includes:
For each created cluster, determining as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
For each core data object so determined, assigning to the cluster the data objects located within the range that is centered on the core data object and has the preset truncation distance as its radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects included in the cluster no longer change.
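The iterative update described above can be read as a breadth-first expansion from the cluster center. The following Python sketch is one possible interpretation under stated assumptions: the name `expand_cluster`, the precomputed distance matrix `dist`, the inclusive boundary (`<=`), and the sample values are all introduced here for illustration.

```python
from collections import deque


def expand_cluster(center, densities, dist, cutoff, core_threshold):
    """Grow one cluster from its center until membership stops changing."""
    n = len(densities)
    # Initial cluster: every object within the cutoff radius of the center.
    members = {j for j in range(n) if dist[center][j] <= cutoff}
    # Objects whose core status has not been checked yet.
    pending = deque(sorted(members))
    while pending:
        i = pending.popleft()
        if densities[i] <= core_threshold:
            continue  # not a core object, so it pulls in no neighbors
        for j in range(n):
            if j not in members and dist[i][j] <= cutoff:
                members.add(j)
                pending.append(j)  # newly added objects are checked too
    return members


# Hypothetical one-dimensional data objects, for illustration only.
positions = [0.0, 1.0, 2.0, 3.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
densities = [1, 2, 2, 1, 0]  # neighbors closer than 1.5, self excluded
members = expand_cluster(1, densities, dist, cutoff=1.5, core_threshold=1)
anomalies = sorted(set(range(len(positions))) - members)
```

In this example object 2 is a core data object and pulls object 3 into the cluster, while object 4 is never reached and is therefore left outside every cluster, which is the condition under which the method treats a data object as abnormal.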
Optionally, the method further includes:
Calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the value of the contraction factor is less than 1.
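As a worked illustration of this relationship, the product can be written out as below. The symbols $\rho_{\mathrm{th}}$ (local density threshold), $\rho_{\mathrm{core}}$ (core local density threshold) and $\lambda$ (contraction factor), as well as the numeric values, are notation and assumptions introduced here and do not appear in the original text.

```latex
\rho_{\mathrm{core}} = \lambda \cdot \rho_{\mathrm{th}}, \qquad \lambda < 1
% Example with assumed values: \rho_{\mathrm{th}} = 10,\ \lambda = 0.8
\rho_{\mathrm{core}} = 0.8 \times 10 = 8
```

For a positive local density threshold and a positive contraction factor, the core threshold is lower than the local density threshold, so a data object can count as a core data object without being dense enough to serve as a cluster center.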
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by a single thick line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Based on the same technical idea, an embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the abnormal data detection methods described above.
Based on the same technical idea, an embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the abnormal data detection methods described above.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application falls within the scope of protection of the present application.
Claims (10)
1. A method for detecting abnormal data, characterized in that the method comprises:
Using a preset density clustering algorithm, determining the local density corresponding to each data object in a data object set, wherein the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
For each data object in the data object set, if the local density of the data object is greater than a preset local density threshold, and no data object with a local density greater than that of the data object exists in the region centered on the data object with the preset truncation distance as its radius, creating a cluster with the data object as its cluster center;
For each created cluster, determining the core data objects included in the cluster, and updating the cluster according to the core data objects it includes;
Treating the data objects that do not belong to any cluster as abnormal data objects.
2. The method according to claim 1, characterized in that determining, using the preset density clustering algorithm, the local density corresponding to each data object in the data object set comprises:
For each data object in the data object set, determining the distance between the data object and each of the other data objects;
Taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
3. The method according to claim 1, characterized in that creating a cluster with the data object as its cluster center comprises:
Assigning to the cluster the data objects located within the range that is centered on the data object and has the preset truncation distance as its radius.
4. The method according to any one of claims 1 to 3, characterized in that determining, for each created cluster, the core data objects included in the cluster, and updating the cluster according to the core data objects it includes, comprises:
For each created cluster, determining as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
For each core data object so determined, assigning to the cluster the data objects located within the range that is centered on the core data object and has the preset truncation distance as its radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects included in the cluster no longer change.
5. The method according to claim 4, characterized in that the method further comprises:
Calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, wherein the value of the contraction factor is less than 1.
6. A device for detecting abnormal data, characterized in that the device comprises a determining module, a creation module and an update module;
The determining module is configured to determine, using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, wherein the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
The creation module is configured to, for each data object in the data object set, create a cluster with the data object as its cluster center if the local density of the data object is greater than a preset local density threshold and no data object with a local density greater than that of the data object exists in the region centered on the data object with the preset truncation distance as its radius;
The update module is configured to, for each created cluster, determine the core data objects included in the cluster and update the cluster according to the core data objects it includes;
The determining module is further configured to treat the data objects that do not belong to any cluster as abnormal data objects.
7. The device according to claim 6, characterized in that the determining module is specifically configured to:
For each data object in the data object set, determine the distance between the data object and each of the other data objects;
Take the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
8. The device according to claim 6, characterized in that the creation module is specifically configured to:
Assign to the cluster the data objects located within the range that is centered on the data object and has the preset truncation distance as its radius.
9. The device according to any one of claims 6 to 8, characterized in that the update module is specifically configured to:
For each created cluster, determine as core data objects those data objects in the cluster whose local density is greater than a preset core local density threshold;
For each core data object so determined, assign to the cluster the data objects located within the range that is centered on the core data object and has the preset truncation distance as its radius, and continue to determine core data objects among the data objects newly assigned to the cluster so as to keep updating the cluster, until the data objects included in the cluster no longer change.
10. The device according to claim 9, characterized in that the device further comprises a computing module;
The computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, wherein the value of the contraction factor is less than 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910327595.4A CN110083475B (en) | 2019-04-23 | 2019-04-23 | Abnormal data detection method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910327595.4A CN110083475B (en) | 2019-04-23 | 2019-04-23 | Abnormal data detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110083475A (en) | 2019-08-02 |
CN110083475B (en) | 2022-10-25 |
Family
ID=67416157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910327595.4A Active CN110083475B (en) | 2019-04-23 | 2019-04-23 | Abnormal data detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083475B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006039970A (en) * | 2004-07-27 | 2006-02-09 | Kokuritsu Iyakuhin Shokuhin Eisei Kenkyusho | Device for splitting high dimensional data into blocks |
CN101536591A (en) * | 2006-10-30 | 2009-09-16 | Lm爱立信电话有限公司 | Extended clustering for improved positioning |
CN103336781A (en) * | 2013-05-29 | 2013-10-02 | 江苏大学 | Medical image clustering method |
CN104484600A (en) * | 2014-11-18 | 2015-04-01 | 中国科学院深圳先进技术研究院 | Intrusion detection method and device based on improved density clustering |
CN105577679A (en) * | 2016-01-14 | 2016-05-11 | 华东师范大学 | Method for detecting anomaly traffic based on feature selection and density peak clustering |
CN107563400A (en) * | 2016-06-30 | 2018-01-09 | 中国矿业大学 | A kind of density peaks clustering method and system based on grid |
CN108537276A (en) * | 2018-04-09 | 2018-09-14 | 广东工业大学 | A kind of choosing method of cluster centre, device and medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021109314A1 (en) * | 2019-12-06 | 2021-06-10 | 网宿科技股份有限公司 | Method, system and device for detecting abnormal data |
CN111125362A (en) * | 2019-12-23 | 2020-05-08 | 百度国际科技(深圳)有限公司 | Abnormal text determination method and device, electronic equipment and medium |
CN111125362B (en) * | 2019-12-23 | 2023-06-16 | 百度国际科技(深圳)有限公司 | Abnormal text determination method and device, electronic equipment and medium |
CN112468329A (en) * | 2020-11-13 | 2021-03-09 | 苏州浪潮智能科技有限公司 | Method, device, equipment and readable medium for batch grouping management of servers |
CN113343056A (en) * | 2021-05-21 | 2021-09-03 | 北京市燃气集团有限责任公司 | Method and device for detecting abnormal gas consumption of user |
CN113542060A (en) * | 2021-07-07 | 2021-10-22 | 电子科技大学中山学院 | Abnormal equipment detection method based on equipment communication data characteristics |
CN113542060B (en) * | 2021-07-07 | 2023-03-07 | 电子科技大学中山学院 | Abnormal equipment detection method based on equipment communication data characteristics |
CN116882850A (en) * | 2023-09-08 | 2023-10-13 | 山东科技大学 | Garden data intelligent management method and system based on big data |
CN116882850B (en) * | 2023-09-08 | 2023-12-12 | 山东科技大学 | Garden data intelligent management method and system based on big data |
Also Published As
Publication number | Publication date |
---|---|
CN110083475B (en) | 2022-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083475A (en) | A kind of detection method and device of abnormal data | |
CN109587001A (en) | A kind of performance indicator method for detecting abnormality and device | |
US8930223B2 (en) | Patient cohort matching | |
CN110113226A (en) | A kind of method and device of detection device exception | |
CN109558295A (en) | A kind of performance indicator method for detecting abnormality and device | |
JP6246357B2 (en) | Building management apparatus, wide area management system, data acquisition method, and program | |
CN114217948A (en) | Performance monitoring in distributed storage systems | |
CN110198313A (en) | A kind of method and device of strategy generating | |
CN107276851B (en) | Node abnormity detection method and device, network node and console | |
CN108345601A (en) | Search result ordering method and device | |
CN107992738A (en) | A kind of account logs in method for detecting abnormality, device and electronic equipment | |
CN110489757A (en) | A kind of keyword extracting method and device | |
CN110516752A (en) | Clustering cluster method for evaluating quality, device, equipment and storage medium | |
CN108021713B (en) | Document clustering method and device | |
CN110427259A (en) | A kind of task processing method and device | |
CN111540202B (en) | Similar bayonet determining method and device, electronic equipment and readable storage medium | |
CN115932144B (en) | Chromatograph performance detection method, chromatograph performance detection device, chromatograph performance detection equipment and computer medium | |
CN109522275A (en) | Label method for digging, electronic equipment and the storage medium of content are produced based on user | |
CN117113247A (en) | Drainage system abnormality monitoring method, equipment and storage medium based on two-classification and clustering algorithm | |
CN108959415A (en) | A kind of exception dimension localization method, device and electronic equipment | |
CN109408369A (en) | A kind of system detection method, device and electronic equipment | |
WO2018125419A1 (en) | Automatic prediction of patient length of stay and detection of medical center readmission diagnoses | |
WO2021184588A1 (en) | Cluster optimization method and device, server, and medium | |
CN110309257A (en) | A kind of file read-write deployment method and device | |
CN109828970B (en) | Information processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||