CN110083475A - A kind of detection method and device of abnormal data - Google Patents

Publication number
CN110083475A
Authority
CN
China
Prior art keywords
data object
cluster
data
local density
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910327595.4A
Other languages
Chinese (zh)
Other versions
CN110083475B (en
Inventor
孙尚勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Security Technologies Co Ltd
Original Assignee
New H3C Security Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Security Technologies Co Ltd
Priority to CN201910327595.4A
Publication of CN110083475A
Application granted
Publication of CN110083475B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452 Performance evaluation by statistical analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

Embodiments of the present application provide a method and device for detecting abnormal data, relating to the field of computer technology. The method comprises: determining, using a preset density clustering algorithm, the local density corresponding to each data object in a data object set; for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object whose local density is greater than that of the data object exists in the region centered on the data object with a preset truncation distance as its radius, creating a cluster with the data object as the cluster center; for each created cluster, determining the core data objects that the cluster contains, and updating the cluster according to those core data objects; and taking the data objects that belong to no cluster as abnormal data objects. The application can improve the accuracy of abnormal metric detection.

Description

Method and device for detecting abnormal data
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for detecting abnormal data.
Background
At present, the operating state of a device is generally determined by monitoring its operating metrics. Specifically, abnormal data can be identified in each operating metric, and the problems that may exist in the device are then analyzed from the abnormal data of each metric. Operating metrics may include service metrics and device metrics. Service metrics are metrics that reflect the scale and quality of the device, for example web page response time, web page visits, and number of connection errors. Device metrics are metrics that reflect the state of the device, for example central processing unit (CPU) utilization, memory utilization, disk input/output (I/O), and network interface card throughput.
In the related art, the density peak clustering algorithm is one of the algorithms commonly used to determine abnormal data. Its processing is as follows. A data object set for a certain operating metric is obtained; the set contains multiple data objects, each being a value of the operating metric collected according to a preset sampling period. Then, for a data object selected at random from the set, the number of data objects inside the circle centered on that data object with a preset density radius is determined (this number is the local density). If the local density is not less than a preset density threshold, the data object is determined to be a core data object. The core data object is then taken as the cluster center, and the data objects within the preset density radius of it form a cluster. For each core data object contained in the cluster, the data objects within the preset density radius of that core data object (also called directly density-reachable) are assigned to the cluster, until the number of data objects in the cluster stops growing. At least one cluster can be generated in this way. Finally, the data objects in the set that belong to no cluster are taken as abnormal data objects.
In the above technical solution, the first data object found whose local density is not less than the preset density threshold is used as the cluster center. However, a data object with a greater local density may exist within that range, so the chosen data object may not be the true cluster center. Because the choice of cluster center directly affects the accuracy of the clustering result, the accuracy of abnormal data detection is low.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and device for detecting abnormal data, so as to improve the accuracy of abnormal metric detection. The specific technical solutions are as follows:
In a first aspect, a method for detecting abnormal data is provided, the method comprising:
determining, using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object whose local density is greater than that of the data object exists in the region centered on the data object with a preset truncation distance as its radius, creating a cluster with the data object as the cluster center;
for each created cluster, determining the core data objects that the cluster contains, and updating the cluster according to the core data objects that the cluster contains;
taking the data objects that belong to no cluster as abnormal data objects.
Optionally, determining, using a preset density clustering algorithm, the local density corresponding to each data object in the data object set comprises:
for each data object in the data object set, determining the distance between the data object and each other data object;
taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
Optionally, creating a cluster with the data object as the cluster center comprises:
assigning to the cluster the data objects within the range centered on the data object with the preset truncation distance as its radius.
Optionally, for each created cluster, determining the core data objects that the cluster contains and updating the cluster according to those core data objects comprises:
for each created cluster, determining, among the data objects that the cluster contains, the data objects whose local density is greater than a preset core local density threshold to be core data objects;
for each core data object so determined, assigning to the cluster the data objects within the range centered on the core data object with the preset truncation distance as its radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster contains no longer change.
Optionally, the method further comprises:
calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the value of the contraction factor is less than 1.
In a second aspect, a device for detecting abnormal data is provided, the device comprising a determining module, a creation module, and an update module;
the determining module is configured to determine, using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point;
the creation module is configured to, for each data object in the data object set, create a cluster with the data object as the cluster center if the local density corresponding to the data object is greater than a preset local density threshold and no data object whose local density is greater than that of the data object exists in the region centered on the data object with a preset truncation distance as its radius;
the update module is configured to, for each created cluster, determine the core data objects that the cluster contains, and update the cluster according to the core data objects that the cluster contains;
the determining module is further configured to take the data objects that belong to no cluster as abnormal data objects.
Optionally, the determining module is specifically configured to:
for each data object in the data object set, determine the distance between the data object and each other data object;
take the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
Optionally, the creation module is specifically configured to:
assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as its radius.
Optionally, the update module is specifically configured to:
for each created cluster, determine, among the data objects that the cluster contains, the data objects whose local density is greater than a preset core local density threshold to be core data objects;
for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as its radius, and continue to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster contains no longer change.
Optionally, the device further comprises a computing module;
the computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the value of the contraction factor is less than 1.
In a third aspect, an electronic device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any implementation of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, the method steps of any implementation of the first aspect are implemented.
In a fifth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method of any implementation of the first aspect.
In the method and device for detecting abnormal data provided by the embodiments of the present application, a preset density clustering algorithm can first be used to determine the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object whose local density is greater than that of the data object exists in the region centered on the data object with a preset truncation distance as its radius, a cluster is created with the data object as the cluster center. For each created cluster, the core data objects that the cluster contains are determined and the cluster is updated according to them; the data objects that belong to no cluster are taken as abnormal data objects. In this way, the data object with the greatest local density in a region whose radius is the truncation distance is chosen as the cluster center. The cluster centers determined in this way are more accurate, which improves the accuracy of abnormal data detection.
Of course, a product or method implementing the present application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of the operation and maintenance (O&M) system provided by the embodiments of the present application;
Fig. 2 is a flow chart of a method for detecting abnormal data provided by the embodiments of the present application;
Fig. 3 is a structural schematic diagram of a device for detecting abnormal data provided by the embodiments of the present application;
Fig. 4 is a structural schematic diagram of an electronic device provided by the embodiments of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are obviously only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the protection scope of the present application.
The embodiments of the present application provide a method for detecting abnormal data. The method can be applied to an operation and maintenance (O&M) system; specifically, it can be applied to an O&M server or a service server in the O&M system. The embodiments of the present application are described using the O&M server as an example; other cases are similar. Fig. 1 is an architecture diagram of the O&M system provided by the embodiments of the present application. As shown in Fig. 1, the O&M system includes an O&M server and multiple service servers. The O&M server is connected to each service server and collects the operating metrics of each service server according to a preset sampling period.
The method for detecting abnormal data provided by the embodiments of the present application is described in detail below with reference to a specific embodiment. As shown in Fig. 2, the specific steps are as follows:
Step 201: using a preset density clustering algorithm, determine the local density corresponding to each data object in the data object set.
The data object set includes multiple data objects, and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point.
In an implementation, the O&M server can collect the operating metrics of the target device according to a preset sampling period. Operating metrics may include service metrics and device metrics. Service metrics are metrics that reflect the scale and quality of the device, for example web page response time, web page visits, and number of connection errors. Device metrics are metrics that reflect the state of the device, for example CPU utilization, memory utilization, disk I/O, and network interface card throughput. Operating metrics may also include other types of metrics, which the embodiments of the present application do not limit. Table 1 shows the operating metrics of the target device collected by the O&M server at different sampling times.
Table 1
After collecting the operating metrics of the target device, the O&M server can combine the multiple operating metrics of the target device collected at the same sampling time point into one data object (the $i$-th data object can be denoted $subTra_i$), and combine multiple data objects into a data object set (which can be denoted $T = \{subTra_1, subTra_2, \dots, subTra_j, \dots, subTra_i, \dots, subTra_n\}$).
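As an illustration of how samples might be grouped into data objects, the following minimal sketch assumes the collected samples arrive as `(timestamp, metric_name, value)` tuples. The function name `build_data_objects`, the tuple layout, and the choice to skip time points with missing metrics are assumptions made for illustration; the application does not specify them.

```python
from collections import defaultdict


def build_data_objects(samples, metric_names):
    """Group samples into one vector of metrics per sampling time point.

    `samples` is an iterable of (timestamp, metric_name, value) tuples.
    This layout is an assumption for illustration only.
    """
    by_time = defaultdict(dict)
    for timestamp, metric, value in samples:
        by_time[timestamp][metric] = value

    objects = []
    for timestamp in sorted(by_time):
        row = by_time[timestamp]
        # Assumption: keep only time points where every metric was collected.
        if all(name in row for name in metric_names):
            vector = [float(row[name]) for name in metric_names]
            objects.append((timestamp, vector))
    return objects
```

Each returned vector plays the role of one data object, and the returned list plays the role of the data object set.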
The O&M server can obtain the multiple data objects of the target device collected within a preset historical duration; these data objects constitute the data object set corresponding to the target device, for subsequent processing.
After obtaining the data object set corresponding to the target device, the O&M server can further determine the local density corresponding to each data object (the local density corresponding to the $i$-th data object can be denoted $\delta_i$).
Optionally, the O&M server determines the local density corresponding to each data object as follows: for each data object in the data object set, determine the distance between the data object and each other data object, and take the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
In an implementation, a truncation distance (which can be denoted $d_c$) is stored in the O&M server in advance. The truncation distance can be set empirically by a technician. For each data object in the data object set, the O&M server can calculate the distance between the data object and each other data object (the distance between the $i$-th data object and the $j$-th data object can be denoted $d_{ij}$). The distance can be the Euclidean distance, in which case the O&M server calculates $d_{ij}$ according to a preset Euclidean distance formula.
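The pairwise Euclidean distances can be sketched as below. This is a minimal illustration that assumes each data object is a plain list of numbers; `distance_matrix` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math


def distance_matrix(objects):
    """Return the symmetric matrix of Euclidean distances d_ij.

    `objects` is a list of equal-length numeric vectors (an assumed layout).
    """
    n = len(objects)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(objects[i], objects[j])
            # The distance is symmetric, so fill both halves at once.
            dist[i][j] = d
            dist[j][i] = d
    return dist
```

Metrics measured on very different scales (for example a percentage and a byte count) would dominate a raw Euclidean distance unevenly; the application does not discuss normalization, so any scaling step would be an additional design choice.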
Then, the O&M server can determine the data objects whose distance from the data object is less than the preset truncation distance, count them, and take that count as the local density corresponding to the data object. The O&M server determines the local density corresponding to each data object as shown in formula (1) and formula (2).
Here, $\delta_i$ is the local density of the $i$-th data object, $d_{ij}$ is the distance from the $i$-th data object to the $j$-th data object, and $d_c$ is the truncation distance.
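Formulas (1) and (2) do not appear in this text. Assuming they take the usual density peak form, $\delta_i = \sum_{j \neq i} \chi(d_{ij} - d_c)$ with $\chi(x) = 1$ for $x < 0$ and $\chi(x) = 0$ otherwise, which matches the count described above, the local density can be sketched as follows. The function name `local_density` and the precomputed distance matrix argument are assumptions for illustration.

```python
def local_density(dist, d_c):
    """Count, for each object i, the other objects closer than d_c.

    `dist` is a square matrix of pairwise distances; the strict "<"
    follows the wording "less than the preset truncation distance".
    """
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
```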
Step 202: for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object whose local density is greater than that of the data object exists in the region centered on the data object with the preset truncation distance as its radius, create a cluster with the data object as the cluster center.
In an implementation, a local density threshold can also be stored in the O&M server in advance. The local density threshold can be set empirically by a technician. For each data object in the data object set, after obtaining the local density of the data object, the O&M server can further judge whether the local density corresponding to the data object is greater than or equal to the preset local density threshold. If it is, the O&M server can further determine the data objects in the region centered on the data object with the preset truncation distance as its radius, and then judge, from the local density of each data object in that region, whether there is a data object whose local density is greater than that of the data object. If there is no such data object, the data object is the one with the greatest local density in the region, and a cluster is created with the data object as the cluster center. Conversely, if the local density corresponding to the data object is less than the preset local density threshold, or there is a data object in the region whose local density is greater than that of the data object, the data object is not a cluster center.
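The center selection rule can be sketched as follows. The sketch uses "greater than or equal to" for the threshold test, following the implementation paragraph above rather than the wording of step 202, and treats the region as including objects at exactly the truncation distance; both choices, and the name `find_cluster_centers`, are assumptions.

```python
def find_cluster_centers(dist, density, d_c, density_threshold):
    """Return the indices of data objects that qualify as cluster centers.

    An object qualifies when its local density reaches the threshold and
    no object within distance d_c of it has a strictly greater density.
    """
    n = len(dist)
    centers = []
    for i in range(n):
        if density[i] < density_threshold:
            continue
        has_denser_neighbor = any(
            density[j] > density[i]
            for j in range(n)
            if j != i and dist[i][j] <= d_c
        )
        if not has_denser_neighbor:
            centers.append(i)
    return centers
```

With this rule, two neighboring objects of equal density both become centers. The application does not say how such ties are handled, so the sketch leaves them as they fall.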
Optionally, the O&M server creates a cluster with the data object as the cluster center as follows: assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as its radius.
In an implementation, after determining that the data object is a cluster center, the O&M server can assign to the cluster the data objects within the range centered on the data object with the preset truncation distance as its radius; that is, the data objects in the data object set whose distance from the cluster center is less than or equal to the preset truncation distance are assigned to the cluster corresponding to that cluster center.
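Creating the initial cluster then reduces to one pass over the distances from the center. This is a minimal sketch; `create_cluster` is a hypothetical name, and representing a cluster as a set of object indices is an assumption.

```python
def create_cluster(dist, center, d_c):
    """Return the indices of all objects within d_c of the cluster center.

    The center itself is included because its distance to itself is 0.
    """
    return {j for j in range(len(dist)) if dist[center][j] <= d_c}
```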
Step 203: for each created cluster, determine the core data objects that the cluster contains, and update the cluster according to the core data objects that the cluster contains.
In an implementation, for each created cluster, the O&M server can also judge whether there are core data objects among the data objects that the cluster contains (the data objects other than the cluster center). If there are, the O&M server can update the cluster according to the core data objects that the cluster contains.
Optionally, for each created cluster, the O&M server determines the core data objects that the cluster contains and updates the cluster according to them as follows:
Step 1: for each created cluster, among the data objects that the cluster contains, determine the data objects whose local density is greater than a preset core local density threshold to be core data objects.
In an implementation, a core local density threshold can be stored in the O&M server in advance. The core local density threshold can be set empirically by a technician; alternatively, the O&M server can calculate the product of the local density threshold and a preset contraction factor and use that product as the core local density threshold. The value of the contraction factor is less than 1; empirically, its value range can be 0.8 to 0.9. For example, if the local density threshold is 10 and the preset contraction factor is 0.8, the core local density threshold is 8.
For each created cluster, the O&M server can judge, for each data object that the cluster contains, whether its local density is greater than or equal to the core local density threshold. If the local density corresponding to a data object is greater than or equal to the core local density threshold, the O&M server can determine that the data object is a core data object.
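Step 1 can be sketched as below, using the contraction factor to derive the core local density threshold. The name `core_objects`, the default factor of 0.8, and the "greater than or equal to" comparison (taken from the implementation paragraph above) are assumptions for illustration.

```python
def core_objects(cluster, density, density_threshold, contraction_factor=0.8):
    """Return the members of `cluster` that count as core data objects.

    The core local density threshold is the local density threshold
    multiplied by a contraction factor below 1.
    """
    if not 0 < contraction_factor < 1:
        # The source only requires a value below 1; the lower bound is assumed.
        raise ValueError("contraction_factor must be between 0 and 1")
    core_threshold = density_threshold * contraction_factor
    return {i for i in cluster if density[i] >= core_threshold}
```

With a local density threshold of 10 and a factor of 0.8, objects with a local density of 8 or more qualify, which matches the worked example in the text.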
Step 2: for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as its radius (the radius may also be another empirical value), and continue to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster contains no longer change.
In an implementation, after determining the core data objects that the cluster contains, the O&M server can, for each core data object so determined, assign to the cluster the data objects within the range centered on the core data object with the preset truncation distance as its radius (these can be called directly density-reachable); that is, the data objects in the data object set whose distance from the core data object is less than or equal to the preset truncation distance are assigned to the cluster corresponding to the cluster center, giving the updated cluster. For the updated cluster, the O&M server can further judge whether each newly added data object is a core data object. If a newly added data object is a core data object, the O&M server can further assign to the cluster the data objects within the range centered on that core data object with the preset truncation distance as its radius, and so on, until the data objects that the cluster contains no longer change.
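The iterative update in step 2 behaves like a breadth-first expansion from the core data objects. The following is a minimal sketch under that reading; `expand_cluster` and its argument layout are hypothetical, and the same truncation distance is used as the expansion radius.

```python
from collections import deque


def expand_cluster(dist, density, initial_members, d_c, core_threshold):
    """Grow a cluster until no core data object can add new members."""
    cluster = set(initial_members)
    # Start from the core data objects already in the cluster.
    queue = deque(i for i in cluster if density[i] >= core_threshold)
    while queue:
        core = queue.popleft()
        for j in range(len(dist)):
            if j not in cluster and dist[core][j] <= d_c:
                cluster.add(j)
                # A newly added object may itself be a core data object.
                if density[j] >= core_threshold:
                    queue.append(j)
    return cluster
```

Each object enters the queue at most once, because it is queued only at the moment it joins the cluster, so the loop terminates when the membership stops changing.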
Step 204: take the data objects that belong to no cluster as abnormal data objects.
In an implementation, if a data object in the data object set does not belong to any cluster, the O&M server can determine that the data object is an abnormal data object; the data that the abnormal data object contains is the data from when the target device was abnormal. The O&M server can output the abnormal data object and the identifier of the target device, so that O&M personnel learn that the target device is abnormal.
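Given the final clusters as sets of object indices (an assumed representation), the abnormal data objects are simply the indices covered by no cluster. `abnormal_objects` is a hypothetical name.

```python
def abnormal_objects(n_objects, clusters):
    """Return the indices of data objects that belong to no cluster."""
    clustered = set().union(*clusters)
    return [i for i in range(n_objects) if i not in clustered]
```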
In the embodiments of the present application, a preset density clustering algorithm can first be used to determine the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating metrics of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold, and no data object whose local density is greater than that of the data object exists in the region centered on the data object with a preset truncation distance as its radius, a cluster is created with the data object as the cluster center. For each created cluster, the core data objects that the cluster contains are determined and the cluster is updated according to them; the data objects that belong to no cluster are taken as abnormal data objects. In this way, the data object with the greatest local density in a region whose radius is the truncation distance is chosen as the cluster center. The cluster centers determined in this way are more accurate, which improves the accuracy of abnormal data detection.
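Steps 201 to 204 can be put together in one compact sketch. It is an illustration under stated assumptions, not the applicant's implementation: data objects are numeric vectors, the distance is Euclidean, threshold tests use "greater than or equal to", and the name `detect_abnormal` and the default contraction factor of 0.8 are hypothetical.

```python
import math


def detect_abnormal(objects, d_c, density_threshold, contraction_factor=0.8):
    """Return the indices of data objects that end up in no cluster."""
    n = len(objects)
    dist = [[math.dist(a, b) for b in objects] for a in objects]
    # Step 201: local density = number of other objects closer than d_c.
    density = [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]
    core_threshold = density_threshold * contraction_factor

    clustered = set()
    for i in range(n):
        # Step 202: dense enough, and no denser object within d_c.
        if density[i] < density_threshold:
            continue
        if any(density[j] > density[i]
               for j in range(n) if j != i and dist[i][j] <= d_c):
            continue
        cluster = {j for j in range(n) if dist[i][j] <= d_c}
        # Step 203: expand the cluster from its core data objects.
        frontier = [j for j in cluster if density[j] >= core_threshold]
        while frontier:
            core = frontier.pop()
            for j in range(n):
                if j not in cluster and dist[core][j] <= d_c:
                    cluster.add(j)
                    if density[j] >= core_threshold:
                        frontier.append(j)
        clustered |= cluster

    # Step 204: anything left outside every cluster is abnormal.
    return [i for i in range(n) if i not in clustered]
```

For example, with one-dimensional objects at 0, 1, 2, 3, 4, and 10, a truncation distance of 1.5, and a local density threshold of 2, the first five objects form one cluster and the object at 10 is reported as abnormal. The sketch computes all pairwise distances, so its cost grows quadratically with the number of data objects.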
Based on the same technical idea, the embodiment of the present application also provides a kind of detection devices of abnormal data, such as Fig. 3 institute Show, which comprises determining that module 310, creation module 320 and update module 330;
Determining module 310 determines each data object point in set of data objects for using preset density clustering algorithm Not corresponding local density, set of data objects include multiple data objects, and a data object is by the same history samples time Multiple operating index of the collected target device of point are constituted;
Creation module 320, each data object for being directed in set of data objects, if the data object is corresponding Local density be greater than preset local density threshold, and centered on the data object, it is preset truncation distance for radius area In domain, there is no the data objects that local density is greater than the local density of the data object, then using the data object as cluster Center creates cluster;
Update module 330 determines the core data object that the cluster includes, and according to this for each cluster for creation The core data object that cluster includes updates the cluster;
Determining module 310 is also used to will not belong to the data object of either cluster, as abnormal data object.
Optionally, determining module 310 are specifically used for:
For each data object in set of data objects, determine between the data object and other data objects away from From;
The number of the data object of preset truncation distance will be less than with the distance between the data object, as the data The corresponding local density of object.
Optionally, creation module 320 are specifically used for:
It will be that the data object in the range of radius is divided to this by the center of circle, the preset distance that is truncated of the data object Cluster.
Optionally, update module 330 are specifically used for:
For each cluster of creation, in the data object that the cluster includes, local density is greater than preset core part The data object of density threshold is determined as core data object;
It, will be using the core data object as the center of circle, preset truncation distance for each core data object determined It is divided to the cluster for the data object in the range of radius, and continues to determine core number in the data object for being newly divided to the cluster According to object to continue to update the cluster, until data object that the cluster includes remains unchanged.
Optionally, the device further includes a computing module;
The computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the value of the contraction factor is less than 1.
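As a small worked example of this product, the sketch below derives the core local density threshold from the local density threshold. The function name and the sample values are hypothetical, and the lower bound of 0 on the contraction factor is an assumption; the text only requires a value less than 1.

```python
def core_density_threshold(density_threshold, contraction_factor):
    """Core local density threshold = local density threshold x contraction factor."""
    # The text requires a factor below 1; requiring it to be positive is an
    # added assumption so that the resulting threshold stays meaningful.
    if not 0 < contraction_factor < 1:
        raise ValueError("contraction factor must be between 0 and 1")
    return density_threshold * contraction_factor


print(core_density_threshold(10, 0.5))  # -> 5.0
```

Because the factor is below 1, the core threshold is always lower than the threshold used to pick cluster centers, so more data objects qualify as core objects than as centers.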
In the embodiments of the present application, a preset density clustering algorithm is first used to determine the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point. For each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold and, within the region centered on the data object with the preset truncation distance as its radius, there is no data object whose local density is greater than that of the data object, a cluster is created with the data object as the cluster center. For each created cluster, the core data objects that the cluster includes are determined and the cluster is updated according to them, and the data objects that do not belong to any cluster are taken as abnormal data objects. In this way, the data object with the largest local density within a region whose radius is the truncation distance is determined as a cluster center, so the determined cluster centers are more accurate, which improves the accuracy of abnormal data detection.
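The whole flow summarized above can be sketched end to end as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the function name `detect_anomalies`, all parameter names, and the sample data are hypothetical; distance is taken to be Euclidean; and a single strict "less than" neighbourhood is reused for the density count, the center test, and the cluster expansion.

```python
import math


def detect_anomalies(points, cutoff, density_threshold, contraction_factor):
    """Return the indices of data objects that end up in no cluster."""
    n = len(points)
    # Neighbours of each data object inside the truncation distance.
    near = [[j for j in range(n)
             if j != i and math.dist(points[i], points[j]) < cutoff]
            for i in range(n)]
    density = [len(nb) for nb in near]
    core_threshold = density_threshold * contraction_factor

    clustered = set()
    for i in range(n):
        # A center is dense enough and has no denser neighbour in its radius.
        if density[i] <= density_threshold:
            continue
        if any(density[j] > density[i] for j in near[i]):
            continue
        members = {i, *near[i]}
        frontier = list(members)
        # Expand through core data objects until the cluster stops changing.
        while frontier:
            k = frontier.pop()
            if density[k] > core_threshold:
                for j in near[k]:
                    if j not in members:
                        members.add(j)
                        frontier.append(j)
        clustered |= members

    # Whatever belongs to no cluster is reported as abnormal.
    return [i for i in range(n) if i not in clustered]


# Hypothetical data objects: two operating indicators per sampling time point.
samples = [(10, 20), (11, 20), (12, 20), (13, 20), (14, 20), (90, 95)]
print(detect_anomalies(samples, cutoff=1.5, density_threshold=1,
                       contraction_factor=0.5))  # -> [5]
```

Only the union of all cluster members is tracked here, since the final step needs to know whether an object belongs to any cluster rather than which one.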
An embodiment of the present application further provides an electronic device which, as shown in FIG. 4, includes a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 communicate with one another through the communication bus 404;
The memory 403 is configured to store a computer program;
The processor 401 is configured to implement the following steps when executing the program stored in the memory 403:
using a preset density clustering algorithm, determining the local density corresponding to each data object in a data object set, where the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold and, within the region centered on the data object with the preset truncation distance as its radius, there is no data object whose local density is greater than that of the data object, creating a cluster with the data object as the cluster center;
for each created cluster, determining the core data objects that the cluster includes, and updating the cluster according to the core data objects that the cluster includes;
taking the data objects that do not belong to any cluster as abnormal data objects.
Optionally, the determining, using a preset density clustering algorithm, of the local density corresponding to each data object in the data object set includes:
for each data object in the data object set, determining the distances between the data object and the other data objects;
taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
Optionally, the creating of a cluster with the data object as the cluster center includes:
assigning to the cluster the data objects located within the range centered on the data object with the preset truncation distance as its radius.
Optionally, the determining, for each created cluster, of the core data objects that the cluster includes, and the updating of the cluster according to the core data objects that the cluster includes, include:
for each created cluster, determining, among the data objects that the cluster includes, the data objects whose local density is greater than a preset core local density threshold as core data objects;
for each determined core data object, assigning to the cluster the data objects located within the range centered on the core data object with the preset truncation distance as its radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster includes no longer change.
Optionally, the method further includes:
calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, where the value of the contraction factor is less than 1.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Based on the same technical idea, an embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the abnormal data detection methods described above.
Based on the same technical idea, an embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute any of the abnormal data detection methods described above.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The above are merely preferred embodiments of the present application and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within its protection scope.

Claims (10)

1. A method for detecting abnormal data, characterized in that the method comprises:
using a preset density clustering algorithm, determining the local density corresponding to each data object in a data object set, wherein the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
for each data object in the data object set, if the local density corresponding to the data object is greater than a preset local density threshold and, within the region centered on the data object with the preset truncation distance as its radius, there is no data object whose local density is greater than that of the data object, creating a cluster with the data object as the cluster center;
for each created cluster, determining the core data objects that the cluster includes, and updating the cluster according to the core data objects that the cluster includes;
taking the data objects that do not belong to any cluster as abnormal data objects.
2. The method according to claim 1, characterized in that the determining, using a preset density clustering algorithm, of the local density corresponding to each data object in the data object set comprises:
for each data object in the data object set, determining the distances between the data object and the other data objects;
taking the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
3. The method according to claim 1, characterized in that the creating of a cluster with the data object as the cluster center comprises:
assigning to the cluster the data objects located within the range centered on the data object with the preset truncation distance as its radius.
4. The method according to any one of claims 1 to 3, characterized in that the determining, for each created cluster, of the core data objects that the cluster includes, and the updating of the cluster according to the core data objects that the cluster includes, comprise:
for each created cluster, determining, among the data objects that the cluster includes, the data objects whose local density is greater than a preset core local density threshold as core data objects;
for each determined core data object, assigning to the cluster the data objects located within the range centered on the core data object with the preset truncation distance as its radius, and continuing to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster includes no longer change.
5. The method according to claim 4, characterized in that the method further comprises:
calculating the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, wherein the value of the contraction factor is less than 1.
6. A device for detecting abnormal data, characterized in that the device comprises a determining module, a creation module, and an update module;
the determining module is configured to determine, using a preset density clustering algorithm, the local density corresponding to each data object in a data object set, wherein the data object set includes multiple data objects, and each data object consists of multiple operating indicators of one target device collected at the same historical sampling time point;
the creation module is configured to, for each data object in the data object set, create a cluster with the data object as the cluster center if the local density corresponding to the data object is greater than a preset local density threshold and, within the region centered on the data object with the preset truncation distance as its radius, there is no data object whose local density is greater than that of the data object;
the update module is configured to, for each created cluster, determine the core data objects that the cluster includes, and update the cluster according to the core data objects that the cluster includes;
the determining module is further configured to take the data objects that do not belong to any cluster as abnormal data objects.
7. The device according to claim 6, characterized in that the determining module is specifically configured to:
for each data object in the data object set, determine the distances between the data object and the other data objects;
take the number of data objects whose distance from the data object is less than the preset truncation distance as the local density corresponding to the data object.
8. The device according to claim 6, characterized in that the creation module is specifically configured to:
assign to the cluster the data objects located within the range centered on the data object with the preset truncation distance as its radius.
9. The device according to any one of claims 6 to 8, characterized in that the update module is specifically configured to:
for each created cluster, determine, among the data objects that the cluster includes, the data objects whose local density is greater than a preset core local density threshold as core data objects;
for each determined core data object, assign to the cluster the data objects located within the range centered on the core data object with the preset truncation distance as its radius, and continue to determine core data objects among the data objects newly assigned to the cluster so as to continue updating the cluster, until the data objects that the cluster includes no longer change.
10. The device according to claim 9, characterized in that the device further comprises a computing module;
the computing module is configured to calculate the product of the local density threshold and a preset contraction factor to obtain the core local density threshold, wherein the value of the contraction factor is less than 1.
CN201910327595.4A 2019-04-23 2019-04-23 Abnormal data detection method and device Active CN110083475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327595.4A CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910327595.4A CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Publications (2)

Publication Number Publication Date
CN110083475A true CN110083475A (en) 2019-08-02
CN110083475B CN110083475B (en) 2022-10-25

Family

ID=67416157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327595.4A Active CN110083475B (en) 2019-04-23 2019-04-23 Abnormal data detection method and device

Country Status (1)

Country Link
CN (1) CN110083475B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006039970A (en) * 2004-07-27 2006-02-09 Kokuritsu Iyakuhin Shokuhin Eisei Kenkyusho Device for splitting high dimensional data into blocks
CN101536591A (en) * 2006-10-30 2009-09-16 Lm爱立信电话有限公司 Extended clustering for improved positioning
CN103336781A (en) * 2013-05-29 2013-10-02 江苏大学 Medical image clustering method
CN104484600A (en) * 2014-11-18 2015-04-01 中国科学院深圳先进技术研究院 Intrusion detection method and device based on improved density clustering
CN105577679A (en) * 2016-01-14 2016-05-11 华东师范大学 Method for detecting anomaly traffic based on feature selection and density peak clustering
CN107563400A (en) * 2016-06-30 2018-01-09 中国矿业大学 A kind of density peaks clustering method and system based on grid
CN108537276A (en) * 2018-04-09 2018-09-14 广东工业大学 A kind of choosing method of cluster centre, device and medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021109314A1 (en) * 2019-12-06 2021-06-10 网宿科技股份有限公司 Method, system and device for detecting abnormal data
CN111125362A (en) * 2019-12-23 2020-05-08 百度国际科技(深圳)有限公司 Abnormal text determination method and device, electronic equipment and medium
CN111125362B (en) * 2019-12-23 2023-06-16 百度国际科技(深圳)有限公司 Abnormal text determination method and device, electronic equipment and medium
CN112468329A (en) * 2020-11-13 2021-03-09 苏州浪潮智能科技有限公司 Method, device, equipment and readable medium for batch grouping management of servers
CN113343056A (en) * 2021-05-21 2021-09-03 北京市燃气集团有限责任公司 Method and device for detecting abnormal gas consumption of user
CN113542060A (en) * 2021-07-07 2021-10-22 电子科技大学中山学院 Abnormal equipment detection method based on equipment communication data characteristics
CN113542060B (en) * 2021-07-07 2023-03-07 电子科技大学中山学院 Abnormal equipment detection method based on equipment communication data characteristics
CN116882850A (en) * 2023-09-08 2023-10-13 山东科技大学 Garden data intelligent management method and system based on big data
CN116882850B (en) * 2023-09-08 2023-12-12 山东科技大学 Garden data intelligent management method and system based on big data

Also Published As

Publication number Publication date
CN110083475B (en) 2022-10-25

Similar Documents

Publication Publication Date Title
CN110083475A (en) A kind of detection method and device of abnormal data
CN109587001A (en) A kind of performance indicator method for detecting abnormality and device
US8930223B2 (en) Patient cohort matching
CN110113226A (en) A kind of method and device of detection device exception
CN109558295A (en) A kind of performance indicator method for detecting abnormality and device
JP6246357B2 (en) Building management apparatus, wide area management system, data acquisition method, and program
CN114217948A (en) Performance monitoring in distributed storage systems
CN110198313A (en) A kind of method and device of strategy generating
CN107276851B (en) Node abnormity detection method and device, network node and console
CN108345601A (en) Search result ordering method and device
CN107992738A (en) A kind of account logs in method for detecting abnormality, device and electronic equipment
CN110489757A (en) A kind of keyword extracting method and device
CN110516752A (en) Clustering cluster method for evaluating quality, device, equipment and storage medium
CN108021713B (en) Document clustering method and device
CN110427259A (en) A kind of task processing method and device
CN111540202B (en) Similar bayonet determining method and device, electronic equipment and readable storage medium
CN115932144B (en) Chromatograph performance detection method, chromatograph performance detection device, chromatograph performance detection equipment and computer medium
CN109522275A (en) Label method for digging, electronic equipment and the storage medium of content are produced based on user
CN117113247A (en) Drainage system abnormality monitoring method, equipment and storage medium based on two-classification and clustering algorithm
CN108959415A (en) A kind of exception dimension localization method, device and electronic equipment
CN109408369A (en) A kind of system detection method, device and electronic equipment
WO2018125419A1 (en) Automatic prediction of patient length of stay and detection of medical center readmission diagnoses
WO2021184588A1 (en) Cluster optimization method and device, server, and medium
CN110309257A (en) A kind of file read-write deployment method and device
CN109828970B (en) Information processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant