CN110298407A - A kind of method of anomaly data detection, system and equipment - Google Patents

Method, system and device for anomaly data detection

Info

Publication number
CN110298407A
Authority
CN
China
Prior art keywords
krill
individual
group
data
position vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910595139.8A
Other languages
Chinese (zh)
Other versions
CN110298407B (en
Inventor
蔡延光
阮嘉琨
蔡颢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910595139.8A priority Critical patent/CN110298407B/en
Publication of CN110298407A publication Critical patent/CN110298407A/en
Application granted granted Critical
Publication of CN110298407B publication Critical patent/CN110298407B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Traffic Control Systems (AREA)

Abstract

This application discloses a method for anomaly data detection, comprising: optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm; obtaining the data to be processed; labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set are abnormal data. The technical solution improves the clustering of traffic data, so that abnormal traffic data can be detected accurately even when the freeway traffic flow data are complex, variable and strongly random, which improves the efficiency and accuracy of abnormal traffic data detection. The application also provides a system, a device and a computer-readable storage medium for anomaly data detection, which have the same beneficial effects.

Description

Method, system and device for anomaly data detection
Technical field
This application relates to the field of anomaly data detection, and in particular to a method, system, device and computer-readable storage medium for anomaly data detection.
Background
In a highway intelligent transportation system, abnormal data are an important factor affecting data quality, so finding and cleaning abnormal traffic data is an important step in optimizing the quality of freeway traffic data. Abnormal traffic data make up a small part of traffic big data; their magnitude, trend and curve shape all differ to some degree from those of the normal traffic data that make up the large majority.
Because freeway traffic flow data are complex, variable and strongly random, traditional abnormal-data identification methods have difficulty detecting abnormal traffic flow data accurately.
Therefore, how to accurately detect abnormal traffic flow data is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of this application is to provide a method, system, device and computer-readable storage medium for anomaly data detection, for accurately detecting abnormal traffic flow data.
To solve the above technical problem, this application provides a method for anomaly data detection, the method comprising:
optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
obtaining the data to be processed;
labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
determining that the data in the noise data set are abnormal data.
Optionally, optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm comprises:
taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
calculating the fitness value of each krill individual from its position vector;
determining the worst krill individual and the best krill individual from the fitness values of the krill individuals;
performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging activity, and the random diffusion motion;
updating the position vector of each krill individual;
judging whether the maximum number of iterations of the krill herd algorithm has been reached;
if not, returning to the step of calculating the fitness value of each krill individual from its position vector;
if so, taking the updated position vector of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
Optionally, calculating the fitness value of each krill individual from its position vector comprises:
calculating the fitness value of the krill individual according to the formula [formula image not reproduced];
where f(x) is the krill individual fitness function; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are position vectors of individuals in the krill herd; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.
Optionally, updating the position vector of each krill individual comprises:
updating the position vector of each krill individual according to the formula [formula image not reproduced];
where Δt is the update step size.
Optionally, before updating the position vector of each krill individual according to that formula, the method further comprises:
performing a crossover operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
performing a mutation operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
where x_{i,m} is the m-th position vector element of the i-th krill individual; Cr is the crossover probability threshold; R_{i,m} is the probability with which the m-th position vector element of the i-th krill individual undergoes crossover or mutation; Mu is the mutation probability; x_{gbes,m} is the best position vector element of the current iteration; x_{p,m} and x_{q,m} are randomly selected position vector elements; and μ is a constant.
This application also provides a system for anomaly data detection, the system comprising:
an optimization module, for optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
an obtaining module, for obtaining the data to be processed;
a labeling module, for labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module, for determining that the data in the noise data set are abnormal data.
Optionally, the optimization module comprises:
a position vector determining submodule, for taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
a calculation submodule, for calculating the fitness value of each krill individual from its position vector;
a first determining submodule, for determining the worst krill individual and the best krill individual from the fitness values of the krill individuals;
an execution submodule, for performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging activity, and the random diffusion motion;
an update submodule, for updating the position vector of each krill individual;
a judging submodule, for judging whether the maximum number of iterations of the krill herd algorithm has been reached;
a return submodule, for returning to the calculation submodule, when the maximum number of iterations of the krill herd algorithm has not been reached, to perform the step of calculating the fitness value of each krill individual from its position vector;
a second determining submodule, for taking the updated position vector of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations of the krill herd algorithm has been reached.
Optionally, the calculation submodule comprises:
a calculation unit, for calculating the fitness value of the krill individual according to the formula [formula image not reproduced];
where f(x) is the krill individual fitness function; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are position vectors of individuals in the krill herd; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.
This application also provides an anomaly data detection device, the device comprising:
a memory, for storing a computer program;
a processor, for implementing the steps of the method for anomaly data detection described in any of the above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the method for anomaly data detection described in any of the above are implemented.
The method for anomaly data detection provided by this application comprises: optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm; obtaining the data to be processed; labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set are abnormal data.
In the technical solution provided by this application, the data to be processed are labeled with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set, and the data in the noise data set are then determined to be abnormal data. This improves the clustering of traffic data, so that abnormal traffic data can be detected accurately even when the freeway traffic flow data are complex, variable and strongly random, which improves the efficiency and accuracy of abnormal traffic data detection. This application also provides a system, a device and a computer-readable storage medium for anomaly data detection, which have the same beneficial effects and are not described again here.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.
Fig. 1 is a flow chart of a method for anomaly data detection provided by an embodiment of this application;
Fig. 2 is a flow chart of a practical implementation of S101 in the method for anomaly data detection shown in Fig. 1;
Fig. 3 is a structure diagram of a system for anomaly data detection provided by an embodiment of this application;
Fig. 4 is a structure diagram of another system for anomaly data detection provided by an embodiment of this application;
Fig. 5 is a structure diagram of an anomaly data detection device provided by an embodiment of this application.
Detailed description of the embodiments
The core of this application is to provide a method, system, device and computer-readable storage medium for anomaly data detection, for accurately detecting abnormal traffic flow data.
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
Referring to Fig. 1, Fig. 1 is a flow chart of a method for anomaly data detection provided by an embodiment of this application.
The method specifically comprises the following steps:
S101: optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
Because freeway traffic flow data are complex, variable and strongly random, traditional abnormal-data identification methods have difficulty detecting abnormal traffic flow data accurately. This application provides a method for anomaly data detection to solve this problem.
This application labels the data to be processed with the DBSCAN clustering algorithm. DBSCAN is a density-based clustering algorithm that partitions the input sample set into classes according to how tightly packed the samples are. DBSCAN density clustering can classify expressway traffic data accurately and separate out the abnormal samples, and thereby detect the abnormal traffic data.
However, the DBSCAN clustering algorithm is very sensitive to the setting of its neighborhood parameters (E, MinPts); if they are specified improperly, the clustering quality declines. This application therefore optimizes the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm. The krill herd algorithm is a swarm intelligence algorithm that imitates the survival activities of a krill herd and can effectively solve large-scale benchmark optimization problems. Using it to optimize the neighborhood parameters improves the clustering quality of DBSCAN, and in turn the efficiency and accuracy of abnormal traffic data detection.
S102: obtaining the data to be processed;
The data to be processed mentioned here are the input expressway traffic sample set. The sample set may be entered manually by a user or downloaded from a predetermined server; this application does not limit the input mode.
S103: labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
Optionally, labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set may specifically be:
S1031: receive the input expressway traffic sample set D = (x_1, x_2, ..., x_m) and determine the neighborhood parameters (E, MinPts);
S1032: set up the core object set A, the number of clusters k, the set B of samples not yet visited, and the cluster partition C, and initialize them, i.e. A = ∅, k = 0, B = D, C = ∅;
S1033: find all core objects in the expressway traffic sample set D. For j = 1, 2, ..., m, first find the E-neighborhood sub-sample set N_E(x_j) of traffic flow sample x_j using the distance metric; if |N_E(x_j)| ≥ MinPts, add x_j to the core object set A, i.e. A = A ∪ {x_j};
S1034: judge whether the core object set A is empty;
if A = ∅, the algorithm ends and step S1038 is executed; if A ≠ ∅, step S1035 is executed;
S1035: pick any a ∈ A, set the core object queue A_cur = {a}, set k = k + 1, initialize the current cluster C_k = {a}, and update B = B − {a};
S1036: judge whether the core object queue A_cur is empty;
if A_cur = ∅, update C = {C_1, C_2, ..., C_k} and A = A − C_k, and return to step S1034; if A_cur ≠ ∅, execute step S1037;
S1037: pick any a' ∈ A_cur, obtain N_E(a') from the E-neighborhood definition, let Δ = N_E(a') ∩ B, update C_k = C_k ∪ Δ, B = B − Δ and A_cur = A_cur ∪ (Δ ∩ A) − {a'}, then return to step S1036;
S1038: output the cluster partition C = {C_1, C_2, ..., C_k} and the abnormal data D − C, i.e. the samples of D that belong to no cluster.
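Steps S1031 to S1038 can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function name `dbscan`, the brute-force neighbor search, and the choice of `min(core)` as the "any a ∈ A" seed (made only to keep runs repeatable) are assumptions introduced here.

```python
import math

def dbscan(samples, eps, min_pts):
    """Return (clusters, noise) as lists of sample indices, following S1031-S1038."""
    n = len(samples)
    # E-neighborhood of every sample; a sample counts as its own neighbor.
    neighbors = [
        {j for j in range(n) if math.dist(samples[i], samples[j]) <= eps}
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}  # set A (S1033)
    unvisited = set(range(n))                                      # set B (S1032)
    clusters = []                                                  # partition C
    while core:                                                    # S1034
        seed = min(core)                 # "any a in A"; min() is an assumption
        queue = [seed]                   # core object queue A_cur (S1035)
        cluster = {seed}
        unvisited.discard(seed)
        while queue:                     # S1036
            a = queue.pop()
            delta = neighbors[a] & unvisited       # S1037: delta = N_E(a') ∩ B
            cluster |= delta
            unvisited -= delta
            queue.extend(delta & core)   # only core objects keep expanding
        clusters.append(sorted(cluster))
        core -= cluster                  # A = A - C_k
    clustered = set().union(*clusters)
    noise = sorted(set(range(n)) - clustered)      # S1038: D minus all clusters
    return clusters, noise

# Two tight groups and one isolated sample.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
clusters, noise = dbscan(points, eps=1.5, min_pts=3)
```

With these sample points the isolated sample at index 8 ends up in `noise`, which is the set that step S104 treats as abnormal data. The pairwise distance computation is quadratic in the number of samples, so a spatial index would be needed for large traffic data sets.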
In this embodiment, DBSCAN density clustering classifies the expressway traffic data accurately and separates out the abnormal samples. It offers good running speed, precision and recall, and shows good stability and validity when detecting abnormal traffic data.
S104: determining that the data in the noise data set are abnormal data.
Based on the above technical solution, the method for anomaly data detection provided by this application labels the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set, and then determines that the data in the noise data set are abnormal data. This improves the clustering of traffic data, so that abnormal traffic data can be detected accurately even when the freeway traffic flow data are complex, variable and strongly random, which improves the efficiency and accuracy of abnormal traffic data detection.
For step S101 of the previous embodiment, optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm may specifically follow the steps shown in Fig. 2, which are described below.
Referring to Fig. 2, Fig. 2 is a flow chart of a practical implementation of S101 in the method for anomaly data detection shown in Fig. 1.
It specifically comprises the following steps:
S201: taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
For example, the position of krill individual i is x_i = (x_{i1}, x_{i2}), where x_{i1} represents E with x_{i1} ∈ [R_Edown, R_Eup], and x_{i2} represents MinPts with x_{i2} ∈ [R_Mdown, R_Mup].
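The encoding in S201 can be illustrated as follows. This is a sketch under stated assumptions: the helper names `init_krill` and `decode`, the uniform random initialization, and the rounding of the second component to an integer MinPts are not specified in the source and are introduced here only for illustration.

```python
import random

def init_krill(n_krill, eps_bounds, minpts_bounds, rng=None):
    """Create n_krill position vectors x_i = (x_i1, x_i2) inside the given ranges."""
    rng = rng or random.Random()
    eps_lo, eps_hi = eps_bounds        # [R_Edown, R_Eup]
    mp_lo, mp_hi = minpts_bounds       # [R_Mdown, R_Mup]
    return [
        (rng.uniform(eps_lo, eps_hi), rng.uniform(mp_lo, mp_hi))
        for _ in range(n_krill)
    ]

def decode(position):
    """Map a position vector to DBSCAN parameters (E, MinPts)."""
    eps, min_pts = position
    # MinPts must be a positive integer; rounding is an assumption of this sketch.
    return eps, max(1, round(min_pts))

herd = init_krill(5, eps_bounds=(0.1, 2.0), minpts_bounds=(2, 10),
                  rng=random.Random(0))
params = [decode(x) for x in herd]
```

Keeping the positions continuous and decoding only when DBSCAN is run lets the krill herd motion formulas operate on real-valued vectors.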
S202: calculating the fitness value of each krill individual from its position vector;
Optionally, calculating the fitness value of each krill individual from its position vector may specifically be:
calculating the fitness value of the krill individual according to the formula [formula image not reproduced];
where f(x) is the krill individual fitness function; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are position vectors of individuals in the krill herd; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.
S203: determining the worst krill individual and the best krill individual from the fitness values of the krill individuals;
Optionally, the krill individual with the largest fitness value may be taken as the best krill individual, and the krill individual with the smallest fitness value as the worst krill individual.
S204: performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging activity, and the random diffusion motion;
Optionally, performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging activity, and the random diffusion motion may specifically comprise the following steps:
S2041: performing, according to the formula [formula image not reproduced], the motion induced by the krill herd excluding the worst krill individual and the best krill individual;
where N^max is the maximum induced speed; w_n is the inertia weight of the induced motion, in the range [0, 1]; N_i^old is the induced motion of the previous iteration; [symbol not reproduced] is the induction toward the best direction provided by individuals in the krill herd; X_i is the fitness or objective function value of the i-th individual of the krill herd in the current iteration; X_j is the fitness of the j-th neighbor (j = 1, 2, ..., N); and ε is an infinitesimal.
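For orientation, the induced motion in the commonly published form of the krill herd algorithm is given below. Whether the patent's lost formula matches it term by term cannot be confirmed from this text, so it should be read as an assumption. Note also that the published form writes the fitness as K_i and the position as x_i, whereas the description above uses X_i for the fitness.

```latex
\begin{aligned}
N_i^{\mathrm{new}} &= N^{\max}\,\alpha_i + w_n\,N_i^{\mathrm{old}},
\qquad \alpha_i = \alpha_i^{\mathrm{local}} + \alpha_i^{\mathrm{target}},\\
\alpha_i^{\mathrm{local}} &= \sum_{j=1}^{N} \hat{K}_{i,j}\,\hat{x}_{i,j},
\qquad \hat{x}_{i,j} = \frac{x_j - x_i}{\lVert x_j - x_i\rVert + \varepsilon},
\qquad \hat{K}_{i,j} = \frac{K_i - K_j}{K^{\mathrm{worst}} - K^{\mathrm{best}}}.
\end{aligned}
```

In this form the worst and best fitness values normalize the neighbor effect, which is one reason step S203 identifies both individuals before the motion is computed, and the small ε keeps the direction term finite when two individuals coincide.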
S2042: performing the foraging activity according to the formula F_i = V_f β_i + w_f F_i^old;
where V_f is the foraging speed; w_f is the inertia weight of the foraging motion, in the range [0, 1]; F_i^old is the previous foraging motion; and β_i is the foraging direction, calculated as β_i = β_i^food + β_i^best. Here β_i^food is the attraction of the food, calculated according to formulas not reproduced in this text, and β_i^best is the effect of the best fitness that the i-th individual of the krill herd has reached so far, likewise calculated according to a formula not reproduced here.
S2043: performing the random diffusion according to the formula [formula image not reproduced];
where D^max is the maximum diffusion speed, δ is a random direction vector, and I and I_max are respectively the current iteration number of the krill herd and the preset global maximum number of iterations.
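The foraging and diffusion terms can be sketched as below. The foraging formula is the one stated above. The diffusion formula is lost in this text, so the sketch assumes the commonly published form D_i = D^max (1 − I / I_max) δ, which uses exactly the symbols listed; the function names and default values are illustrative assumptions.

```python
import random

def foraging_motion(beta_i, f_old, v_f=0.02, w_f=0.5):
    """F_i = V_f * beta_i + w_f * F_i_old, applied per dimension."""
    return [v_f * b + w_f * f for b, f in zip(beta_i, f_old)]

def diffusion_motion(dim, iteration, max_iteration, d_max=0.005, rng=None):
    """Assumed form: D_i = D_max * (1 - I / I_max) * delta, delta in [-1, 1]^dim."""
    rng = rng or random.Random()
    scale = d_max * (1.0 - iteration / max_iteration)
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(dim)]

f_new = foraging_motion(beta_i=[1.0, -1.0], f_old=[0.2, 0.4], v_f=0.5, w_f=0.5)
d_first = diffusion_motion(2, iteration=0, max_iteration=50, rng=random.Random(1))
d_last = diffusion_motion(2, iteration=50, max_iteration=50, rng=random.Random(1))
```

Under this assumed form the random component shrinks linearly with the iteration count and vanishes at I = I_max, so the herd explores early and settles late.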
S205: updating the position vector of each krill individual;
Optionally, updating the position vector of each krill individual may specifically be:
updating the position vector of each krill individual according to the formula [formula image not reproduced];
where Δt is the update step size, which can be calculated according to the formula [formula image not reproduced], in which C_t is the step-limiting factor, a constant in [0, 2]; a lower value of C_t lets the krill individuals search the space more carefully.
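A sketch of the update in S205 follows. Both formulas are lost in this text, so the code assumes the commonly published krill herd forms: x_i(t + Δt) = x_i(t) + Δt (N_i + F_i + D_i), and Δt = C_t Σ_m (UB_m − LB_m) over the parameter ranges. Clipping the result back into the ranges is an additional assumption, as are the function names.

```python
def time_step(bounds, c_t=0.5):
    """Assumed form: dt = C_t * sum of (upper - lower) over all dimensions."""
    return c_t * sum(hi - lo for lo, hi in bounds)

def update_position(x, n_i, f_i, d_i, dt, bounds):
    """Assumed form: x_new = x + dt * (N_i + F_i + D_i), clipped to the bounds."""
    moved = [xi + dt * (n + f + d) for xi, n, f, d in zip(x, n_i, f_i, d_i)]
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(moved, bounds)]

bounds = [(0.0, 2.0), (2.0, 10.0)]     # hypothetical ranges for E and MinPts
dt = time_step(bounds, c_t=0.5)
x_new = update_position([1.0, 5.0], [0.1, 0.0], [0.0, 0.2], [0.0, 0.0], dt, bounds)
x_clipped = update_position([1.9, 9.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], dt, bounds)
```

Because Δt scales with the width of the search ranges, a small C_t produces proportionally small moves, which matches the remark that a lower C_t gives a more careful search.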
Optionally, before updating the position vector of each krill individual according to that formula, the method may further comprise the following steps:
performing a crossover operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
performing a mutation operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
where x_{i,m} is the m-th position vector element of the i-th krill individual; Cr is the crossover probability threshold; R_{i,m} is the probability with which the m-th position vector element of the i-th krill individual undergoes crossover or mutation; Mu is the mutation probability; x_{gbes,m} is the best position vector element of the current iteration; x_{p,m} and x_{q,m} are randomly selected position vector elements; and μ is a constant.
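The crossover and mutation operators can be sketched as follows. The patent's formulas are lost in this text, so the code assumes the commonly published krill herd operators, which use the same symbols: an element is replaced by that of a randomly chosen individual when R_{i,m} < Cr, and by x_{gbes,m} + μ (x_{p,m} − x_{q,m}) when R_{i,m} < Mu. The function names and argument order are illustrative.

```python
import random

def crossover(x_i, x_r, cr, rng):
    """Per element: take x_r[m] when a random draw R is below Cr, else keep x_i[m]."""
    return [xr if rng.random() < cr else xi for xi, xr in zip(x_i, x_r)]

def mutate(x_i, x_gbest, x_p, x_q, mu_prob, mu, rng):
    """Per element: x_gbest[m] + mu * (x_p[m] - x_q[m]) when R is below Mu."""
    return [
        g + mu * (p - q) if rng.random() < mu_prob else xi
        for xi, g, p, q in zip(x_i, x_gbest, x_p, x_q)
    ]

rng = random.Random(0)
x_i, x_r = [0.5, 4.0], [1.2, 7.0]
always_cross = crossover(x_i, x_r, cr=1.0, rng=rng)    # every element replaced
never_cross = crossover(x_i, x_r, cr=0.0, rng=rng)     # nothing replaced
mutated = mutate(x_i, [1.0, 4.0], [2.0, 6.0], [1.0, 2.0],
                 mu_prob=1.0, mu=0.5, rng=rng)
```

Both operators act element by element, so one component of (E, MinPts) can change while the other is kept.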
S206: judging whether the maximum number of iterations of the krill herd algorithm has been reached;
if so, proceeding to step S207; if not, returning to step S202;
S207: taking the updated position vector of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
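The control flow of S201 to S207 can be sketched as a loop with a pluggable fitness function. This is not the patent's algorithm in full: the motion step is deliberately simplified to an attraction toward the best individual plus a decaying random term, standing in for the induced, foraging and diffusion motions of S204, and the fitness used in the example is a toy function, because the patent's fitness formula is not reproduced in this text. Returning the best evaluated position is also an assumption.

```python
import random

def optimize_parameters(fitness, bounds, n_krill=10, max_iter=30, seed=0):
    """Loop skeleton for S201-S207; returns the best position found and its score."""
    rng = random.Random(seed)
    # S201: each position vector holds one candidate parameter set.
    herd = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_krill)]
    best, best_score = None, float("-inf")
    for iteration in range(max_iter):                      # S206
        scores = [fitness(x) for x in herd]                # S202
        top = max(range(n_krill), key=scores.__getitem__)  # S203 (best individual)
        if scores[top] > best_score:
            best, best_score = list(herd[top]), scores[top]
        decay = 1.0 - iteration / max_iter
        for x in herd:                                     # S204-S205, simplified
            for m, (lo, hi) in enumerate(bounds):
                step = 0.5 * (best[m] - x[m])
                step += 0.1 * decay * (hi - lo) * rng.uniform(-1.0, 1.0)
                x[m] = min(max(x[m] + step, lo), hi)
    return best, best_score                                # S207

# Toy fitness with its maximum at (1.0, 5.0); a stand-in, not the patent's f(x).
toy = lambda x: -((x[0] - 1.0) ** 2 + (x[1] - 5.0) ** 2)
bounds = [(0.0, 2.0), (2.0, 10.0)]
best, score = optimize_parameters(toy, bounds)
```

In a real run the fitness function would execute DBSCAN with the decoded (E, MinPts) and score the resulting clusters, which makes each evaluation far more expensive than this toy example suggests.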
Referring to Fig. 3, Fig. 3 is a structure diagram of a system for anomaly data detection provided by an embodiment of this application.
The system may comprise:
an optimization module 100, for optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
an obtaining module 200, for obtaining the data to be processed;
a labeling module 300, for labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module 400, for determining that the data in the noise data set are abnormal data.
Referring to Fig. 4, Fig. 4 is a structure diagram of another system for anomaly data detection provided by an embodiment of this application.
The optimization module 100 may comprise:
a position vector determining submodule, for taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
a calculation submodule, for calculating the fitness value of each krill individual from its position vector;
a first determining submodule, for determining the worst krill individual and the best krill individual from the fitness values of the krill individuals;
an execution submodule, for performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging activity, and the random diffusion motion;
an update submodule, for updating the position vector of each krill individual;
a judging submodule, for judging whether the maximum number of iterations of the krill herd algorithm has been reached;
a return submodule, for returning to the calculation submodule, when the maximum number of iterations of the krill herd algorithm has not been reached, to perform the step of calculating the fitness value of each krill individual from its position vector;
a second determining submodule, for taking the updated position vector of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations of the krill herd algorithm has been reached.
Further, the calculation submodule may comprise:
a calculation unit, for calculating the fitness value of the krill individual according to the formula [formula image not reproduced];
where f(x) is the krill individual fitness function; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are position vectors of individuals in the krill herd; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.
The update submodule may comprise:
an updating unit, for updating the position vector of each krill individual according to the formula [formula image not reproduced];
where Δt is the update step size.
The update submodule may further comprise:
a crossover unit, for performing a crossover operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
a mutation unit, for performing a mutation operation on the position vector elements of each krill individual according to the formula [formula image not reproduced];
where x_{i,m} is the m-th position vector element of the i-th krill individual; Cr is the crossover probability threshold; R_{i,m} is the probability with which the m-th position vector element of the i-th krill individual undergoes crossover or mutation; Mu is the mutation probability; x_{gbes,m} is the best position vector element of the current iteration; x_{p,m} and x_{q,m} are randomly selected position vector elements; and μ is a constant.
Since the system embodiments correspond to the method embodiments, refer to the description of the method embodiments for the system embodiments; they are not repeated here.
Referring to Fig. 5, Fig. 5 is a structure diagram of an anomaly data detection device provided by an embodiment of this application.
The anomaly data detection device 500 may vary considerably with configuration or performance. It may include one or more central processing units (CPU) 522 (for example, one or more processors), a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 542 or data 544. The memory 532 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the device. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the anomaly data detection device 500, the series of instruction operations in the storage medium 530.
The anomaly data detection device 500 may also include one or more power supplies 525, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
The steps of the method for anomaly data detection described with reference to Fig. 1 and Fig. 2 are implemented by the anomaly data detection device based on the structure shown in Fig. 5.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, device, and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or of another form.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a function-calling device, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The method, system, equipment, and computer-readable storage medium for anomaly data detection provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (10)

1. A method of anomaly data detection, characterized by comprising:
optimizing the neighborhood parameters of a DBSCAN clustering algorithm using a krill herd algorithm;
obtaining data to be processed;
labeling the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
determining that the data in the noise data set are abnormal data.
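The labeling and noise-extraction steps of claim 1 can be sketched as follows. This is a minimal pure-Python DBSCAN written for illustration, not the patented implementation; the names `eps` and `min_pts` (the usual DBSCAN radius and minimum neighbor count), the function names, and the sample data are all assumptions.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE otherwise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

def detect_anomalies(points, eps, min_pts):
    """Split points into (cluster data set, noise data set)."""
    labels = dbscan(points, eps, min_pts)
    clustered = [p for p, lab in zip(points, labels) if lab != NOISE]
    anomalies = [p for p, lab in zip(points, labels) if lab == NOISE]
    return clustered, anomalies

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
clustered, anomalies = detect_anomalies(data, eps=1.5, min_pts=3)
```

Here the four points near the origin form one cluster and the isolated point ends up in the noise set, which the method treats as abnormal data. In the claimed method the two parameters would come from the krill herd optimization rather than being chosen by hand.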
2. The method according to claim 1, characterized in that optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill herd algorithm comprises:
taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of a krill herd individual;
calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
determining the worst krill individual and the best krill individual according to the fitness values of the krill herd individuals;
performing the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging motion, and the random diffusion motion;
updating the position vector of each krill herd individual;
judging whether the maximum number of iterations of the krill herd algorithm has been reached;
if not, returning to the step of calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
if so, taking the updated position vector of each krill herd individual as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
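The iteration in claim 2 can be sketched as a loop like the one below. It is a heavily simplified illustration under stated assumptions, not the patent's algorithm: the three motion terms are reduced to an attraction toward a fitness-weighted center of the herd (computed without the best and worst individuals), an attraction toward the best position seen so far, and a decaying random perturbation. The function name, the coefficients `n_max`, `v_f`, `d_max`, and the choice to minimize a finite-valued fitness are all hypothetical, and the sketch returns the best evaluated position rather than every updated position.

```python
import random

def krill_herd_minimize(fitness, bounds, n_krill=20, max_iter=50,
                        n_max=0.3, v_f=0.5, d_max=0.05, dt=1.0, seed=0):
    """Simplified krill-herd loop; returns (best_pos, best_fit, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Each position vector encodes one candidate parameter set.
    herd = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_krill)]
    best_pos, best_fit, history = None, float("inf"), []
    for it in range(max_iter):
        fits = [fitness(x) for x in herd]  # assumed finite
        i_best = min(range(n_krill), key=fits.__getitem__)
        i_worst = max(range(n_krill), key=fits.__getitem__)
        if fits[i_best] < best_fit:
            best_pos, best_fit = list(herd[i_best]), fits[i_best]
        history.append(best_fit)  # best-so-far fitness after this evaluation
        # Herd without its best and worst members (fallback for tiny herds).
        others = [j for j in range(n_krill) if j not in (i_best, i_worst)]
        if not others:
            others = list(range(n_krill))
        # Better (lower) fitness gives a larger weight in the herd center.
        w = [fits[i_worst] - fits[j] + 1e-12 for j in others]
        total = sum(w)
        centre = [sum(wj * herd[j][d] for wj, j in zip(w, others)) / total
                  for d in range(dim)]
        decay = 1.0 - it / max_iter  # diffusion shrinks over the iterations
        for x in herd:
            for d, (lo, hi) in enumerate(bounds):
                induced = n_max * (centre[d] - x[d])
                foraging = v_f * (best_pos[d] - x[d])
                diffusion = d_max * decay * (hi - lo) * rng.uniform(-1, 1)
                step = dt * (induced + foraging + diffusion)
                x[d] = min(hi, max(lo, x[d] + step))  # keep inside bounds
    return best_pos, best_fit, history

sphere = lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2
pos, fit, hist = krill_herd_minimize(sphere, [(-10, 10), (-10, 10)],
                                     n_krill=15, max_iter=30, seed=1)
```

In the setting of the claims, `fitness` would run DBSCAN with the candidate parameters encoded in the position vector and score the resulting clustering; the quadratic test function above only exercises the loop.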
3. The method according to claim 2, characterized in that calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual comprises:
calculating the fitness value of the krill herd individual according to the formula [formula image not reproduced];
wherein f(x) is the krill individual fitness function, D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the current position vector of the krill individual as the neighborhood parameters, x and x' are position vectors of individuals in the krill herd, dist(x, x') is the Euclidean distance between x and x', and ε is a constant.
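The fitness formula itself is not available in this text, so the following is not the claimed function. It is only a hypothetical illustration of how the named ingredients (the clusters D_1, ..., D_k, the pairwise Euclidean distance dist(x, x'), and a constant ε) could be combined into a compactness-style score. The function name, the averaging, and the treatment of x and x' as members of the same cluster are all assumptions.

```python
import math
from itertools import combinations

def compactness_fitness(clusters, eps_const=1e-6):
    """Hypothetical score: sum over clusters of the mean pairwise
    Euclidean distance inside the cluster, plus a constant eps_const."""
    total = eps_const
    for cluster in clusters:
        pairs = list(combinations(cluster, 2))
        if pairs:  # clusters with fewer than two points contribute nothing
            total += sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return total

score = compactness_fitness([[(0, 0), (3, 4)]], eps_const=0.0)
```

Under this illustrative definition, tighter clusters give a lower score, so an optimizer that minimizes it would favor parameter settings producing compact clusters.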
4. The method according to claim 2, characterized in that updating the position vector of each krill herd individual comprises:
updating the position vector of each krill herd individual according to the formula [formula image not reproduced];
wherein Δt is the update step size.
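The update formula is not available in this text. For orientation only, the standard krill herd algorithm advances each position by the sum of the three motions over one step of length Δt; whether the claimed formula has exactly this form is an assumption. Here N_i, F_i, and D_i denote the induced motion, the foraging motion, and the random diffusion of the i-th individual, symbols that do not appear in the claim text.

```latex
\frac{dX_i}{dt} = N_i + F_i + D_i,
\qquad
X_i(t + \Delta t) = X_i(t) + \Delta t \, \frac{dX_i}{dt}
```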
5. The method according to claim 4, characterized in that, before updating the position vector of each krill herd individual according to the formula [formula image not reproduced], the method further comprises:
performing a crossover operation on the position vector elements of each krill herd individual according to the formula [formula image not reproduced];
performing a mutation operation on the position vector elements of each krill herd individual according to the formula [formula image not reproduced];
wherein x_{i,m} is the m-th position vector element of the i-th krill individual, Cr is the crossover probability threshold, R_{i,m} is the probability that the m-th position vector element of the i-th krill individual undergoes the crossover operation or the mutation operation, Mu is the mutation probability, x_{gbes,m} is the best position vector element of the current iteration, x_{p,m} and x_{q,m} are randomly selected position vector elements, and μ is a constant.
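The crossover and mutation formulas are not available in this text. The sketch below follows the form these operators take in the standard krill herd algorithm, which matches the symbols listed in claim 5, but treating it as the claimed formulas is an assumption. The function and argument names are hypothetical, and drawing a separate random number for each operator (rather than a single R_{i,m}) is a simplification.

```python
import random

def crossover_and_mutate(herd, best, cr=0.2, mu_prob=0.05, mu=0.5, rng=None):
    """Return a new herd after element-wise crossover and mutation.

    herd    : list of position vectors (lists of floats)
    best    : best position vector of the current iteration
    cr      : crossover probability threshold (Cr)
    mu_prob : mutation probability (Mu)
    mu      : scaling constant for the mutation difference term
    """
    rng = rng or random.Random()
    n = len(herd)
    out = [list(x) for x in herd]  # work on a copy, leave the input intact
    for i in range(n):
        for m in range(len(herd[i])):
            # Crossover: copy element m from a randomly chosen individual.
            if rng.random() < cr:
                r = rng.randrange(n)
                out[i][m] = herd[r][m]
            # Mutation: best element plus a scaled difference of two
            # randomly chosen individuals' elements.
            if rng.random() < mu_prob:
                p, q = rng.randrange(n), rng.randrange(n)
                out[i][m] = best[m] + mu * (herd[p][m] - herd[q][m])
    return out

herd = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
best = [5.0, 6.0]
unchanged = crossover_and_mutate(herd, best, cr=0.0, mu_prob=0.0)
mutated = crossover_and_mutate(herd, best, cr=0.0, mu_prob=1.0,
                               rng=random.Random(0))
```

With both probabilities at zero the herd is returned unchanged; with mutation forced on a herd of identical individuals, the difference term vanishes and every element is replaced by the corresponding element of `best`.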
6. A system of anomaly data detection, characterized by comprising:
an optimization module, configured to optimize the neighborhood parameters of a DBSCAN clustering algorithm using a krill herd algorithm;
an obtaining module, configured to obtain data to be processed;
a labeling module, configured to label the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module, configured to determine that the data in the noise data set are abnormal data.
7. The system according to claim 6, characterized in that the optimization module comprises:
a position vector determining submodule, configured to take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of a krill herd individual;
a computing submodule, configured to calculate the fitness value of the krill herd individual according to the position vector of the krill herd individual;
a first determining submodule, configured to determine the worst krill individual and the best krill individual according to the fitness values of the krill herd individuals;
an executing submodule, configured to perform the motion induced by the krill herd excluding the worst krill individual and the best krill individual, the foraging motion, and the random diffusion motion;
an updating submodule, configured to update the position vector of each krill herd individual;
a judging submodule, configured to judge whether the maximum number of iterations of the krill herd algorithm has been reached;
a returning submodule, configured to return to the computing submodule, when the maximum number of iterations of the krill herd algorithm has not been reached, to execute the step of calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
a second determining submodule, configured to take, when the maximum number of iterations of the krill herd algorithm has been reached, the updated position vector of each krill herd individual as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
8. The system according to claim 7, characterized in that the computing submodule comprises:
a computing unit, configured to calculate the fitness value of the krill herd individual according to the formula [formula image not reproduced];
wherein f(x) is the krill individual fitness function, D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the current position vector of the krill individual as the neighborhood parameters, x and x' are position vectors of individuals in the krill herd, dist(x, x') is the Euclidean distance between x and x', and ε is a constant.
9. An anomaly data detection equipment, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the method of anomaly data detection according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method of anomaly data detection according to any one of claims 1 to 5.
CN201910595139.8A 2019-07-03 2019-07-03 Abnormal data detection method, system and equipment Active CN110298407B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910595139.8A CN110298407B (en) 2019-07-03 2019-07-03 Abnormal data detection method, system and equipment

Publications (2)

Publication Number Publication Date
CN110298407A true CN110298407A (en) 2019-10-01
CN110298407B CN110298407B (en) 2023-05-09

Family

ID=68030056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910595139.8A Active CN110298407B (en) 2019-07-03 2019-07-03 Abnormal data detection method, system and equipment

Country Status (1)

Country Link
CN (1) CN110298407B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955737A (en) * 2023-09-19 2023-10-27 源康(东阿)健康科技有限公司 Abnormal characteristic retrieval method used in gelatin production process

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368858A (en) * 2017-07-28 2017-11-21 中南大学 A kind of parametrization measurement multi-model intelligent method for fusing of intelligent environment carrying robot identification floor
CN109669990A (en) * 2018-11-16 2019-04-23 重庆邮电大学 A kind of innovatory algorithm carrying out Outliers mining to density irregular data based on DBSCAN
CN109766393A (en) * 2018-12-06 2019-05-17 中科恒运股份有限公司 Abnormal deviation data examination method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
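JIAN ZHANG ET AL.: "Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Detection", 《INTELLIGENT INFORMATION PROCESSING》 *
SINGH V ET AL.: "Krill Herd Clustering Algorithm using DBSCAN Technique", 《INTERNATIONAL JOURNAL OF COMPUTER SCIENCE & ENGINEERING TECHNOLOGY》 *
刘沛: "磷虾群优化算法的改进及应用" [Improvement and application of the krill herd optimization algorithm], 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *
蒋华 等: "基于DCNDA算法的数据异常检测" [Data anomaly detection based on the DCNDA algorithm], 《计算机工程与设计》 *
阮嘉琨 等: "基于DBSCAN密度聚类算法的高速公路交通流异常数据检测" [Detection of abnormal expressway traffic flow data based on the DBSCAN density clustering algorithm], 《工业控制计算机》 *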

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955737A (en) * 2023-09-19 2023-10-27 源康(东阿)健康科技有限公司 Abnormal characteristic retrieval method used in gelatin production process
CN116955737B (en) * 2023-09-19 2023-11-28 源康(东阿)健康科技有限公司 Abnormal characteristic retrieval method used in gelatin production process

Also Published As

Publication number Publication date
CN110298407B (en) 2023-05-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant