CN110298407A - Method, system, and device for abnormal data detection
- Publication number
- CN110298407A (application CN201910595139.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
This application discloses a method for abnormal data detection, comprising: optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm; acquiring the data to be processed; labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set are abnormal data. The technical solution improves the clustering of traffic data: when faced with freeway traffic flow data that are complex, changeable, and strongly random, it can accurately detect abnormal traffic data, improving both the efficiency and the accuracy of abnormal traffic data detection. The application also provides a system, a device, and a computer-readable storage medium for abnormal data detection, which have the same beneficial effects.
Description
Technical field
This application relates to the field of abnormal data detection, and in particular to a method, system, device, and computer-readable storage medium for abnormal data detection.
Background
In a freeway intelligent transportation system, abnormal data are an important factor affecting data quality, so finding and cleaning abnormal traffic data is an important step in improving the quality of freeway traffic data. Abnormal traffic data are a small minority within traffic big data; their magnitudes, trends, and curve shapes all differ to some degree from the normal traffic data that make up the large majority.
Because freeway traffic flow data are complex, changeable, and strongly random, traditional abnormal data identification methods have difficulty detecting abnormal traffic flow data accurately.
Therefore, how to detect abnormal traffic flow data accurately is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of this application is to provide a method, system, device, and computer-readable storage medium for abnormal data detection, for accurately detecting abnormal traffic flow data.
To solve the above technical problem, this application provides a method for abnormal data detection, the method comprising:
optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
acquiring the data to be processed;
labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
determining that the data in the noise data set are abnormal data.
Optionally, optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm comprises:
taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
calculating the fitness value of each krill individual from its position vector;
determining the worst krill individual and the optimal krill individual from the fitness values of the krill individuals;
executing, based on the worst krill individual and the optimal krill individual, the motion induced by the krill herd, the foraging motion, and the random diffusion motion;
updating the position vector of each krill individual;
judging whether the maximum number of iterations of the krill herd algorithm has been reached;
if not, returning to the step of calculating the fitness value of each krill individual from its position vector;
if so, taking the updated krill position vector as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
Optionally, calculating the fitness value of each krill individual from its position vector comprises:
calculating the fitness value of each krill individual according to the fitness formula [formula not reproduced];
where $f(x)$ is the krill individual fitness function; $D_1, D_2, \ldots, D_k$ are the $k$ clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; $x$ and $x'$ are position vectors of individuals in the krill herd; $\mathrm{dist}(x, x')$ is the Euclidean distance between $x$ and $x'$; and $\varepsilon$ is a constant.
Optionally, updating the position vector of each krill individual comprises:
updating the position vector of each krill individual according to the position update formula [formula not reproduced];
where $\Delta t$ is the update step length.
Optionally, before the position vector of each krill individual is updated according to the position update formula, the method further comprises:
performing a crossover operation on the position vector elements of each krill individual according to the crossover formula [formula not reproduced];
performing a mutation operation on the position vector elements of each krill individual according to the mutation formula [formula not reproduced];
where $x_{i,m}$ is the $m$-th position vector element of the $i$-th krill individual; $Cr$ is the crossover probability threshold; $R_{i,m}$ is the probability that the $m$-th position vector element of the $i$-th krill individual undergoes crossover or mutation; $Mu$ is the mutation probability; $x_{gbest,m}$ is the optimal position vector element of the current iteration; $x_{p,m}$ and $x_{q,m}$ are randomly selected position vector elements; and $\mu$ is a constant.
This application also provides a system for abnormal data detection, the system comprising:
an optimization module, for optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
an acquisition module, for acquiring the data to be processed;
a labeling module, for labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module, for determining that the data in the noise data set are abnormal data.
Optionally, the optimization module comprises:
a position vector determining submodule, for taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
a calculation submodule, for calculating the fitness value of each krill individual from its position vector;
a first determining submodule, for determining the worst krill individual and the optimal krill individual from the fitness values of the krill individuals;
an execution submodule, for executing, based on the worst krill individual and the optimal krill individual, the motion induced by the krill herd, the foraging motion, and the random diffusion motion;
an update submodule, for updating the position vector of each krill individual;
a judging submodule, for judging whether the maximum number of iterations of the krill herd algorithm has been reached;
a return submodule, for returning to the calculation submodule, when the maximum number of iterations has not been reached, to execute the step of calculating the fitness value of each krill individual from its position vector;
a second determining submodule, for taking the updated krill position vector as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations has been reached.
Optionally, the calculation submodule comprises:
a calculation unit, for calculating the fitness value of each krill individual according to the fitness formula [formula not reproduced];
where $f(x)$ is the krill individual fitness function; $D_1, D_2, \ldots, D_k$ are the $k$ clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; $x$ and $x'$ are position vectors of individuals in the krill herd; $\mathrm{dist}(x, x')$ is the Euclidean distance between $x$ and $x'$; and $\varepsilon$ is a constant.
This application also provides an abnormal data detection device, the device comprising:
a memory, for storing a computer program;
a processor, for implementing the steps of the abnormal data detection method described in any of the above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the abnormal data detection method described in any of the above.
The method for abnormal data detection provided by this application comprises: optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm; acquiring the data to be processed; labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set are abnormal data.
The technical solution labels the data to be processed with the optimized DBSCAN clustering algorithm, obtains a cluster data set and a noise data set, and finally determines that the data in the noise data set are abnormal data. This improves the clustering of traffic data: when faced with freeway traffic flow data that are complex, changeable, and strongly random, it can accurately detect abnormal traffic data, improving both the efficiency and the accuracy of abnormal traffic data detection. The application also provides a system, a device, and a computer-readable storage medium for abnormal data detection, which have the same beneficial effects and are not described again here.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for abnormal data detection provided by an embodiment of this application;
Fig. 2 is a flow chart of one practical implementation of S101 in the method for abnormal data detection shown in Fig. 1;
Fig. 3 is a structure diagram of a system for abnormal data detection provided by an embodiment of this application;
Fig. 4 is a structure diagram of another system for abnormal data detection provided by an embodiment of this application;
Fig. 5 is a structure diagram of an abnormal data detection device provided by an embodiment of this application.
Detailed description of the embodiments
The core of this application is to provide a method, system, device, and computer-readable storage medium for abnormal data detection, for accurately detecting abnormal traffic flow data.
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of this application.
Referring to Fig. 1, Fig. 1 is a flow chart of a method for abnormal data detection provided by an embodiment of this application.
It specifically comprises the following steps:
S101: optimize the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
Because freeway traffic flow data are complex, changeable, and strongly random, traditional abnormal data identification methods have difficulty detecting abnormal traffic flow data accurately. This application provides a method for abnormal data detection to solve this problem.
This application labels the data to be processed with the DBSCAN clustering algorithm. DBSCAN is a clustering algorithm that partitions data by density: it groups the input sample set into classes according to how closely the data points are packed. DBSCAN density clustering can classify freeway traffic data and isolate abnormal samples, and thereby detect abnormal traffic data.
However, the DBSCAN clustering algorithm is very sensitive to the setting of the neighborhood parameters (E, MinPts); if they are chosen badly, clustering quality drops. This application therefore optimizes the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm, a swarm intelligence algorithm that imitates the survival behavior of krill herds and can efficiently solve a wide range of benchmark optimization problems. Optimizing the neighborhood parameters improves the clustering quality of DBSCAN and, in turn, the efficiency and accuracy of abnormal traffic data detection.
S102: acquire the data to be processed;
The data to be processed here are the input freeway traffic sample set. The sample set may be entered manually by a user or downloaded from a predetermined server; this application does not specifically limit the input mode.
S103: label the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
Optionally, labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set may specifically be:
S1031: receive the input freeway traffic sample set $D = (x_1, x_2, \ldots, x_m)$ and determine the neighborhood parameters $(E, MinPts)$;
S1032: set up the core object set $A$, the number of clusters $k$, the set $B$ of samples not yet visited, and the cluster partition $C$, and initialize them, i.e. $A = \varnothing$, $k = 0$, $B = D$, $C = \varnothing$;
S1033: find all core objects in the freeway traffic sample set $D$: for $j = 1, 2, \ldots, m$, first find the $E$-neighborhood subsample set $N_E(x_j)$ of traffic flow sample $x_j$ using the distance metric; if $|N_E(x_j)| \ge MinPts$, put $x_j$ into the core object set, i.e. $A = A \cup \{x_j\}$;
S1034: judge whether the core object set $A$ is empty;
if $A = \varnothing$, the algorithm ends and step S1038 is executed; if $A \ne \varnothing$, step S1035 is executed;
S1035: take any $a \in A$, set the core object queue $A_{cur} = \{a\}$ and $k = k + 1$, set the current cluster $C_k = \{a\}$, and update $B = B - \{a\}$;
S1036: judge whether the core object queue $A_{cur}$ is empty;
if $A_{cur} = \varnothing$, update $C = \{C_1, C_2, \ldots, C_k\}$ and $A = A - C_k$, and return to step S1034; if $A_{cur} \ne \varnothing$, execute step S1037;
S1037: take any $a' \in A_{cur}$, obtain $N_E(a')$ from the $E$-neighborhood definition, let $\Delta = N_E(a') \cap B$, and update $C_k = C_k \cup \Delta$, $B = B - \Delta$, and $A_{cur} = A_{cur} \cup (\Delta \cap A) - \{a'\}$; then return to step S1036;
S1038: output the cluster partition $C = \{C_1, C_2, \ldots, C_k\}$ and the abnormal data set $D - (C_1 \cup C_2 \cup \cdots \cup C_k)$.
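Steps S1031 to S1038 can be sketched in plain Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `dbscan`, the use of index sets instead of sample sets, and the brute-force neighborhood search are assumptions made for readability.

```python
import math

def dbscan(points, eps, min_pts):
    """Cluster `points` following S1031-S1038; return (clusters, noise) as index sets."""
    m = len(points)

    def neighborhood(j):
        # E-neighborhood of sample j under Euclidean distance (includes j itself)
        return {i for i in range(m) if math.dist(points[i], points[j]) <= eps}

    # S1032: core set A, cluster count k, unvisited set B, cluster partition C
    core = set()
    k = 0
    unvisited = set(range(m))
    clusters = []

    # S1033: collect every core object
    for j in range(m):
        if len(neighborhood(j)) >= min_pts:
            core.add(j)

    # S1034: loop until the core set is empty
    while core:
        # S1035: pick any core object and open a new cluster around it
        a = next(iter(core))
        queue = {a}
        k += 1
        cluster = {a}
        unvisited.discard(a)
        # S1036 / S1037: grow the cluster through unvisited neighbors
        while queue:
            a2 = queue.pop()
            delta = neighborhood(a2) & unvisited
            cluster |= delta
            unvisited -= delta
            queue |= delta & core      # only core objects keep expanding the cluster
        clusters.append(cluster)
        core -= cluster

    # S1038: samples that belong to no cluster form the noise (abnormal) set
    noise = set(range(m)) - set().union(*clusters)
    return clusters, noise
```

Calling `dbscan(samples, E, MinPts)` on a list of coordinate tuples returns the clusters and the indices of the samples that step S104 would treat as abnormal. The quadratic neighborhood search is only suitable for small sample sets.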
This embodiment uses the DBSCAN density clustering algorithm to classify freeway traffic data and separate out abnormal samples. It offers good running speed, accuracy, and recall, and shows good stability and validity when detecting abnormal traffic data.
S104: determine that the data in the noise data set are abnormal data.
Based on the above technical solution, the method for abnormal data detection provided by this application labels the data to be processed with the optimized DBSCAN clustering algorithm, obtains a cluster data set and a noise data set, and finally determines that the data in the noise data set are abnormal data. This improves the clustering of traffic data: when faced with freeway traffic flow data that are complex, changeable, and strongly random, it can accurately detect abnormal traffic data, improving both the efficiency and the accuracy of abnormal traffic data detection.
For step S101 of the previous embodiment, optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm may specifically follow the steps shown in Fig. 2, which are described below.
Referring to Fig. 2, Fig. 2 is a flow chart of one practical implementation of S101 in the method for abnormal data detection shown in Fig. 1.
It specifically includes the following steps:
S201: take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
For example, the position of krill individual $i$ is $x_i = (x_{i1}, x_{i2})$, where $x_{i1}$ represents $E$ with $x_{i1} \in [RE_{down}, RE_{up}]$, and $x_{i2}$ represents $MinPts$ with $x_{i2} \in [RM_{down}, RM_{up}]$.
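The encoding in S201 can be illustrated with a short sketch. The bound values, the helper names `random_position` and `decode`, and the rounding of the second element to an integer MinPts are all assumptions; the source only names the bounds symbolically.

```python
import random

# Hypothetical search bounds standing in for [RE_down, RE_up] and [RM_down, RM_up]
E_BOUNDS = (0.1, 5.0)
MINPTS_BOUNDS = (2, 20)

def random_position(rng):
    """Draw one krill position x_i = (x_i1, x_i2) uniformly inside the bounds."""
    return [rng.uniform(*E_BOUNDS), rng.uniform(*MINPTS_BOUNDS)]

def decode(position):
    """Map a continuous position to usable DBSCAN parameters (E, MinPts)."""
    eps = min(max(position[0], E_BOUNDS[0]), E_BOUNDS[1])
    # Assumption: MinPts must be an integer, so the second element is clamped and rounded
    min_pts = int(round(min(max(position[1], MINPTS_BOUNDS[0]), MINPTS_BOUNDS[1])))
    return eps, min_pts

rng = random.Random(0)
herd = [random_position(rng) for _ in range(5)]   # initial krill herd
params = [decode(x) for x in herd]                # one (E, MinPts) pair per krill
```

Keeping the position continuous and decoding it only when DBSCAN is run lets the krill motions operate on real-valued vectors while DBSCAN still receives a valid integer MinPts.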
S202: calculate the fitness value of each krill individual from its position vector;
Optionally, calculating the fitness value of each krill individual from its position vector may specifically be:
calculating the fitness value of each krill individual according to the fitness formula [formula not reproduced];
where $f(x)$ is the krill individual fitness function; $D_1, D_2, \ldots, D_k$ are the $k$ clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; $x$ and $x'$ are position vectors of individuals in the krill herd; $\mathrm{dist}(x, x')$ is the Euclidean distance between $x$ and $x'$; and $\varepsilon$ is a constant.
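Because the fitness formula itself is not available in this text, the following is only a stand-in that uses the same ingredients (the clusters, pairwise Euclidean distances, and a small constant). It is an assumption made for illustration and should not be read as the patent's fitness function; here the points compared are the samples inside each cluster.

```python
import math
from itertools import combinations

def fitness(clusters, eps_const=1e-6):
    """Stand-in fitness (NOT the patent's formula): compact clusters score higher."""
    total, pairs = 0.0, 0
    for cluster in clusters:
        # Sum Euclidean distances over every pair of samples inside one cluster
        for x, x2 in combinations(cluster, 2):
            total += math.dist(x, x2)
            pairs += 1
    if pairs == 0:
        return 0.0          # no cluster with two or more samples: treat as worst case
    # eps_const keeps the division defined when all samples coincide
    return 1.0 / (total / pairs + eps_const)
```

With this choice a larger value is better, which matches the convention in S203 that the krill with the largest fitness value is the optimal individual.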
S203: determine the worst krill individual and the optimal krill individual from the fitness values of the krill individuals;
Optionally, the krill individual with the largest fitness value may be taken as the optimal krill individual, and the krill individual with the smallest fitness value as the worst krill individual.
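The selection in S203 amounts to an argmax and an argmin over the fitness values. A minimal sketch, with the hypothetical helper name `best_and_worst`:

```python
def best_and_worst(fitness_values):
    """Return (index of optimal krill, index of worst krill); larger fitness is better."""
    indices = range(len(fitness_values))
    best = max(indices, key=fitness_values.__getitem__)
    worst = min(indices, key=fitness_values.__getitem__)
    return best, worst
```

On ties, `max` and `min` return the first matching index, so the result is deterministic for a given ordering of the herd.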
S204: execute, based on the worst krill individual and the optimal krill individual, the motion induced by the krill herd, the foraging motion, and the random diffusion motion;
Optionally, this step may specifically include the following steps:
S2041: execute the motion induced by the krill herd, based on the worst krill individual and the optimal krill individual, according to the induced-motion formula [formula not reproduced];
where $N^{max}$ is the maximum induced speed; $w_n$ is the inertia weight of the induced motion, in the range [0,1]; $N_i^{old}$ is the induced motion of the previous iteration; a further term (its symbol is not reproduced) is the target-direction induction provided by the best individual in the krill herd; $X_i$ is the fitness or objective function value of the $i$-th individual in the current iteration; $X_j$ is the fitness of the $j$-th neighbor ($j = 1, 2, \ldots, N$); and $\varepsilon$ is a small positive constant.
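The induced-motion formula is not available in this text, so the sketch below follows the commonly published krill herd formulation, in which the new induced motion is the maximum induced speed times a direction term plus the inertia weight times the previous motion, and the direction term sums a local (neighbor) effect and a target (best individual) effect. Treating every other krill as a neighbor and passing the target coefficient `c_best` as a fixed number are simplifying assumptions; the patent's own formula may differ.

```python
import math

def induced_motion(i, positions, fitness, n_old, n_max=0.01, w_n=0.5,
                   c_best=1.0, eps=1e-12):
    """N_i = N_max * alpha_i + w_n * N_i_old, with alpha_i = local effect + target effect."""
    best = max(range(len(fitness)), key=fitness.__getitem__)
    worst = min(range(len(fitness)), key=fitness.__getitem__)
    spread = fitness[worst] - fitness[best]       # normalizer built from worst and best
    dim = len(positions[i])

    def effect(j):
        # Normalized fitness difference times unit direction from krill i to krill j:
        # positive towards fitter krill, negative (repulsive) towards weaker ones
        if spread == 0:
            return [0.0] * dim
        diff = [positions[j][d] - positions[i][d] for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in diff)) + eps
        k_hat = (fitness[i] - fitness[j]) / spread
        return [k_hat * v / norm for v in diff]

    alpha = [0.0] * dim
    for j in range(len(positions)):
        if j != i:                                # assumption: all other krill are neighbors
            alpha = [a + e for a, e in zip(alpha, effect(j))]
    # Target effect: extra pull towards the best individual of the herd
    alpha = [a + c_best * e for a, e in zip(alpha, effect(best))]
    return [n_max * a + w_n * o for a, o in zip(alpha, n_old)]
```

Because the fitness difference is divided by the worst-minus-best spread, the sign of the pull is the same whether larger or smaller fitness counts as better, as long as `best` and `worst` are chosen consistently.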
S2042: execute the foraging motion according to the formula $F_i = V_f \beta_i + w_f F_i^{old}$;
where $V_f$ is the foraging speed; $w_f$ is the inertia weight of the foraging motion, with a value in the range [0,1]; $F_i^{old}$ is the previous foraging motion; and $\beta_i$ is the foraging direction, calculated according to the formula $\beta_i = \beta_i^{food} + \beta_i^{best}$. Here $\beta_i^{food}$ is the attraction of the food, calculated according to two formulas [formulas not reproduced], and $\beta_i^{best}$ is the effect of the best fitness found so far by the $i$-th individual in the krill herd, calculated according to a further formula [formula not reproduced].
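The two foraging formulas that do survive in the text (the foraging motion is the foraging speed times the foraging direction plus the inertia weight times the previous foraging motion, and the foraging direction is the sum of the food attraction and the personal-best effect) translate directly into code. The default values for `v_f` and `w_f` are illustrative assumptions, and the two direction components are taken as inputs because their formulas are not available here.

```python
def foraging_motion(beta_food, beta_best, f_old, v_f=0.02, w_f=0.5):
    """F_i = V_f * beta_i + w_f * F_i_old, where beta_i = beta_food + beta_best."""
    # Foraging direction: food attraction plus pull towards the krill's own best position
    beta = [bf + bb for bf, bb in zip(beta_food, beta_best)]
    # New foraging motion: scaled direction plus inertia from the previous motion
    return [v_f * b + w_f * fo for b, fo in zip(beta, f_old)]
```

All three vectors must have the same length as the position vector, which is two in this application (E and MinPts).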
S2043: perform random diffusion according to the diffusion formula [formula not reproduced];
where $D^{max}$ is the maximum diffusion speed, $\delta$ is a random direction vector, and $I$ and $I_{max}$ are respectively the current iteration number of the krill herd and the preset global maximum number of iterations.
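The diffusion formula is not available in this text. The symbols it defines match the commonly published krill herd form, in which the diffusion is the maximum diffusion speed, scaled by one minus the ratio of the current iteration to the maximum iteration count, times a random direction vector. The sketch below assumes that form and draws each component of the direction vector uniformly from [-1, 1]; the default `d_max` is illustrative.

```python
import random

def random_diffusion(dim, iteration, max_iterations, d_max=0.005, rng=random):
    """D_i = D_max * (1 - I / I_max) * delta, with delta uniform in [-1, 1] per dimension."""
    # The scale shrinks linearly, so exploration fades as the iterations progress
    scale = d_max * (1.0 - iteration / max_iterations)
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(dim)]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.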
S205: update the position vector of each krill individual;
Optionally, updating the position vector of each krill individual may specifically be:
updating the position vector of each krill individual according to the position update formula [formula not reproduced];
where $\Delta t$ is the update step length, which can be calculated according to a formula [formula not reproduced] in which $C_t$ is the step-limiting factor, a constant in [0,2]; a lower value of $C_t$ lets the krill individuals search the space more carefully.
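Neither the position update formula nor the step length formula is available in this text. The sketch below assumes the commonly published krill herd form: each krill moves by the step length times the sum of its induced, foraging, and diffusion motions, and the step length is the step-limiting factor times the summed widths of the search bounds. Clamping the result to the bounds is an extra assumption added so that the decoded DBSCAN parameters stay valid.

```python
def step_length(bounds, c_t=0.5):
    """Delta t = C_t * sum over dimensions of (upper bound - lower bound)."""
    return c_t * sum(hi - lo for lo, hi in bounds)

def update_position(x, n, f, d, bounds, c_t=0.5):
    """Move one krill by dt * (N_i + F_i + D_i), then clamp it to the search bounds."""
    dt = step_length(bounds, c_t)
    moved = [xi + dt * (ni + fi + di) for xi, ni, fi, di in zip(x, n, f, d)]
    # Assumption: positions leaving the search range are pulled back to its edge
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(moved, bounds)]
```

Here `bounds` is a list of `(lower, upper)` pairs, one per position element, for example the ranges chosen for E and MinPts.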
Optionally, before the position vector of each krill individual is updated according to the position update formula, the method may further include the following steps:
performing a crossover operation on the position vector elements of each krill individual according to the crossover formula [formula not reproduced];
performing a mutation operation on the position vector elements of each krill individual according to the mutation formula [formula not reproduced];
where $x_{i,m}$ is the $m$-th position vector element of the $i$-th krill individual; $Cr$ is the crossover probability threshold; $R_{i,m}$ is the probability that the $m$-th position vector element of the $i$-th krill individual undergoes crossover or mutation; $Mu$ is the mutation probability; $x_{gbest,m}$ is the optimal position vector element of the current iteration; $x_{p,m}$ and $x_{q,m}$ are randomly selected position vector elements; and $\mu$ is a constant.
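The crossover and mutation formulas are not available in this text. The sketch below assumes the usual krill herd genetic operators, which fit the symbols defined above: an element is replaced by the same element of a random krill when a random draw falls below the crossover threshold, and by the best element plus a scaled difference of two random krill when a draw falls below the mutation probability. The function names and the use of a fresh random draw per element are assumptions.

```python
import random

def crossover(x_i, herd, cr, rng):
    """Replace element m with the same element of a random krill when the draw is below Cr."""
    out = list(x_i)
    for m in range(len(out)):
        if rng.random() < cr:                  # the draw plays the role of R_{i,m}
            out[m] = rng.choice(herd)[m]
    return out

def mutate(x_i, herd, x_gbest, mu_prob, mu, rng):
    """Replace element m with x_gbest[m] + mu * (x_p[m] - x_q[m]) when the draw is below Mu."""
    out = list(x_i)
    for m in range(len(out)):
        if rng.random() < mu_prob:
            p, q = rng.sample(herd, 2)         # two distinct randomly selected krill
            out[m] = x_gbest[m] + mu * (p[m] - q[m])
    return out
```

Both functions return a new list and leave the input position untouched, so they can be applied to every krill in turn before the position update.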
S206: judge whether the maximum number of iterations of the krill herd algorithm has been reached;
If so, go to step S207; if not, return to step S202;
S207: take the updated krill position vector as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
Referring to Fig. 3, Fig. 3 is a structure diagram of a system for abnormal data detection provided by an embodiment of this application.
The system may include:
an optimization module 100, for optimizing the neighborhood parameters of the DBSCAN clustering algorithm with the krill herd algorithm;
an acquisition module 200, for acquiring the data to be processed;
a labeling module 300, for labeling the data to be processed with the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module 400, for determining that the data in the noise data set are abnormal data.
Referring to Fig. 4, Fig. 4 is a structure diagram of another system for abnormal data detection provided by an embodiment of this application.
The optimization module 100 may include:
a position vector determining submodule, for taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;
a calculation submodule, for calculating the fitness value of each krill individual from its position vector;
a first determining submodule, for determining the worst krill individual and the optimal krill individual from the fitness values of the krill individuals;
an execution submodule, for executing, based on the worst krill individual and the optimal krill individual, the motion induced by the krill herd, the foraging motion, and the random diffusion motion;
an update submodule, for updating the position vector of each krill individual;
a judging submodule, for judging whether the maximum number of iterations of the krill herd algorithm has been reached;
a return submodule, for returning to the calculation submodule, when the maximum number of iterations has not been reached, to execute the step of calculating the fitness value of each krill individual from its position vector;
a second determining submodule, for taking the updated krill position vector as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations has been reached.
Further, the calculation submodule may include:
a calculation unit, for calculating the fitness value of each krill individual according to the fitness formula [formula not reproduced];
where $f(x)$ is the krill individual fitness function; $D_1, D_2, \ldots, D_k$ are the $k$ clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; $x$ and $x'$ are position vectors of individuals in the krill herd; $\mathrm{dist}(x, x')$ is the Euclidean distance between $x$ and $x'$; and $\varepsilon$ is a constant.
The update submodule may include:
an updating unit, for updating the position vector of each krill individual according to the position update formula [formula not reproduced];
where $\Delta t$ is the update step length.
The update submodule may also include:
a crossover unit, for performing a crossover operation on the position vector elements of each krill individual according to the crossover formula [formula not reproduced];
a mutation unit, for performing a mutation operation on the position vector elements of each krill individual according to the mutation formula [formula not reproduced];
where $x_{i,m}$ is the $m$-th position vector element of the $i$-th krill individual; $Cr$ is the crossover probability threshold; $R_{i,m}$ is the probability that the $m$-th position vector element of the $i$-th krill individual undergoes crossover or mutation; $Mu$ is the mutation probability; $x_{gbest,m}$ is the optimal position vector element of the current iteration; $x_{p,m}$ and $x_{q,m}$ are randomly selected position vector elements; and $\mu$ is a constant.
Since the system embodiments correspond to the method embodiments, refer to the description of the method embodiments for details of the system embodiments, which are not repeated here.
Referring to Fig. 5, Fig. 5 is a structure diagram of an abnormal data detection device provided by an embodiment of this application.
The abnormal data detection device 500 may vary considerably with configuration or performance. It may include one or more processors (central processing units, CPU) 522, a memory 532, and one or more storage media 530 (for example, one or more mass storage devices) storing application programs 542 or data 544. The memory 532 and the storage medium 530 may provide transient or persistent storage. A program stored in the storage medium 530 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the device. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the abnormal data detection device 500, the series of instruction operations in the storage medium 530.
The abnormal data detection device 500 may also include one or more power supplies 525, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
The steps of the abnormal data detection method described with reference to Fig. 1 and Fig. 2 are implemented by the abnormal data detection device based on the structure shown in Fig. 5.
Those skilled in the art will understand that, for convenience and brevity of description, the specific working processes of the system, device, and modules described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In several embodiments provided herein, it should be understood that disclosed device, device and method, it can be with
It realizes by another way.For example, the apparatus embodiments described above are merely exemplary, for example, the division of module,
Only a kind of logical function partition, there may be another division manner in actual implementation, such as multiple module or components can be with
In conjunction with or be desirably integrated into another system, or some features can be ignored or not executed.Another point, it is shown or discussed
Mutual coupling, direct-coupling or communication connection can be through some interfaces, the INDIRECT COUPLING of device or module or
Communication connection can be electrical property, mechanical or other forms.
Module may or may not be physically separated as illustrated by the separation member, show as module
Component may or may not be physical module, it can and it is in one place, or may be distributed over multiple networks
In module.Some or all of the modules therein can be selected to achieve the purpose of the solution of this embodiment according to the actual needs.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a function-calling device, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The method, system, equipment and computer-readable storage medium for anomaly data detection provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Claims (10)
1. A method of anomaly data detection, characterized by comprising:
optimizing the field parameters of the DBSCAN clustering algorithm using the krill herd algorithm;
obtaining data to be processed;
labeling the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
determining that the data in the noise data set are abnormal data.
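The labeling and noise-extraction steps of claim 1 can be illustrated with a minimal pure-Python sketch. It assumes that the "field parameters" correspond to the two usual DBSCAN parameters, a neighborhood radius and a minimum neighbor count; the names `eps`, `min_pts`, `dbscan` and `detect_anomalies` are illustrative and do not come from the patent, and the parameter values would in practice be supplied by the krill herd optimization rather than chosen by hand.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, NOISE for noise points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may be re-labelled later as a border point
            continue
        cluster += 1                     # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

def detect_anomalies(points, eps, min_pts):
    """Split the data into a cluster data set and a noise (abnormal) data set."""
    labels = dbscan(points, eps, min_pts)
    clustered = [p for p, lab in zip(points, labels) if lab != NOISE]
    noise = [p for p, lab in zip(points, labels) if lab == NOISE]
    return clustered, noise
```

For example, `detect_anomalies([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], eps=1.5, min_pts=3)` places the four nearby points in one cluster and reports the isolated point as abnormal.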
2. The method according to claim 1, characterized in that optimizing the field parameters of the DBSCAN clustering algorithm using the krill herd algorithm comprises:
taking the field parameters of the DBSCAN clustering algorithm as the position vector of a krill herd individual;
calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
determining the worst krill individual and the optimal krill individual according to the fitness values of the krill herd individuals;
performing the movement induced by the krill herd excluding the worst krill individual and the optimal krill individual, the foraging movement, and the random diffusion movement;
updating the position vector of each krill herd individual;
judging whether the maximum number of iterations of the krill herd algorithm has been reached;
if not, returning to the step of calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
if so, taking the updated position vector of each krill herd individual as the field parameters of the optimized DBSCAN clustering algorithm.
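The iteration described in claim 2 can be sketched as a generic minimizer. This is a simplified illustration of a krill herd (krill group) search, not the patent's implementation: the speed constants, inertia weights, step scaling, the weighting used for the virtual food position, the clamping to bounds, and the choice to return the best position seen are all assumptions, and the names `krill_herd_minimize`, `pull`, `fitness` and `bounds` are hypothetical. Lower fitness is treated as better.

```python
import math
import random

def krill_herd_minimize(fitness, bounds, n_krill=20, max_iter=50, seed=0):
    """Simplified krill herd search; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    n_max, v_f, d_max = 0.01, 0.02, 0.005         # speed constants (assumed values)
    w_n = w_f = 0.5                               # inertia weights (assumed values)
    dt = 0.5 * sum(hi - lo for lo, hi in bounds)  # update step scaled to the search range
    tiny = 1e-12                                  # avoids division by zero

    def pull(src, dst, k_src, k_dst, spread):
        """Unit direction src -> dst, scaled by the normalised fitness difference."""
        norm = math.dist(src, dst) + tiny
        k_hat = (k_src - k_dst) / spread
        return [k_hat * (d - s) / norm for s, d in zip(src, dst)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_krill)]
    induced = [[0.0] * dim for _ in range(n_krill)]
    forage = [[0.0] * dim for _ in range(n_krill)]
    best_pos, best_fit = None, math.inf

    for it in range(max_iter):
        # 1. fitness of every individual, then the worst and the best individual
        fit = [fitness(p) for p in pos]
        i_best = min(range(n_krill), key=fit.__getitem__)
        i_worst = max(range(n_krill), key=fit.__getitem__)
        if fit[i_best] < best_fit:
            best_pos, best_fit = list(pos[i_best]), fit[i_best]
        spread = fit[i_worst] - fit[i_best] + tiny

        # virtual food: herd centre weighted towards fitter individuals (assumed weighting)
        w = [(fit[i_worst] - f) / spread + tiny for f in fit]
        food = [sum(wi * p[d] for wi, p in zip(w, pos)) / sum(w) for d in range(dim)]
        k_food = fitness(food)
        decay = 1.0 - it / max_iter
        herd = [j for j in range(n_krill) if j not in (i_best, i_worst)]

        new_pos = []
        for i in range(n_krill):
            # 2a. movement induced by the herd without its worst and best members
            alpha = [0.0] * dim
            for j in herd:
                if j != i:
                    for d, v in enumerate(pull(pos[i], pos[j], fit[i], fit[j], spread)):
                        alpha[d] += v
            # 2b. foraging: attraction to the food and to the best position so far
            b_food = pull(pos[i], food, fit[i], k_food, spread)
            b_best = pull(pos[i], best_pos, fit[i], best_fit, spread)
            moved = []
            for d, (lo, hi) in enumerate(bounds):
                induced[i][d] = n_max * alpha[d] + w_n * induced[i][d]
                forage[i][d] = v_f * (2 * decay * b_food[d] + b_best[d]) + w_f * forage[i][d]
                diffusion = d_max * decay * rng.uniform(-1.0, 1.0)  # 2c. random diffusion
                # 3. position update, clamped to the search bounds
                x = pos[i][d] + dt * (induced[i][d] + forage[i][d] + diffusion)
                moved.append(min(hi, max(lo, x)))
            new_pos.append(moved)
        pos = new_pos                              # 4. repeat until max_iter is reached

    for p in pos:                                  # score the final positions as well
        f = fitness(p)
        if f < best_fit:
            best_pos, best_fit = list(p), f
    return best_pos, best_fit
```

To tune DBSCAN with such a routine, the position vector would hold the two field parameters and `fitness` would run the clustering with them and score the result. How the minimum neighbor count is kept an integer (for instance by rounding inside `fitness`) is a design choice the claim does not specify.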
3. The method according to claim 2, characterized in that calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual comprises:
calculating the fitness value of the krill herd individual according to the formula [formula not reproduced];
wherein f(x) is the krill individual fitness function, D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the current position vector of the krill individual as the field parameters, x and x' are position vectors of individuals in the krill herd, dist(x, x') is the Euclidean distance between x and x', and ε is a constant.
4. The method according to claim 2, characterized in that updating the position vector of each krill herd individual comprises:
updating the position vector of each krill herd individual according to the formula [formula not reproduced];
wherein Δt is the update step size.
5. The method according to claim 4, characterized in that, before updating the position vector of each krill herd individual according to the formula [formula not reproduced], the method further comprises:
performing a crossover operation on the position vector elements of each krill herd individual according to the formula [formula not reproduced];
performing a mutation operation on the position vector elements of each krill herd individual according to the formula [formula not reproduced];
wherein x_{i,m} is the m-th group of position vector elements in the i-th krill individual, Cr is the crossover probability threshold, R_{i,m} is the probability that the m-th group of position vector elements in the i-th krill individual undergoes the crossover operation or the mutation operation, Mu is the mutation probability, x_{gbes,m} is the optimal position vector element of the current iteration, x_{p,m} and x_{q,m} are randomly selected position vector elements, and μ is a constant.
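The crossover and mutation formulas of claim 5 are likewise not reproduced in this text. The following sketch assumes the genetic operators commonly used with the krill herd algorithm, which are consistent with the symbols defined in the claim: an element is replaced by the corresponding element of another individual when a random draw falls below Cr, and is replaced by the best element plus μ times the difference of two randomly chosen elements when a random draw falls below Mu. The function names and the way the other individuals are chosen are assumptions.

```python
import random

def crossover(pos, i, cr, rng):
    """Copy of pos[i]; each element is replaced, with probability cr,
    by the same element of a randomly chosen other individual."""
    child = list(pos[i])
    others = [j for j in range(len(pos)) if j != i]
    for m in range(len(child)):
        if rng.random() < cr:
            child[m] = pos[rng.choice(others)][m]
    return child

def mutate(pos, i, gbest, mu_prob, mu, rng):
    """Copy of pos[i]; each element is replaced, with probability mu_prob,
    by gbest[m] + mu * (pos[p][m] - pos[q][m]) for two random others p and q."""
    child = list(pos[i])
    others = [j for j in range(len(pos)) if j != i]
    for m in range(len(child)):
        if rng.random() < mu_prob:
            p, q = rng.sample(others, 2)   # requires at least three individuals
            child[m] = gbest[m] + mu * (pos[p][m] - pos[q][m])
    return child
```

Both functions return a new vector and leave the herd unchanged, so the caller decides whether the modified individual replaces the original before the position update.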
6. A system of anomaly data detection, characterized by comprising:
an optimization module, configured to optimize the field parameters of the DBSCAN clustering algorithm using the krill herd algorithm;
an obtaining module, configured to obtain data to be processed;
a labeling module, configured to label the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;
a determining module, configured to determine that the data in the noise data set are abnormal data.
7. The system according to claim 6, characterized in that the optimization module comprises:
a position vector determination submodule, configured to take the field parameters of the DBSCAN clustering algorithm as the position vector of a krill herd individual;
a calculation submodule, configured to calculate the fitness value of the krill herd individual according to the position vector of the krill herd individual;
a first determination submodule, configured to determine the worst krill individual and the optimal krill individual according to the fitness values of the krill herd individuals;
an execution submodule, configured to perform the movement induced by the krill herd excluding the worst krill individual and the optimal krill individual, the foraging movement, and the random diffusion movement;
an update submodule, configured to update the position vector of each krill herd individual;
a judgment submodule, configured to judge whether the maximum number of iterations of the krill herd algorithm has been reached;
a return submodule, configured to return to the calculation submodule, when the maximum number of iterations of the krill herd algorithm has not been reached, to execute the step of calculating the fitness value of the krill herd individual according to the position vector of the krill herd individual;
a second determination submodule, configured to take, when the maximum number of iterations of the krill herd algorithm has been reached, the updated position vector of each krill herd individual as the field parameters of the optimized DBSCAN clustering algorithm.
8. The system according to claim 7, characterized in that the calculation submodule comprises:
a calculation unit, configured to calculate the fitness value of the krill herd individual according to the formula [formula not reproduced];
wherein f(x) is the krill individual fitness function, D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the current position vector of the krill individual as the field parameters, x and x' are position vectors of individuals in the krill herd, dist(x, x') is the Euclidean distance between x and x', and ε is a constant.
9. Anomaly data detection equipment, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the anomaly data detection method according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the anomaly data detection method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910595139.8A CN110298407B (en) | 2019-07-03 | 2019-07-03 | Abnormal data detection method, system and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910595139.8A CN110298407B (en) | 2019-07-03 | 2019-07-03 | Abnormal data detection method, system and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110298407A (en) | 2019-10-01 |
CN110298407B (en) | 2023-05-09 |
Family
ID=68030056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910595139.8A Active CN110298407B (en) | 2019-07-03 | 2019-07-03 | Abnormal data detection method, system and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110298407B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368858A (en) * | 2017-07-28 | 2017-11-21 | 中南大学 | A kind of parametrization measurement multi-model intelligent method for fusing of intelligent environment carrying robot identification floor |
CN109669990A (en) * | 2018-11-16 | 2019-04-23 | 重庆邮电大学 | A kind of innovatory algorithm carrying out Outliers mining to density irregular data based on DBSCAN |
CN109766393A (en) * | 2018-12-06 | 2019-05-17 | 中科恒运股份有限公司 | Abnormal deviation data examination method and device |
Non-Patent Citations (5)
Title |
---|
JIAN ZHANG ET AL.: "Traffic Features Extraction and Clustering Analysis for Abnormal Behavior Detection", 《 INTELLIGENT INFORMATION PROCESSING》 * |
SINGH V ET AL.: "Krill Herd Clustering Algorithm using DBSCAN Technique", 《 INTERNATIONAL JOURNAL OF COMPUTER SCIENCE & ENGINEERING TECHNOLOGY》 * |
刘沛: "磷虾群优化算法的改进及应用", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
蒋华 等: "基于DCNDA算法的数据异常检测", 《计算机工程与设计》 * |
阮嘉琨 等: "基于DBSCAN密度聚类算法的高速公路交通流异常数据检测", 《工业控制计算机》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955737A (en) * | 2023-09-19 | 2023-10-27 | 源康(东阿)健康科技有限公司 | Abnormal characteristic retrieval method used in gelatin production process |
CN116955737B (en) * | 2023-09-19 | 2023-11-28 | 源康(东阿)健康科技有限公司 | Abnormal characteristic retrieval method used in gelatin production process |
Also Published As
Publication number | Publication date |
---|---|
CN110298407B (en) | 2023-05-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||