CN110298407A

CN110298407A - Method, system and device for abnormal data detection

Info

Publication number: CN110298407A
Application number: CN201910595139.8A
Authority: CN
Inventors: 蔡延光; 阮嘉琨; 蔡颢
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2019-10-01
Anticipated expiration: 2039-07-03
Also published as: CN110298407B

Abstract

This application discloses a method for abnormal data detection, comprising: optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm; obtaining data to be processed; marking the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set is abnormal data. The technical solution provided by this application greatly enhances the clustering of traffic data: when facing complex, changeable and strongly fluctuating highway traffic flow data, it can accurately detect abnormal traffic data, improving the efficiency and accuracy of abnormal traffic data detection. The application also provides a system, a device and a computer-readable storage medium for abnormal data detection, which have the same beneficial effects.

Description

A method, system and device for abnormal data detection

Technical field

The present application relates to the field of abnormal data detection, and in particular to a method, system, device and computer-readable storage medium for abnormal data detection.

Background

In a highway intelligent transportation system, abnormal data is an important factor affecting data quality, so finding and cleaning abnormal traffic data is an important step in optimizing highway traffic data quality. Abnormal traffic data makes up a small part of traffic big data, and its numerical values, trends and curve shapes all differ to some degree from the normal traffic data that makes up the large majority.

Because highway traffic flow data is complex, changeable and strongly fluctuating, traditional abnormal data identification methods have difficulty detecting abnormal traffic flow data accurately.

Therefore, how to accurately detect abnormal traffic flow data is a technical problem that those skilled in the art currently need to solve.

Summary of the invention

The purpose of the present application is to provide a method, system, device and computer-readable storage medium for abnormal data detection, used to accurately detect abnormal traffic flow data.

To solve the above technical problem, the present application provides a method for abnormal data detection, the method comprising:

optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm;

obtaining data to be processed;

marking the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

determining that the data in the noise data set is abnormal data.

Optionally, optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm includes:

taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

calculating the fitness value of each krill individual according to its position vector;

determining the worst krill individual and the optimal krill individual according to the fitness values;

performing the motion induced by the krill swarm excluding the worst and optimal krill individuals, the foraging activity, and the random diffusion motion;

updating the position vector of each krill individual;

judging whether the maximum number of iterations of the krill swarm algorithm has been reached;

if not, returning to the step of calculating the fitness value of each krill individual according to its position vector;

if so, using the updated position vectors of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm.

Optionally, calculating the fitness value of each krill individual according to its position vector includes:

calculating the fitness value of each krill individual according to the formula [formula not reproduced in this text];

where f(x) is the fitness function of a krill individual; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are both position vectors of individuals in the krill swarm; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.

Optionally, updating the position vector of each krill individual includes:

updating the position vector of each krill individual according to the formula [formula not reproduced in this text];

where Δt is the update step size.

Optionally, before updating the position vector of each krill individual according to that formula, the method further includes:

performing a crossover operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text];

performing a mutation operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text];

where x_i,m is the m-th group of position vector elements of the i-th krill individual; Cr is the crossover probability threshold; R_i,m is the probability that the m-th group of position vector elements of the i-th krill individual undergoes the crossover or mutation operation; Mu is the mutation probability; x_gbest,m is the best position vector element of the current iteration; x_p,m and x_q,m are randomly selected position vector elements; and μ is a constant.

The present application also provides a system for abnormal data detection, the system comprising:

an optimization module, configured to optimize the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm;

an acquisition module, configured to obtain data to be processed;

a marking module, configured to mark the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

a determination module, configured to determine that the data in the noise data set is abnormal data.

Optionally, the optimization module includes:

a position vector determination submodule, configured to take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

a calculation submodule, configured to calculate the fitness value of each krill individual according to its position vector;

a first determination submodule, configured to determine the worst krill individual and the optimal krill individual according to the fitness values;

an execution submodule, configured to perform the motion induced by the krill swarm excluding the worst and optimal krill individuals, the foraging activity, and the random diffusion motion;

an update submodule, configured to update the position vector of each krill individual;

a judgment submodule, configured to judge whether the maximum number of iterations of the krill swarm algorithm has been reached;

a return submodule, configured to return to the calculation submodule, when the maximum number of iterations has not been reached, to perform the step of calculating the fitness value of each krill individual according to its position vector;

a second determination submodule, configured to use the updated position vectors of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations has been reached.

Optionally, the calculation submodule includes:

a calculation unit, configured to perform the calculation according to the formula [formula not reproduced in this text].

The present application also provides an abnormal data detection device, comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the abnormal data detection method described in any of the above when executing the computer program.

The present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the abnormal data detection method described in any of the above.

The abnormal data detection method provided by the present application includes: optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm; obtaining data to be processed; marking the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set; and determining that the data in the noise data set is abnormal data.

In the technical solution provided by the present application, the data to be processed is marked based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set, and the data in the noise data set is then determined to be abnormal data. This greatly enhances the clustering of traffic data: when facing complex, changeable and strongly fluctuating highway traffic flow data, abnormal traffic data can be detected accurately, improving the efficiency and accuracy of abnormal traffic data detection. The present application also provides a system, device and computer-readable storage medium for abnormal data detection, which have the same beneficial effects and are not described again here.

Description of drawings

To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for abnormal data detection provided by an embodiment of the present application;

FIG. 2 is a flowchart of a practical implementation of S101 in the method for abnormal data detection shown in FIG. 1;

FIG. 3 is a structural diagram of a system for abnormal data detection provided by an embodiment of the present application;

FIG. 4 is a structural diagram of another system for abnormal data detection provided by an embodiment of the present application;

FIG. 5 is a structural diagram of an abnormal data detection device provided by an embodiment of the present application.

Detailed description

The core of the present application is to provide a method, system, device and computer-readable storage medium for abnormal data detection, used to accurately detect abnormal traffic flow data.

To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.

Please refer to FIG. 1, which is a flowchart of a method for abnormal data detection provided by an embodiment of the present application.

The method specifically includes the following steps:

S101: Optimize the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm;

Because highway traffic flow data is complex, changeable and strongly fluctuating, traditional abnormal data identification methods have difficulty detecting abnormal traffic flow data accurately. The present application provides a method for abnormal data detection to solve this problem.

The present application marks the data to be processed with the DBSCAN clustering algorithm. DBSCAN is a clustering algorithm that partitions data on the basis of density: it divides the input sample set into categories according to how closely the data points are packed. The DBSCAN density clustering algorithm can classify highway traffic data accurately and separate out the abnormal samples, thereby detecting abnormal traffic data.

However, DBSCAN is very sensitive to the setting of the neighborhood parameters (E, MinPts); if they are poorly chosen, clustering quality degrades. The present application therefore uses the krill swarm algorithm to optimize the neighborhood parameters of DBSCAN. The krill swarm algorithm is a swarm intelligence algorithm that imitates the survival behavior of Antarctic krill swarms and can effectively solve a wide range of benchmark optimization problems. Optimizing the neighborhood parameters with it improves the clustering quality of DBSCAN and, in turn, the efficiency and accuracy of abnormal traffic data detection.

S102: Obtain the data to be processed;

The data to be processed here is the input highway traffic sample set. It may be entered manually by the user or downloaded from a preset server; the present application does not limit the input method.

S103: Mark the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

Optionally, marking the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set may specifically be:

S1031: Receive the input highway traffic sample set D = (x_1, x_2, ..., x_m) and determine the neighborhood parameters (E, MinPts);

S1032: Set up the core object set A, the number of clusters k, the set B of data not yet traversed, and the cluster partition C, and initialize them, that is, A = ∅, k = 0, B = D, C = ∅;

S1033: Find all core objects in the highway traffic sample set D: for j = 1, 2, ..., m, first use the distance metric to find the E-neighborhood subsample set N_E(x_j) of traffic flow sample x_j; if |N_E(x_j)| ≥ MinPts, put x_j into the core object set A, that is, A = A ∪ {x_j};

S1034: Determine whether the core object set A is empty;

If A = ∅, the algorithm ends and step S1038 is executed; if A ≠ ∅, step S1035 is executed;

S1035: Pick any a ∈ A, set the core object queue A_cur = {a} and k = k + 1, set the current cluster C_k = {a}, and update B = B − {a};

S1036: Determine whether the core object queue A_cur is empty;

If A_cur = ∅, update C = {C_1, C_2, ..., C_k} and A = A − C_k, and return to step S1034; if A_cur ≠ ∅, execute step S1037;

S1037: Pick any a' ∈ A_cur, obtain N_E(a') according to the definition of the E-neighborhood, let Δ = N_E(a') ∩ B, then update C_k = C_k ∪ Δ, B = B − Δ, and A_cur = A_cur ∪ (Δ ∩ A) − {a'}, and return to step S1036;

S1038: Output the k clusters C = {C_1, C_2, ..., C_k} and the abnormal data cluster D − C.
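The marking procedure of steps S1031 to S1038 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `dbscan`, the use of sample indices instead of samples, and the brute-force Euclidean neighborhood search are choices made here for brevity.

```python
import math

def dbscan(samples, eps, min_pts):
    """Return (clusters, noise) as lists of sample indices, following S1031-S1038."""
    m = len(samples)

    def neighborhood(j):
        # E-neighborhood of sample j under Euclidean distance (includes j itself).
        return {i for i in range(m) if math.dist(samples[i], samples[j]) <= eps}

    neighbors = [neighborhood(j) for j in range(m)]
    core = {j for j in range(m) if len(neighbors[j]) >= min_pts}  # set A (S1033)
    unvisited = set(range(m))                                      # set B (S1032)
    clusters = []                                                  # partition C
    while core:                                                    # S1034
        seed = next(iter(core))                                    # S1035
        queue = [seed]                                             # A_cur
        cluster = {seed}                                           # C_k
        unvisited.discard(seed)
        while queue:                                               # S1036
            a = queue.pop()                                        # S1037
            delta = neighbors[a] & unvisited
            cluster |= delta
            unvisited -= delta
            queue.extend(delta & core)   # only core objects keep expanding
        clusters.append(sorted(cluster))
        core -= cluster                                            # A = A - C_k
    noise = sorted(unvisited)                                      # S1038: D - C
    return clusters, noise
```

Whatever remains in `unvisited` when no core object is left corresponds to the noise data set, which step S104 treats as abnormal data.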

In this embodiment, the DBSCAN density clustering algorithm classifies highway traffic data accurately and separates out the abnormal samples. It offers good running speed, precision and recall, and shows good stability and effectiveness when detecting abnormal traffic data.

S104: Determine that the data in the noise data set is abnormal data.

Based on the above technical solution, the abnormal data detection method provided by the present application marks the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set, and then determines that the data in the noise data set is abnormal data. This greatly enhances the clustering of traffic data: when facing complex, changeable and strongly fluctuating highway traffic flow data, abnormal traffic data can be detected accurately, improving the efficiency and accuracy of abnormal traffic data detection.

For step S101 of the previous embodiment, optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm may specifically follow the steps shown in FIG. 2, described below.

Please refer to FIG. 2, which is a flowchart of a practical implementation of S101 in the method for abnormal data detection shown in FIG. 1.

It specifically includes the following steps:

S201: Take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

For example, the position of krill individual i is x_i = (x_i1, x_i2), where x_i1 represents E with x_i1 ∈ [R_Edown, R_Eup], and x_i2 represents MinPts with x_i2 ∈ [R_Mdown, R_Mup].
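The encoding in S201 can be illustrated with a short sketch. The helper names `init_swarm` and `decode`, the uniform random initialization, and the rounding of the second component to an integer MinPts are assumptions made for this example; the text only states the two ranges.

```python
import random

def init_swarm(size, eps_range, minpts_range, seed=None):
    """Create `size` position vectors (x_i1, x_i2) inside the allowed ranges."""
    rng = random.Random(seed)
    return [[rng.uniform(*eps_range), rng.uniform(*minpts_range)]
            for _ in range(size)]

def decode(position, eps_range, minpts_range):
    """Map a position vector to usable DBSCAN parameters (E, MinPts)."""
    # Clamp each component to its range; MinPts must be an integer count.
    eps = min(max(position[0], eps_range[0]), eps_range[1])
    raw = min(max(position[1], minpts_range[0]), minpts_range[1])
    return eps, int(round(raw))
```

For instance, `decode([0.05, 7.6], (0.1, 2.0), (2, 10))` clamps E up to 0.1 and rounds MinPts to 8.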

S202: Calculate the fitness value of each krill individual according to its position vector;

Optionally, calculating the fitness value of each krill individual according to its position vector may specifically be:

calculating the fitness value of each krill individual according to the formula [formula not reproduced in this text];

where f(x) is the fitness function of a krill individual; D_1, D_2, ..., D_k are the k clusters obtained by DBSCAN density clustering with the krill individual's current position vector as the neighborhood parameters; x and x' are both position vectors of individuals in the krill swarm; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.

S203: Determine the worst krill individual and the optimal krill individual according to the fitness values;

Optionally, the krill individual with the largest fitness value may be taken as the optimal krill individual, and the krill individual with the smallest fitness value as the worst krill individual.

S204: Perform the motion induced by the krill swarm excluding the worst and optimal krill individuals, the foraging activity, and the random diffusion motion;

Optionally, this step may specifically include the following steps:

S2041: Perform the motion induced by the krill swarm excluding the worst and optimal krill individuals according to the formula [formula not reproduced in this text];

where N^max is the maximum induced speed; w_n is the inertia weight of the induced motion, in the range [0, 1]; N_i^old is the induced motion from the previous iteration; a further term, whose symbol is not reproduced in this text, is the food-direction induction provided by the best individual in the krill swarm; X_i is the fitness or objective function value of the i-th individual in the current iteration; X_j is the fitness of the j-th neighbor (j = 1, 2, ..., N); and ε is an infinitesimally small positive number.
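Since the formula for S2041 is not reproduced in this text, the sketch below falls back on the induced-motion rule of the commonly published krill herd algorithm, N_i^new = N^max·α_i + w_n·N_i^old, with α_i the sum of a local neighbor effect and a target effect from the best individual. Treat it as an assumption that may differ from the patent's formula: every individual is treated as a neighbor (no sensing distance), the coefficient `c_best` is fixed, and the worst and best individuals are not excluded from the neighbor sum. Larger fitness is taken as better, as in S203.

```python
import math

def unit_direction(x_from, x_to, tiny=1e-12):
    """Direction from x_from to x_to, normalised; `tiny` avoids division by zero."""
    diff = [b - a for a, b in zip(x_from, x_to)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / (norm + tiny) for d in diff]

def induced_motion(i, positions, fitness, n_old,
                   n_max=0.01, w_n=0.5, c_best=1.0, tiny=1e-12):
    """Induced motion of individual i: N_max * alpha_i + w_n * N_old."""
    best, worst = max(fitness), min(fitness)       # larger fitness is better
    span = (worst - best) or tiny                  # normalising denominator
    alpha = [0.0] * len(positions[i])
    for j, x_j in enumerate(positions):            # local effect of neighbours
        if j == i:
            continue
        k_hat = (fitness[i] - fitness[j]) / span   # > 0 when j is better than i
        direction = unit_direction(positions[i], x_j, tiny)
        alpha = [a + k_hat * d for a, d in zip(alpha, direction)]
    b = fitness.index(best)                        # target effect of the best
    if b != i:
        k_hat_best = (fitness[i] - best) / span
        direction = unit_direction(positions[i], positions[b], tiny)
        alpha = [a + c_best * k_hat_best * d for a, d in zip(alpha, direction)]
    return [n_max * a + w_n * o for a, o in zip(alpha, n_old)]
```

With this sign convention an individual is pulled toward better neighbors and pushed away from worse ones.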

S2042: Perform the foraging activity according to the formula F_i = V_f β_i + w_f F_i^old;

where V_f is the foraging speed; w_f is the inertia weight of the foraging motion, taking a value in [0, 1]; F_i^old is the previous foraging motion; and β_i is the foraging direction, calculated as β_i = β_i^food + β_i^best. Here β_i^food is the attraction of the food, calculated according to two formulas that are not reproduced in this text, and β_i^best is the effect of the best fitness the i-th individual in the krill swarm has achieved so far, calculated according to a formula that is likewise not reproduced in this text.
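As a small worked example of the foraging formula F_i = V_f β_i + w_f F_i^old with β_i = β_i^food + β_i^best, the sketch below takes the two attraction vectors as inputs, because the formulas that produce them are not reproduced in this text. The function name and the default values of `v_f` and `w_f` are illustrative assumptions.

```python
def foraging_motion(beta_food, beta_best, f_old, v_f=0.02, w_f=0.5):
    """F_i = V_f * (beta_food + beta_best) + w_f * F_old, element by element."""
    beta = [bf + bb for bf, bb in zip(beta_food, beta_best)]   # foraging direction
    return [v_f * b + w_f * fo for b, fo in zip(beta, f_old)]
```

With `beta_food = [1.0, 0.0]`, `beta_best = [0.0, 1.0]` and `f_old = [0.1, 0.1]`, each component is 0.02·1 + 0.5·0.1 = 0.07.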

S2043: Perform random diffusion according to the formula [formula not reproduced in this text];

where D_max is the maximum diffusion speed, δ is a random direction vector, and I and I_max are, respectively, the current iteration number of the krill swarm and the preset global maximum number of iterations.
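The diffusion formula itself is not reproduced in this text. The symbols listed (D_max, δ, I, I_max) match the linearly decaying diffusion of the commonly published krill herd algorithm, D_i = D_max·(1 − I/I_max)·δ, so the sketch below assumes that form. The default `d_max` and the choice of drawing each component of δ uniformly from [-1, 1] are also assumptions.

```python
import random

def random_diffusion(dim, iteration, max_iterations, d_max=0.005, rng=random):
    """Random diffusion that shrinks linearly as the iteration count grows."""
    scale = d_max * (1.0 - iteration / max_iterations)
    # delta: random direction vector with components in [-1, 1]
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(dim)]
```

Under this form the random component vanishes at the final iteration, so late iterations are driven by the induced and foraging motions only.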

S205: Update the position vector of each krill individual;

Optionally, updating the position vector of each krill individual may specifically be:

updating the position vector of each krill individual according to the formula [formula not reproduced in this text];

where Δt is the update step size, which can be calculated according to a formula that is not reproduced in this text, in which C_t is the step-limiting factor, a constant in [0, 2]; a low value of C_t lets the krill individuals search the space carefully.
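Neither the update formula nor the formula for Δt is reproduced in this text. The sketch below assumes the forms used in the commonly published krill herd algorithm: x_i(t + Δt) = x_i(t) + Δt·(N_i + F_i + D_i) and Δt = C_t·Σ_j (UB_j − LB_j), where UB_j and LB_j are the upper and lower bounds of the j-th variable. Clamping the result to the bounds is an extra assumption added so that E and MinPts stay in their ranges.

```python
def step_size(c_t, bounds):
    """Delta t = C_t * sum of the variable ranges (upper minus lower bound)."""
    return c_t * sum(hi - lo for lo, hi in bounds)

def update_position(x, n, f, d, dt, bounds):
    """Move x by dt * (induced + foraging + diffusion), then clamp to bounds."""
    moved = [xi + dt * (ni + fi + di) for xi, ni, fi, di in zip(x, n, f, d)]
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(moved, bounds)]
```

For bounds E ∈ [0, 2] and MinPts ∈ [2, 10] with C_t = 0.5, the step size is 0.5·(2 + 8) = 5.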

Optionally, before updating the position vector of each krill individual according to that formula, the following steps may also be included:

performing a crossover operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text];

performing a mutation operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text];
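The crossover and mutation formulas are not reproduced in this text. The sketch below assumes the element-wise operators of the commonly published krill herd algorithm, using the symbols defined earlier in the document (Cr, Mu, x_gbest,m, x_p,m, x_q,m, μ): an element is replaced by a random donor's element when a random number falls below Cr, and by x_gbest,m + μ·(x_p,m − x_q,m) when a random number falls below Mu. Function names and the default `mu` are illustrative.

```python
import random

def crossover(i, swarm, cr, rng=random):
    """Replace each element of individual i by a donor's element with probability cr."""
    donors = [k for k in range(len(swarm)) if k != i]
    child = list(swarm[i])
    for m in range(len(child)):
        if rng.random() < cr:                  # random number compared with Cr
            child[m] = swarm[rng.choice(donors)][m]
    return child

def mutate(x, swarm, gbest, mu_prob, mu=0.5, rng=random):
    """Replace each element by gbest[m] + mu * (p[m] - q[m]) with probability mu_prob."""
    child = list(x)
    for m in range(len(child)):
        if rng.random() < mu_prob:             # random number compared with Mu
            p, q = rng.sample(swarm, 2)        # two randomly drawn individuals
            child[m] = gbest[m] + mu * (p[m] - q[m])
    return child
```

Both functions return a new list and leave the swarm untouched, so the caller decides whether to keep the modified individual.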

S206: Judge whether the maximum number of iterations of the krill swarm algorithm has been reached;

If so, go to step S207; if not, return to step S202;

S207: Use the updated position vectors of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm.
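The control flow of steps S201 to S207 can be summarized in a short skeleton. It is a sketch under stated assumptions rather than the patented procedure: `toy_step` is a deliberately simple stand-in for the induced, foraging and diffusion motions of S204 and S205, the fitness function is passed in by the caller because the patent's formula is not reproduced in this text, and returning the best-scoring individual at the end is one possible reading of S207.

```python
import random

def toy_step(x, best, bounds, rng):
    """Stand-in motion: move halfway toward the best individual plus small noise."""
    moved = [xi + 0.5 * (bi - xi) + rng.uniform(-0.05, 0.05) * (hi - lo)
             for xi, bi, (lo, hi) in zip(x, best, bounds)]
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(moved, bounds)]

def optimise(fitness, bounds, swarm_size=8, max_iterations=30,
             step=toy_step, seed=0):
    """Iterate S202-S206 and return the best position found in the final swarm."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(swarm_size)]                        # S201
    for _ in range(max_iterations):                             # S206
        scores = [fitness(x) for x in swarm]                    # S202
        best = swarm[scores.index(max(scores))]                 # S203
        swarm = [step(x, best, bounds, rng) for x in swarm]     # S204-S205
    scores = [fitness(x) for x in swarm]
    return swarm[scores.index(max(scores))]                     # S207
```

In a full implementation, `fitness` would run DBSCAN with the decoded (E, MinPts) on the traffic samples and score the resulting clusters.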

Please refer to FIG. 3, which is a structural diagram of a system for abnormal data detection provided by an embodiment of the present application.

The system may include:

an optimization module 100, configured to optimize the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm;

an acquisition module 200, configured to obtain data to be processed;

a marking module 300, configured to mark the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

a determination module 400, configured to determine that the data in the noise data set is abnormal data.

Please refer to FIG. 4, which is a structural diagram of another system for abnormal data detection provided by an embodiment of the present application.

The optimization module 100 may include:

a position vector determination submodule, configured to take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

a calculation submodule, configured to calculate the fitness value of each krill individual according to its position vector;

a first determination submodule, configured to determine the worst krill individual and the optimal krill individual according to the fitness values;

an execution submodule, configured to perform the motion induced by the krill swarm excluding the worst and optimal krill individuals, the foraging activity, and the random diffusion motion;

an update submodule, configured to update the position vector of each krill individual;

a judgment submodule, configured to judge whether the maximum number of iterations of the krill swarm algorithm has been reached;

a return submodule, configured to return to the calculation submodule, when the maximum number of iterations has not been reached, to perform the step of calculating the fitness value of each krill individual according to its position vector;

a second determination submodule, configured to use the updated position vectors of the krill individuals as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations has been reached.

Further, the calculation submodule may include:

a calculation unit, configured to perform the calculation according to the formula [formula not reproduced in this text].

The update submodule may include:

an update unit, configured to update the position vector of each krill individual according to the formula [formula not reproduced in this text];

where Δt is the update step size.

The update submodule may also include:

a crossover operation unit, configured to perform a crossover operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text];

a mutation operation unit, configured to perform a mutation operation on the position vector elements of each krill individual according to the formula [formula not reproduced in this text].

Since the embodiments of the system correspond to the embodiments of the method, please refer to the description of the method embodiments for details of the system embodiments, which are not repeated here.

请参考图5，图5为本申请实施例所提供的一种异常数据检测设备的结构图。Please refer to FIG. 5 , which is a structural diagram of an abnormal data detection device provided by an embodiment of the present application.

该异常数据检测设备500可因配置或性能不同而产生比较大的差异，可以包括一个或一个以上处理器(central processing units，CPU)522(例如，一个或一个以上处理器)和存储器532，一个或一个以上存储应用程序542或数据544的存储介质530(例如一个或一个以上海量存储设备)。其中，存储器532和存储介质530可以是短暂存储或持久存储。存储在存储介质530的程序可以包括一个或一个以上模块(图示没标出)，每个模块可以包括对装置中的一系列指令操作。更进一步地，中央处理器522可以设置为与存储介质530通信，在异常数据检测设备500上执行存储介质530中的一系列指令操作。The abnormal data detection apparatus 500 may vary greatly due to different configurations or performances, and may include one or more central processing units (CPU) 522 (eg, one or more processors) and a memory 532, a or one or more storage media 530 (eg, one or more mass storage devices) storing applications 542 or data 544. Among them, the memory 532 and the storage medium 530 may be short-term storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the apparatus. Furthermore, the central processing unit 522 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the abnormal data detection device 500 .

The abnormal data detection device 500 may also include one or more power supplies 525, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.

The steps of the abnormal data detection method described above with reference to FIGS. 1 to 2 are implemented by the abnormal data detection device based on the structure shown in FIG. 5.
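As an illustration of the labeling and determination steps that such a device would execute, the following minimal pure-Python sketch runs a textbook DBSCAN and treats every point left in the noise set as abnormal data. The function names, the sample points, and the parameter values are hypothetical. In the method of this application, `eps` and `min_pts` would be the values produced by the krill swarm optimization rather than constants chosen by hand.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later be relabeled as a border point
            continue
        cluster += 1                       # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding the cluster
    return labels


def detect_abnormal(points, eps, min_pts):
    """Split points into (cluster data set, noise data set); the noise is the abnormal data."""
    labels = dbscan(points, eps, min_pts)
    clustered = [p for p, lab in zip(points, labels) if lab != NOISE]
    noise = [p for p, lab in zip(points, labels) if lab == NOISE]
    return clustered, noise


# Hypothetical two-dimensional samples: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.1, 1.0), (0.9, 1.1), (1.0, 0.9),
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.1), (5.0, 4.9),
    (9.0, 0.0),
]
clustered, abnormal = detect_abnormal(points, eps=0.5, min_pts=3)
```

The pairwise neighbor search makes this sketch quadratic in the number of points. A spatial index would be the usual replacement for large traffic data sets.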

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and modules described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed apparatuses, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For instance, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or of another form.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.

If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a function invocation device, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The method, system, device, and computer-readable storage medium for abnormal data detection provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help in understanding the method of the present application and its core idea. It should be pointed out that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims

1. A method for abnormal data detection, characterized by comprising:

optimizing the neighborhood parameters of a DBSCAN clustering algorithm using a krill swarm algorithm;

acquiring data to be processed;

labeling the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

determining that the data in the noise data set is abnormal data.

2. The method according to claim 1, characterized in that optimizing the neighborhood parameters of the DBSCAN clustering algorithm using the krill swarm algorithm comprises:

taking the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

calculating the fitness value of each krill individual according to the position vector of the krill individual;

determining the worst krill individual and the optimal krill individual according to the fitness values of the krill individuals;

performing the movement induced by the krill swarm excluding the worst krill individual and the optimal krill individual, the foraging activity, and the random diffusion movement;

updating the position vector of each krill individual;

judging whether the maximum number of iterations of the krill swarm algorithm has been reached;

if not, returning to the step of calculating the fitness value of each krill individual according to the position vector of the krill individual;

if so, using the updated position vector of each krill individual as the neighborhood parameters of the optimized DBSCAN clustering algorithm.

3. The method according to claim 2, wherein calculating the fitness value of each krill individual according to the position vector of the krill individual comprises:

calculating the fitness value of each krill individual according to the formula for f(x);

wherein f(x) is the fitness function of a krill individual; D₁, D₂, ..., D_k are the k clusters obtained by DBSCAN density clustering with the current position vector of the krill individual as the neighborhood parameters; x and x' are the position vectors of individuals in the krill swarm; dist(x, x') is the Euclidean distance between x and x'; and ε is a constant.

4. The method according to claim 2, wherein updating the position vector of each krill individual comprises:

updating the position vector of each krill individual according to the position update formula;

wherein Δt is the update step size.

5. The method according to claim 4, characterized in that, before updating the position vector of each krill individual according to the position update formula, the method further comprises:

performing a crossover operation on the position vector elements of each krill individual according to the crossover formula;

performing a mutation operation on the position vector elements of each krill individual according to the mutation formula;

wherein x_i,m is the m-th group of position vector elements of the i-th krill individual; Cr is the threshold of the crossover probability; R_i,m is the probability that the m-th group of position vector elements of the i-th krill individual undergoes the crossover operation or the mutation operation; Mu is the mutation probability; x_gbes,m is the optimal position vector element of the current iteration; x_p,m and x_q,m are randomly selected position vector elements; and μ is a constant.

6. A system for abnormal data detection, characterized by comprising:

an optimization module, configured to optimize the neighborhood parameters of a DBSCAN clustering algorithm using a krill swarm algorithm;

an acquisition module, configured to acquire data to be processed;

a labeling module, configured to label the data to be processed based on the optimized DBSCAN clustering algorithm to obtain a cluster data set and a noise data set;

a determination module, configured to determine that the data in the noise data set is abnormal data.

7. The system according to claim 6, wherein the optimization module comprises:

a position vector determination submodule, configured to take the neighborhood parameters of the DBSCAN clustering algorithm as the position vector of each krill individual;

a calculation submodule, configured to calculate the fitness value of each krill individual according to the position vector of the krill individual;

a first determination submodule, configured to determine the worst krill individual and the optimal krill individual according to the fitness values of the krill individuals;

an execution submodule, configured to perform the movement induced by the krill swarm excluding the worst krill individual and the optimal krill individual, the foraging activity, and the random diffusion movement;

an update submodule, configured to update the position vector of each krill individual;

a judging submodule, configured to judge whether the maximum number of iterations of the krill swarm algorithm has been reached;

a return submodule, configured to return to the calculation submodule, when the maximum number of iterations of the krill swarm algorithm has not been reached, to perform the step of calculating the fitness value of each krill individual according to the position vector of the krill individual;

a second determination submodule, configured to use the updated position vector of each krill individual as the neighborhood parameters of the optimized DBSCAN clustering algorithm when the maximum number of iterations of the krill swarm algorithm has been reached.

8. The system according to claim 7, wherein the calculation submodule comprises:

a calculation unit, configured to calculate the fitness value of each krill individual according to the formula for f(x).

9. A device for abnormal data detection, characterized by comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the method for abnormal data detection according to any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for abnormal data detection according to any one of claims 1 to 5 are implemented.