CN111814910B - Abnormality detection method, abnormality detection device, electronic device, and storage medium - Google Patents

Info

Publication number
CN111814910B
Authority
CN
China
Prior art keywords
target
transaction data
application transaction
distance
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010809725.0A
Other languages
Chinese (zh)
Other versions
CN111814910A (en)
Inventor
李耕寅
吴声
常杰
熊慧君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010809725.0A
Publication of CN111814910A
Application granted
Publication of CN111814910B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The embodiment of the disclosure provides an anomaly detection method, an anomaly detection device, electronic equipment and a storage medium, which can be applied to the fields of artificial intelligence, big data and information security. The method comprises the following steps: acquiring application transaction data to be detected; and processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result is used for representing whether the application transaction data is anomaly data or not, and the anomaly detection model is generated based on combined training of a clustering algorithm and a probability density estimation algorithm.

Description

Abnormality detection method, abnormality detection device, electronic device, and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of computers, and more particularly relates to an abnormality detection method, an abnormality detection device, electronic equipment and a storage medium.
Background
In an application transaction, a large amount of application transaction data may be generated. Abnormal data may occur within this data, where abnormal data refers to data that is inconsistent with the other application transaction data. Abnormal data is small in quantity and takes the form of outliers, and it is the basis for finding faults; detecting abnormal data therefore has important production and practical significance.
In the process of implementing the disclosed concept, the inventor finds that at least the following problems exist in the related art: the accuracy of detecting abnormal data by adopting the related technology is low.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide an anomaly detection method, an anomaly detection device, an electronic device, and a storage medium.
An aspect of an embodiment of the present disclosure provides an anomaly detection method, including: acquiring application transaction data to be detected; and processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result is used for representing whether the application transaction data is anomaly data or not, and the anomaly detection model is generated based on combined training of a clustering algorithm and a probability density estimation algorithm.
According to an embodiment of the present disclosure, generating the anomaly detection model based on joint training of a clustering algorithm and a probability density estimation algorithm includes: acquiring a history detection set, wherein the history detection set comprises a plurality of historical application transaction data; processing the historical application transaction data in the history detection set by using the clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target cluster center of each target classification; determining the target distance between each historical application transaction data and the target cluster center of the target classification to which it belongs, thereby obtaining at least one target distance in each target classification; calculating a distance threshold of each target classification according to the at least one target distance in that target classification and the probability density estimation algorithm; and generating the anomaly detection model according to each target cluster center and the distance threshold of each target classification.
According to an embodiment of the present disclosure, the processing, by using the clustering algorithm, historical application transaction data in the historical detection set to obtain target classifications to which the historical application transaction data belong and target clustering centers of each of the target classifications includes: determining an initial cluster center of each target class; determining an initial distance between each of the historical application transaction data and each of the initial cluster centers; determining a target class to which each of the historical application transaction data belongs according to the initial distance; determining a distance average value of the initial distances in each target class, and taking the distance average value as a new initial clustering center of the target class; repeating the operation of determining the initial distance and determining a new initial cluster center of the target classification until a preset condition is met; and taking the new initial cluster center of each target class obtained when the preset condition is met as the target cluster center of the corresponding target class.
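By way of illustration only, the iterative clustering described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the update step uses the standard K-means rule (the mean of the points assigned to a class), which is one reading of the "distance average value" wording, and the preset condition is assumed to be either a maximum iteration count or a negligible movement of the centers. The names `kmeans`, `max_iter`, `tol` and `seed` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Standard K-means (assumed variant). Returns (labels, centers)."""
    rng = random.Random(seed)
    # Assumption: initial cluster centers are k distinct samples chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to the class whose current center is nearest.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        # Recompute each center as the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty class
        shift = max(euclidean(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        # Assumed preset condition: the centers have stopped moving.
        if shift <= tol:
            break
    return labels, centers
```

The centers returned when the loop stops play the role of the target cluster centers, and `labels` gives the target classification of each historical sample.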
According to an embodiment of the present disclosure, the calculating, according to the at least one target distance in each of the target classifications and the probability density estimation algorithm, a distance threshold value of each of the target classifications includes: determining a target distance between each historical application transaction data and a target clustering center corresponding to the historical application transaction data; determining a distance mean and a distance variance for each of the target distances in each of the target classifications; and processing the distance mean and the distance variance of each target class by using a Gaussian distribution algorithm to obtain a distance threshold of each target class.
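As a hedged illustration of the threshold calculation, the sketch below fits a Gaussian to the target distances of one target classification and places the threshold a fixed number of standard deviations above the mean. The text does not state how the mean and variance are turned into a threshold, so the rule `mean + n_sigma * std` and the default `n_sigma=3.0` are assumptions, as are the function names.

```python
import math


def gaussian_distance_threshold(distances, n_sigma=3.0):
    """Threshold for one class from the mean and variance of its target distances."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n  # population variance
    # Assumed rule: distances beyond mean + n_sigma * std are treated as improbable.
    return mean + n_sigma * math.sqrt(variance)


def thresholds_by_class(distances_by_class, n_sigma=3.0):
    """Apply the rule to every class; input maps class id -> list of distances."""
    return {
        cls: gaussian_distance_threshold(dists, n_sigma)
        for cls, dists in distances_by_class.items()
    }
```

Because each target classification gets its own mean and variance, a class with widely spread members receives a looser threshold than a compact one.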
According to an embodiment of the present disclosure, the processing the application transaction data to be detected using the anomaly detection model to obtain a detection result includes: determining the space distance between the application transaction data to be detected and each target clustering center; determining target classification to which the application transaction data to be detected belongs according to the spatial distances; and obtaining a detection result according to the space distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold of the target classification.
According to an embodiment of the present disclosure, the obtaining a detection result according to a spatial distance between the application transaction data to be detected and the target cluster center of the target classification and a distance threshold of the target classification includes: if the space distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold value of the target classification, the detection result is that the application transaction data to be detected is abnormal data; and if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold of the target classification, the detection result is that the application transaction data to be detected is normal data.
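The decision rule above can be sketched as follows, assuming Euclidean distance as the spatial distance and assuming the target cluster centers and per-class distance thresholds are already available as parallel lists. The function name `detect` and its return shape are hypothetical.

```python
import math


def detect(sample, centers, thresholds):
    """Return (target_class, distance, is_anomaly) for one sample."""
    # Spatial distance from the sample to every target cluster center.
    distances = [math.dist(sample, center) for center in centers]
    # The target classification is the one with the nearest center.
    target = min(range(len(centers)), key=distances.__getitem__)
    # Greater than or equal to the class threshold means abnormal data.
    is_anomaly = distances[target] >= thresholds[target]
    return target, distances[target], is_anomaly
```

For example, with centers `[(0, 0), (10, 10)]` and thresholds `[2.0, 3.0]`, the sample `(0, 2)` lies exactly on the threshold of the first class and is therefore reported as abnormal, consistent with the "greater than or equal to" wording.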
According to an embodiment of the present disclosure, the application transaction data to be detected includes a plurality of dimensions;
if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, the detection result is that the application transaction data to be detected is abnormal data, and then the method further includes:
determining the degree of abnormality of each dimension in the application transaction data to be detected;
and determining an abnormal dimension according to each abnormal degree, wherein the abnormal dimension is used for representing the dimension which leads the application transaction data to be detected to be abnormal data.
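The passage above does not state how the degree of abnormality of a dimension is computed. Purely as an assumption for illustration, the sketch below measures it as the absolute deviation of each dimension from the target cluster center, scaled by a per-dimension standard deviation of the target classification, and reports the dimensions with the largest values. All names (`abnormal_dimensions`, `dim_std`, `top_n`) are hypothetical.

```python
def abnormal_dimensions(sample, center, dim_std, names, top_n=1):
    """Rank dimensions by an assumed abnormality degree.

    sample, center, dim_std and names are parallel sequences, one entry per dimension.
    """
    degrees = {}
    for name, value, mid, std in zip(names, sample, center, dim_std):
        deviation = abs(value - mid)
        # Scale by the spread of the class so dimensions with different units compare.
        degrees[name] = deviation / std if std > 0 else deviation
    ranked = sorted(degrees, key=degrees.get, reverse=True)
    return ranked[:top_n], degrees
```

A usage example: for a sample `(100, 0.5, 0.99)`, a center `(100, 0.2, 0.98)` and spreads `(10, 0.05, 0.01)` over the dimensions rate, response time and success rate, the response time has by far the largest scaled deviation and would be reported as the abnormal dimension.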
Another aspect of the disclosed embodiments provides an abnormality detection apparatus including: the acquisition module is used for acquiring application transaction data to be detected; and a processing module, configured to process the application transaction data to be detected by using an anomaly detection model, to obtain a detection result, where the detection result is used to characterize whether the application transaction data is anomaly data, and the anomaly detection model is generated based on joint training of a clustering algorithm and a probability density estimation algorithm.
Another aspect of an embodiment of the present disclosure provides an electronic device including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the disclosed embodiments provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement a method as described above.
Another aspect of the disclosed embodiments provides a computer program comprising computer executable instructions which, when executed, are adapted to carry out the method as described above.
According to the embodiment of the disclosure, application transaction data to be detected is processed by using an anomaly detection model generated based on combined training of a clustering algorithm and a probability density estimation algorithm, so as to obtain a detection result used for representing whether the application transaction data is anomaly data. The abnormal data is detected by adopting the abnormal detection model generated based on the combined training of the clustering algorithm and the probability density estimation algorithm, so that the more accurate identification and positioning of the abnormal data are realized, the problem of lower accuracy of abnormal data detection in the related technology is at least partially overcome, and the accuracy of abnormal data detection is further improved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture for anomaly detection methods to which the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of an anomaly detection method according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of another anomaly detection method in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an anomaly detection device, according to an embodiment of the present disclosure; and
FIG. 5 schematically illustrates a block diagram of an electronic device adapted to implement an anomaly detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). Where an expression like "at least one of A, B or C, etc." is used, it should likewise be interpreted in accordance with the ordinary understanding of those skilled in the art (e.g., "a system having at least one of A, B or C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the process of realizing the disclosed concept, the inventor found that the problem of low accuracy of abnormal data detection can be addressed by unsupervised learning, that is, machine learning in which the training data carries no labels. The embodiments of the present disclosure provide a technical solution that performs anomaly detection based on a combination of a clustering algorithm and a probability density estimation algorithm.
A clustering algorithm may be used for cluster analysis. Cluster analysis is an unsupervised machine learning technique and an exploratory data analysis method. It groups similar objects into one target classification according to the distance or similarity between the objects, forming a plurality of target classifications. A target classification is a collection of similar objects. A good clustering result has high similarity among objects within the same target classification and low similarity between objects in different target classifications. The clustering algorithm may include a K-means clustering algorithm, a K-center clustering algorithm, a CLARA (Clustering LARge Applications) algorithm, or a fuzzy C-means algorithm. A probability density estimation algorithm may be used to determine the anomaly boundary threshold, which serves as the criterion for deciding whether data is abnormal data.
The technical solution provided by the embodiments of the present disclosure can also be applied to abnormal data detection in the application transaction full link. The application transaction full link may refer to various types of transactions, which may include internet transactions, banking transactions, and the like. Accordingly, the objects described above may refer to application transaction data, that is, data generated during a transaction. In the application transaction full link, a target classification may refer to an operation mode, which may include transactions generated on weekdays, transactions generated on weekends, transactions generated in the morning, transactions generated in the afternoon, and the like. It should be noted that the embodiments of the present disclosure are mainly directed to abnormal data detection in the application transaction full link, and are described below with reference to specific embodiments.
The embodiments of the present disclosure provide an anomaly detection method, an anomaly detection device, and an electronic device to which the anomaly detection method can be applied. The anomaly detection method, the anomaly detection device and the electronic device can be used in the fields of artificial intelligence, big data and information security. The method includes a detection process and a training process. In the detection process, application transaction data to be detected is acquired and processed by using an anomaly detection model generated based on joint training of a clustering algorithm and a probability density estimation algorithm, to obtain a detection result that represents whether the application transaction data is abnormal data. In the training process, a history detection set is acquired, wherein the history detection set comprises a plurality of historical application transaction data. The historical application transaction data in the history detection set is processed by using the clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target cluster center of each target classification. The target distance between each historical application transaction data and the target cluster center of the target classification to which it belongs is determined, so that at least one target distance is obtained in each target classification. A distance threshold of each target classification is calculated according to the at least one target distance in that target classification and the probability density estimation algorithm, and the anomaly detection model is generated according to each target cluster center and the distance threshold of each target classification.
Fig. 1 schematically illustrates an exemplary system architecture 100 in which anomaly detection methods may be applied according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as banking class applications, shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, to name a few.
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the anomaly detection method provided in the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the abnormality detection apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The anomaly detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the abnormality detection apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically illustrates a flowchart of an anomaly detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S220.
In operation S210, application transaction data to be detected is acquired.
In embodiments of the present disclosure, application transaction data to be detected may refer to data generated during a transaction. The application transaction data to be detected may comprise a plurality of dimensions, i.e. the application transaction data to be detected may be multi-dimensional data.
In anomaly detection for the application transaction full link, the dimensions of the application transaction data to be detected may include at least one of an online transaction rate, a banking transaction rate, an online transaction response time, a banking transaction response time, an online transaction success rate, and a banking transaction success rate. In addition, to achieve better anomaly detection, the dimensions may also include time dimensions, which may include the month, the year, the week number, and the time period, where the time period may be an early period or a late period. For example, the early period may be 06:00-18:00 and the late period 18:00-06:00.
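As an illustrative sketch only, one record of application transaction data could be turned into a multi-dimensional vector as shown below. The record keys (`online_rate`, `bank_rate` and so on), the use of the ISO week number, and the 0/1 encoding of the time period are assumptions made for the example; only the 06:00-18:00 and 18:00-06:00 split comes from the text.

```python
from datetime import datetime


def time_period(ts):
    """Early period is 06:00-18:00, late period is 18:00-06:00."""
    return "early" if 6 <= ts.hour < 18 else "late"


def to_feature_vector(record):
    """Flatten one record (a dict with hypothetical keys) into a list of floats."""
    ts = record["timestamp"]
    return [
        float(record["online_rate"]),
        float(record["bank_rate"]),
        float(record["online_response_time"]),
        float(record["bank_response_time"]),
        float(record["online_success_rate"]),
        float(record["bank_success_rate"]),
        float(ts.year),
        float(ts.month),
        float(ts.isocalendar()[1]),  # ISO week number, an assumed reading of "week number"
        0.0 if time_period(ts) == "early" else 1.0,
    ]
```

In practice the dimensions have very different scales, so some normalization before clustering would likely be needed; that step is not described in the text and is omitted here.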
The application transaction data to be detected may be sent by other electronic devices, or may be stored locally on the electronic device. In the embodiments of the present disclosure, the electronic device may refer to a server, and the other electronic devices may refer to terminals. For example, a user purchases a commodity on a platform using a terminal and selects a bank card for payment. After the payment is completed, the terminal generates application transaction data and sends the application transaction data to the server.
In operation S220, application transaction data to be detected is processed by using an anomaly detection model to obtain a detection result, wherein the detection result is used for characterizing whether the application transaction data is anomaly data, and the anomaly detection model is generated based on joint training of a clustering algorithm and a probability density estimation algorithm.
In an embodiment of the present disclosure, to determine whether the application transaction data to be detected is abnormal data, the data may be processed with an anomaly detection model generated by joint training of a clustering algorithm and a probability density estimation algorithm. The clustering algorithm may include a K-means clustering algorithm, a K-center clustering algorithm, a CLARA algorithm, or a fuzzy C-means algorithm. The probability density estimation algorithm may include a Gaussian distribution algorithm.
The application transaction data to be detected is input into the anomaly detection model, which outputs a detection result representing whether the application transaction data is abnormal data. The specific form of the detection result may be set according to the actual situation, and is not particularly limited herein.
It should be noted that the anomaly detection model may be a model that is generated by training historical application transaction data using a clustering algorithm and a probability density estimation algorithm. Because the clustering result of the clustering algorithm is to determine the target clustering center of each target class, and the probability density estimation algorithm can determine the distance threshold corresponding to the target class according to the clustering result, the anomaly detection model can determine the target class to which the application transaction data belongs, and determine whether the application transaction data is the anomaly data according to the distance between the application transaction data and the target clustering center of the target class to which the application transaction data belongs and the distance threshold.
It should be further noted that processing the application transaction data to be detected with the anomaly detection model generated based on joint training of the clustering algorithm and the probability density estimation algorithm yields a more accurate detection result. In particular, the model can correctly handle application transaction data that the related art tends to determine as abnormal data, namely small amounts of dense outlier data that arise during a business peak period or a business trough period.
According to the technical scheme of the embodiment of the disclosure, application transaction data to be detected is processed by using an anomaly detection model generated based on the combined training of a clustering algorithm and a probability density estimation algorithm, so as to obtain a detection result for representing whether the application transaction data is anomaly data. The abnormal data is detected by adopting the abnormal detection model generated based on the combined training of the clustering algorithm and the probability density estimation algorithm, so that the more accurate identification and positioning of the abnormal data are realized, the problem of lower accuracy of abnormal data detection in the related technology is at least partially overcome, and the accuracy of abnormal data detection is further improved.
Optionally, on the basis of the above technical solution, generating the anomaly detection model based on joint training of a clustering algorithm and a probability density estimation algorithm may include the following. A history detection set is acquired, wherein the history detection set includes a plurality of historical application transaction data. The historical application transaction data in the history detection set is processed by using the clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target cluster center of each target classification. The target distance between each historical application transaction data and the target cluster center of the target classification to which it belongs is determined, so that at least one target distance is obtained in each target classification. A distance threshold of each target classification is calculated according to the at least one target distance in that target classification and the probability density estimation algorithm. The anomaly detection model is generated according to each target cluster center and the distance threshold of each target classification.
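By way of a hedged sketch, the last three steps above (target distances, per-class thresholds, model generation) could be combined as follows once a clustering step has produced a class label for every historical sample and a center for every class. The container `AnomalyModel`, the function `build_model`, the Euclidean distance, and the `mean + n_sigma * std` threshold rule are all assumptions for illustration and are not stated in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class AnomalyModel:
    """Hypothetical model: target cluster centers plus one threshold per class."""
    centers: list
    thresholds: list

    def is_anomalous(self, sample):
        distances = [math.dist(sample, c) for c in self.centers]
        target = distances.index(min(distances))
        return distances[target] >= self.thresholds[target]


def build_model(points, labels, centers, n_sigma=3.0):
    """Build the model from clustering output; every class needs >= 1 member."""
    # Target distance of each historical sample to the center of its own class.
    grouped = [[] for _ in centers]
    for point, label in zip(points, labels):
        grouped[label].append(math.dist(point, centers[label]))
    # Assumed Gaussian rule: threshold = mean + n_sigma * standard deviation.
    thresholds = []
    for dists in grouped:
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        thresholds.append(mean + n_sigma * std)
    return AnomalyModel(centers=list(centers), thresholds=thresholds)
```

Keeping only the centers and thresholds in the model means detection needs no access to the history detection set, which matches the separation between the training process and the detection process described earlier.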
In an embodiment of the present disclosure, to obtain the anomaly detection model, a clustering algorithm and a probability density estimation algorithm may be employed to train the historical test set. Wherein the historical verification set may include a plurality of historical application transaction data. Historical application transaction data may refer to data generated during a historical transaction. The historical application transaction data may include historical application transaction data and/or historical application transaction data. The historical application transaction data may include a plurality of dimensions, i.e., the historical application transaction data may be multi-dimensional data.
In anomaly detection over the full link of application transactions, the dimensions of the historical application transaction data may include at least one of an online transaction rate, a banking transaction rate, an online transaction response time, a banking transaction response time, an online transaction success rate, and a banking transaction success rate. In addition, to achieve better anomaly detection, the dimensions of the historical application transaction data may also include time dimensions, which may include month, year, number of weeks, and time period.
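As an illustration only, one such multi-dimensional record could be represented as follows. The class name, field names, and the encoding of the time dimensions are hypothetical and are not specified by the disclosure; the sketch merely shows how business dimensions and time dimensions might be flattened into one numeric vector.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class TransactionSample:
    """One application transaction record (hypothetical field names)."""

    online_rate: float            # online transaction rate
    banking_rate: float           # banking transaction rate
    online_response_time: float   # online transaction response time
    banking_response_time: float  # banking transaction response time
    online_success_rate: float    # online transaction success rate
    banking_success_rate: float   # banking transaction success rate
    month: int                    # time dimension: month
    time_period: int              # time dimension: assumed to be the hour of day

    def to_vector(self) -> List[float]:
        """Flatten the record into a multi-dimensional numeric vector."""
        return [float(value) for value in astuple(self)]


sample = TransactionSample(120.0, 80.0, 35.0, 50.0, 0.999, 0.998, 8, 14)
vector = sample.to_vector()
```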
It should be noted that, the historical application transaction data may be the historical application transaction data that the electronic device receives from other electronic devices, or may be the historical application transaction data that is locally stored in the electronic device. The electronic device in the embodiment of the disclosure may refer to a server, and other electronic devices may refer to terminals.
After the historical application transaction data is obtained, a clustering algorithm can be adopted to process each historical application transaction data in the historical detection set so as to obtain the target classification of each historical application transaction data and the target clustering center of each target classification. The clustering algorithm may be a K-means clustering algorithm. The target classification may refer to an operation mode, which may include transactions generated on weekdays, transactions generated on holidays, transactions generated in the morning, transactions generated in the afternoon, and the like.
After the target classification to which each historical application transaction data belongs is obtained, and the target cluster center corresponding to each target classification, the target distance between the historical application transaction data and the target cluster center of the target classification to which the historical application transaction data belongs can be determined for each historical application transaction data. Based on this, the target distance for each historical application transaction data may be obtained. Accordingly, for each target class, a respective target distance corresponding to the target class may be obtained.
After obtaining the respective target distances for each target class, a distance threshold for each target class may be determined for that target class based on the probability density estimation algorithm and the respective target distances for that target class. Based on this, a distance threshold for each target class can be obtained.
After each target cluster center and the distance threshold of each target class are obtained, an anomaly detection model may be generated according to each target cluster center and the distance threshold of each target class. According to an embodiment of the present disclosure, the anomaly detection model determines whether application transaction data to be detected is abnormal data by means of clustering and threshold detection. The distance threshold may be used as the criterion for determining whether the application transaction data to be detected is abnormal data.
According to the embodiment of the present disclosure, since the distance threshold, which is a criterion for determining whether application transaction data to be detected is abnormal data, is determined based on the target distance corresponding to historical application transaction data under the target classification, not based on expert experience, the distance threshold is more realistic. Because the distance threshold value is more in line with the actual situation, the accuracy of determining whether the application transaction data to be detected is abnormal data or not based on the distance threshold value is improved, namely the accuracy of detecting the abnormal data is improved.
It should be noted that, the training interval time of the anomaly detection model and the number of the historical application transaction data included in the historical detection set can be flexibly adjusted according to the requirements of the user. The anomaly detection model can cluster the frequently occurring history situations into a history normal mode.
Optionally, on the basis of the above technical solution, processing the historical application transaction data in the historical detection set by using a clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target clustering center of each target classification may include the following. An initial cluster center for each target class is determined. An initial distance between each historical application transaction data and each initial cluster center is determined. The target classification to which each historical application transaction data belongs is determined according to the initial distance. A distance average value of each initial distance in each target class is determined, and the distance average value is taken as a new initial clustering center of the target class. The operations of determining the initial distance and determining the new initial clustering center of the target classification are repeated until the preset condition is met. The new initial cluster center of each target class obtained when the preset condition is met is taken as the target cluster center of the corresponding target class.
In embodiments of the present disclosure, in order to determine the target class to which the application transaction data to be detected belongs, and the target cluster center of each target class, a clustering algorithm may be employed.
In the data space T, the history detection set X may include M pieces of historical application transaction data, wherein the history detection set $X = (x_1, x_2, \ldots, x_i, \ldots, x_{M-1}, x_M)$. Historical application transaction data $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{i,N-1}, x_{iN})$, where $i = 1, 2, \ldots, M-1, M$ and $j = 1, 2, \ldots, N-1, N$; j represents the dimension of the historical application transaction data. The history detection set X corresponds to an $M \times N$ matrix.
The clustering algorithm aims at dividing the history detection set X into K target classifications, and determining a corresponding target clustering center for each target classification. The basis for the partitioning may be the similarity between historical application transaction data. Wherein, the index for representing the similarity can comprise a similarity coefficient or a distance index, and the distance index can comprise Euclidean distance, square of Euclidean distance, manhattan distance, chebyshev distance or chi-square distance, and the like. The smaller the distance between different historical application transaction data, the higher the similarity between the different historical application transaction data may be explained. The greater the correlation coefficient between different historical application transaction data, the higher the similarity between the different historical application transaction data may also be. The clustering algorithm may include a K-means clustering algorithm. The K-means clustering algorithm is an iterative clustering algorithm that uses a distance index as an index of similarity. According to the technical scheme, a history detection set is processed by adopting a K-means clustering algorithm, and a target clustering center of each target classification is obtained. The following specifically describes a process of processing the history detection set by using a K-means clustering algorithm to obtain a target cluster center of each target class.
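As a brief aside before that walkthrough, some of the distance indexes named above can be sketched in a few lines. The function names are illustrative and not taken from the disclosure; each function takes two vectors of equal length.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square of the Euclidean distance (avoids the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Chebyshev distance: largest absolute per-dimension difference."""
    return max(abs(x - y) for x, y in zip(a, b))
```

For any of these indexes, a smaller value indicates higher similarity between two pieces of historical application transaction data.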
K target classifications are preset. For each target class, randomly selecting one historical application transaction data from the historical detection set as an initial clustering center of the target class. Based on this, K initial cluster centers can be obtained. It should be noted that the specific value of K may be determined according to the number of target classifications to which the historical detection set may correspond. The specific value of K may also be determined in an adjustable manner.
After the initial cluster centers of the K target classifications are determined, the initial distance between each historical application transaction data and each initial cluster center can be determined, so that K initial distances are obtained for each historical application transaction data. The target classification to which each historical application transaction data belongs is then determined on the principle of minimum distance, i.e., each historical application transaction data is assigned to the target classification whose initial cluster center is nearest to it.
After obtaining the target classification to which each historical application transaction data belongs, for each target classification, determining a distance average value of each initial distance according to the initial distance of each historical application transaction data belonging to the target classification, and taking the distance average value as a new initial clustering center of the target classification. Based on this, K new initial cluster centers can be obtained.
The following operations are repeated until the preset condition is met: determining the initial distance between each historical application transaction data and each initial clustering center, determining the target classification to which each historical application transaction data belongs according to the initial distance, determining the distance average value of each initial distance in each target classification, and taking the distance average value as the new initial clustering center of the target classification. The preset condition may be that the change in each new initial cluster center between consecutive iterations falls within a preset range, or that a preset number of iterations is reached.
When the preset conditions are met, acquiring current K new initial cluster centers, and taking each current new initial cluster center as a target cluster center of the corresponding target classification.
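The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it uses the Euclidean distance as the distance index, reads the "distance average value" as the standard K-means update (the per-dimension mean of the members of each target classification), and stops when the largest center shift is at most `tol` or after `max_iter` iterations. The names `kmeans`, `nearest_center`, `tol`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the cluster center with the smallest distance to the point."""
    return min(range(len(centers)), key=lambda c: euclidean(point, centers[c]))


def kmeans(
    data: List[Sequence[float]],
    k: int,
    max_iter: int = 100,
    tol: float = 1e-9,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Randomly pick K historical records as the initial cluster centers.
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):
        # Assign every record to the classification with the nearest center.
        labels = [nearest_center(p, centers) for p in data]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:
                # Assumed update rule: per-dimension mean of the members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the previous center if a classification became empty.
                new_centers.append(centers[c])
        shift = max(euclidean(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # preset condition: centers changed within a preset range
            break
    # Final assignment against the target cluster centers.
    labels = [nearest_center(p, centers) for p in data]
    return centers, labels


history = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
target_centers, target_labels = kmeans(history, k=2)
```

The returned centers play the role of the target cluster centers, and the returned labels give the target classification of each historical record.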
Optionally, on the basis of the above technical solution, calculating a distance threshold of each target class according to the at least one target distance in each target class and a probability density estimation algorithm may include the following. The target distance between each historical application transaction data and the target clustering center corresponding to the historical application transaction data is determined. A distance mean and a distance variance of the respective target distances in each target class are determined. The distance mean and the distance variance of each target class are processed by using a Gaussian distribution algorithm to obtain the distance threshold of each target class.
In embodiments of the present disclosure, to obtain a distance threshold for each target class, a gaussian distribution algorithm may be employed.
For each historical application transaction data, the target distance between the historical application transaction data and its corresponding target cluster center is determined. Based on this, the target distance for each historical application transaction data may be obtained.
For each target class, a distance mean and a distance variance of the target class are determined according to the respective target distances under the target class. The distance mean and the distance variance of the target class are then processed by a Gaussian distribution algorithm to obtain the distance threshold of the target class. That is, the distance mean and the distance variance of the target class serve as the parameters of the Gaussian distribution, from which the distance threshold of the target class is calculated.
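A minimal sketch of this step is given below. The disclosure states that the distance mean and distance variance parameterize a Gaussian distribution but does not give the threshold formula, so the rule used here (mean plus `n_sigma` standard deviations, with `n_sigma = 3.0`) is an assumption chosen for illustration; the function names are likewise hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def gaussian_distance_threshold(distances: Sequence[float], n_sigma: float = 3.0) -> float:
    """Threshold for one target class from the mean and variance of its distances."""
    mean = statistics.fmean(distances)
    variance = statistics.pvariance(distances, mu=mean)
    # Assumed rule: distances beyond mean + n_sigma * std are treated as unlikely.
    return mean + n_sigma * math.sqrt(variance)


def thresholds_by_class(distances: Sequence[float], labels: Sequence[int]) -> Dict[int, float]:
    """Group target distances by target class and compute one threshold per class."""
    grouped: Dict[int, List[float]] = {}
    for distance, label in zip(distances, labels):
        grouped.setdefault(label, []).append(distance)
    return {label: gaussian_distance_threshold(ds) for label, ds in grouped.items()}


thresholds = thresholds_by_class([1.0, 2.0, 3.0, 2.0, 2.0], [0, 0, 0, 1, 1])
```

A larger `n_sigma` makes the detector more tolerant; the value would need to be tuned against the historical data.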
Optionally, on the basis of the above technical solution, processing application transaction data to be detected by using an anomaly detection model to obtain a detection result may include the following. The spatial distance between the application transaction data to be detected and each target clustering center is determined. The target classification to which the application transaction data to be detected belongs is determined according to each spatial distance. The detection result is obtained according to the spatial distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold of the target classification.
In an embodiment of the present disclosure, the anomaly detection model is generated from the target cluster center of each target class and the distance threshold of each target class. After the application transaction data to be detected is obtained, the spatial distance between the application transaction data to be detected and the target clustering center of each target classification in the anomaly detection model can be determined. Based on this, a spatial distance corresponding to the target cluster center of each target class can be obtained. The minimum spatial distance is then selected, and the application transaction data to be detected is assigned to the target classification whose target cluster center has that minimum spatial distance to it.
After the target classification to which the application transaction data to be detected belongs is obtained, the spatial distance between the application transaction data to be detected and the target clustering center of the target classification can be compared with the distance threshold of the target classification, and according to the comparison result, a detection result for representing whether the application transaction data is abnormal data or not is determined.
Optionally, on the basis of the above technical solution, obtaining the detection result according to the spatial distance between the application transaction data to be detected and the target cluster center of the target classification and the distance threshold of the target classification may include: if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, the detection result is that the application transaction data to be detected is abnormal data. If the space distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold value of the target classification, the detection result is that the application transaction data to be detected is normal data.
In an embodiment of the present disclosure, after obtaining a spatial distance between application transaction data to be detected and a target cluster center of a target class and a distance threshold of the target class, the spatial distance is compared with the distance threshold.
If the spatial distance is greater than or equal to the distance threshold, it may be indicated that the application transaction data to be detected is anomalous. If the spatial distance is less than the distance threshold, the application transaction data to be detected may be interpreted as normal data.
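The detection step can be sketched as follows, assuming the Euclidean distance as the spatial distance and assuming the target cluster centers and distance thresholds are already available as parallel lists. The function name `detect` and its return shape are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect(
    sample: Sequence[float],
    centers: List[Sequence[float]],
    thresholds: Sequence[float],
) -> Tuple[int, float, bool]:
    """Return (target class index, spatial distance, is_abnormal) for one sample."""
    distances = [euclidean(sample, center) for center in centers]
    # The sample belongs to the class whose target cluster center is nearest.
    target_class = min(range(len(centers)), key=distances.__getitem__)
    distance = distances[target_class]
    # Abnormal when the distance is greater than or equal to the class threshold.
    return target_class, distance, distance >= thresholds[target_class]


centers = [[0.0, 0.0], [10.0, 10.0]]
thresholds = [2.0, 3.0]
result = detect([3.0, 4.0], centers, thresholds)
```

Note that the comparison is inclusive: a sample lying exactly on the threshold is reported as abnormal, matching the "greater than or equal to" condition above.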
Optionally, on the basis of the above technical solution, the application transaction data to be detected includes a plurality of dimensions. If the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, then after the detection result indicates that the application transaction data to be detected is abnormal data, the method further includes the following. The degree of abnormality of each dimension in the application transaction data to be detected is determined. An abnormal dimension is determined according to each degree of abnormality, wherein the abnormal dimension represents the dimension that causes the application transaction data to be detected to be abnormal data.
In embodiments of the present disclosure, since the application transaction data to be detected may include a plurality of dimensions, a main cause that causes the application transaction data to be detected to be abnormal data may be one of the dimensions. In order to determine the dimensions that result in the application transaction data to be detected being anomalous data, a manner of determining the degree of anomaly for each dimension may be employed.
For application transaction data to be detected that has been determined to be abnormal data, the degree of abnormality of each dimension in the application transaction data to be detected is determined, wherein the degree of abnormality represents the degree of deviation of the dimension. The dimension with the highest degree of abnormality is determined as the target dimension that causes the application transaction data to be detected to be abnormal data.
Optionally, determining the degree of abnormality of each dimension in the application transaction data to be detected may include: mapping the application transaction data to be detected into a preset coordinate system to obtain the projection length corresponding to each dimension, and taking the projection length corresponding to each dimension as the degree of abnormality of that dimension. Accordingly, determining the dimension with the highest degree of abnormality as the target dimension may include: determining the dimension with the largest projection length as the target dimension, wherein the target dimension is the dimension that causes the application transaction data to be detected to be abnormal data.
In this way, the degree of abnormality of each dimension in the application transaction data to be detected is determined, and the abnormal dimension that causes the application transaction data to be detected to be abnormal data is determined according to each degree of abnormality, so that the cause of the data abnormality is located accurately.
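One possible reading of the projection step is sketched below. The disclosure does not define the preset coordinate system, so this sketch assumes it is centered on the target cluster center with one axis per dimension, in which case the projection length of a dimension is the absolute deviation of the sample from the center along that axis. It also assumes the data has been standardized so that the dimensions are comparable. The function and dimension names are hypothetical.

```python
from typing import List, Sequence, Tuple


def abnormality_degrees(sample: Sequence[float], center: Sequence[float]) -> List[float]:
    """Assumed projection length per dimension: |sample_j - center_j|."""
    return [abs(x - c) for x, c in zip(sample, center)]


def abnormal_dimension(
    sample: Sequence[float],
    center: Sequence[float],
    names: Sequence[str],
) -> Tuple[str, float]:
    """Name and degree of the dimension with the largest projection length."""
    degrees = abnormality_degrees(sample, center)
    index = max(range(len(degrees)), key=degrees.__getitem__)
    return names[index], degrees[index]


names = ["online_rate", "online_response_time", "online_success_rate"]
culprit = abnormal_dimension([1.0, 5.0, 2.0], [1.0, 1.0, 1.0], names)
```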
Optionally, based on the above technical solution, acquiring application transaction data to be detected may include: obtaining original application transaction data, and preprocessing the original application transaction data to obtain the application transaction data to be detected.
In embodiments of the present disclosure, preprocessing may include at least one of data cleansing, data integration, data reduction, and data transformation. The data transformation may include normalization, which gives each dimension in the original application transaction data a consistent weight. Specifically, a standardization algorithm may be used to convert the data of each dimension in the original application transaction data into standard data with a mean of 0 and a variance of 1, so as to obtain the application transaction data to be detected.
It should be noted that, each historical application transaction data in the historical detection set may be the historical application transaction data after preprocessing. The data may be preprocessed in the same manner as the original application transaction data to obtain the historical detection set.
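The standardization described above can be sketched as a z-score transform. The sketch assumes that the per-dimension mean and standard deviation are estimated on the history detection set and then reused for the data to be detected, which is one way to preprocess both "in the same manner"; the function names are illustrative.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population standard deviation of the history set."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant dimensions, whose standard deviation is zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(row: Sequence[float], means: Sequence[float], stds: Sequence[float]) -> List[float]:
    """Convert one record to zero-mean, unit-variance scale per dimension."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


history = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardizer(history)
to_detect = standardize([3.0, 30.0], means, stds)
```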
Fig. 3 schematically illustrates a flowchart of another anomaly detection method according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S301 to S320.
In operation S301, a history detection set is acquired, wherein the history detection set includes a plurality of history application transaction data.
In operation S302, an initial cluster center for each target class is determined.
In operation S303, an initial distance between each historical application transaction data and each initial cluster center is determined.
In operation S304, a target class to which each of the historical application transaction data belongs is determined according to the initial distance.
In operation S305, a distance average of the respective initial distances in each target class is determined, and the distance average is taken as a new initial cluster center of the target class.
In operation S306, it is determined whether a preset condition is satisfied; if not, return to execute operation S303; if yes, operation S307 is performed.
In operation S307, a new initial cluster center of each target class obtained when the preset condition is satisfied is taken as a target cluster center of the corresponding target class.
In operation S308, a target distance between each of the historical application transaction data and a target cluster center of the target class to which the historical application transaction data belongs is determined, and at least one target distance in each of the target classes is obtained.
In operation S309, a target distance of each of the historical application transaction data and a target cluster center corresponding to the historical application transaction data is determined.
In operation S310, a distance mean and a distance variance of the respective target distances in each target class are determined.
In operation S311, the distance mean and the distance variance of each target class are processed using a gaussian distribution algorithm to obtain a distance threshold for each target class.
In operation S312, an anomaly detection model is generated based on each target cluster center and the distance threshold of each target class.
In operation S313, application transaction data to be detected is acquired, wherein the application transaction data to be detected includes a plurality of dimensions.
In operation S314, a spatial distance between the application transaction data to be detected and each target cluster center is determined.
In operation S315, a target class to which the application transaction data to be detected belongs is determined according to each spatial distance.
In operation S316, it is determined whether the spatial distance between the application transaction data to be detected and the target cluster center of the target class is greater than or equal to the distance threshold of the target class; if yes, operation S317 is performed; if not, operation S318 is performed.
In operation S317, the detection result is that the application transaction data to be detected is abnormal data, and operation S319 is performed.
In operation S318, the detection result is that the application transaction data to be detected is normal data.
In operation S319, the degree of abnormality of each dimension in the application transaction data to be detected is determined.
In operation S320, an anomaly dimension is determined according to each anomaly degree, wherein the anomaly dimension is used for characterizing a dimension that results in application transaction data to be detected as anomaly data.
In the embodiment of the disclosure, with the technical scheme provided, the hit rate of abnormal data detection can reach more than 90% and the accuracy rate can reach more than 95%, and the method plays a significant auxiliary role in locating problems and making decisions during production emergencies.
According to the technical scheme of the embodiment of the disclosure, the anomaly detection model is generated by training on historical application transaction data with a clustering algorithm and a probability density estimation algorithm. The clustering algorithm yields the target cluster center of each target classification, and the probability density estimation algorithm determines, from the clustering result, the distance threshold corresponding to each target classification. The anomaly detection model can therefore determine the target classification to which the application transaction data belongs, and determine whether the application transaction data is abnormal data according to the distance threshold and the distance between the application transaction data and the target cluster center of that target classification. Since the distance threshold, which is the criterion for determining whether the application transaction data to be detected is abnormal data, is determined based on the target distances of the historical application transaction data under the target classification rather than on expert experience, the distance threshold better reflects the actual situation. This improves the accuracy of determining, based on the distance threshold, whether the application transaction data to be detected is abnormal data, i.e., the accuracy of abnormal data detection. In addition, by determining the degree of abnormality of each dimension in the application transaction data to be detected, and determining from these degrees the abnormal dimension that causes the application transaction data to be detected to be abnormal data, the cause of the data abnormality is located more accurately.
Fig. 4 schematically illustrates a block diagram of an anomaly detection device according to an embodiment of the present disclosure.
As shown in fig. 4, the anomaly detection apparatus 400 may include an acquisition module 410 and a processing module 420.
The acquisition module 410 is communicatively coupled to the processing module 420.
An acquisition module 410 is configured to acquire application transaction data to be detected.
The processing module 420 is configured to process application transaction data to be detected by using an anomaly detection model to obtain a detection result, where the detection result is used to characterize whether the application transaction data is anomaly data, and the anomaly detection model is generated based on joint training of a clustering algorithm and a probability density estimation algorithm.
According to the technical scheme of the embodiment of the disclosure, application transaction data to be detected is processed by using an anomaly detection model generated based on the combined training of a clustering algorithm and a probability density estimation algorithm, so as to obtain a detection result for representing whether the application transaction data is anomaly data. The abnormal data is detected by adopting the abnormal detection model generated based on the combined training of the clustering algorithm and the probability density estimation algorithm, so that the more accurate identification and positioning of the abnormal data are realized, the problem of lower accuracy of abnormal data detection in the related technology is at least partially overcome, and the accuracy of abnormal data detection is further improved.
Optionally, based on the above technical solution, the processing module 420 may include a first acquisition sub-module, a first obtaining sub-module, a first determination sub-module, a calculation sub-module, and a generation sub-module.
The first acquisition sub-module is used for acquiring a history detection set, wherein the history detection set comprises a plurality of history application transaction data.
The first obtaining sub-module is used for processing the historical application transaction data in the historical detection set by using a clustering algorithm to obtain the target classification of each historical application transaction data and the target clustering center of each target classification.
The first determining sub-module is used for determining target distances between each historical application transaction data and a target cluster center of a target class to which the historical application transaction data belongs, and obtaining at least one target distance in each target class.
The calculation sub-module is used for calculating a distance threshold of each target class according to the at least one target distance in each target class and the probability density estimation algorithm.
The generation sub-module is used for generating an anomaly detection model according to each target cluster center and the distance threshold of each target class.
Optionally, on the basis of the above technical solution, the first obtaining sub-module may include a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, a repeated execution unit, and a fifth determining unit.
And the first determining unit is used for determining an initial cluster center of each target classification.
And the second determining unit is used for determining the initial distance between each historical application transaction data and each initial clustering center.
And the third determining unit is used for determining the target classification to which each historical application transaction data belongs according to the initial distance.
And a fourth determining unit, configured to determine a distance average value of the respective initial distances in each target class, and take the distance average value as a new initial cluster center of the target class.
And the repeated execution unit is used for repeatedly executing the operation of determining the initial distance and determining the new initial clustering center of the target classification until the preset condition is met.
And a fifth determining unit, configured to take a new initial cluster center of each target class obtained when the preset condition is satisfied as a target cluster center of the corresponding target class.
Optionally, on the basis of the above technical solution, the calculation sub-module may include a sixth determining unit, a seventh determining unit, and an obtaining unit.
And a sixth determining unit, configured to determine a target distance between each historical application transaction data and a target cluster center corresponding to the historical application transaction data.
And a seventh determining unit for determining a distance mean and a distance variance of the respective target distances in each target class.
The obtaining unit is used for processing the distance mean value and the distance variance of each target class by using a Gaussian distribution algorithm to obtain the distance threshold value of each target class.
Optionally, based on the above technical solution, the processing module 420 may include a second determining sub-module, a third determining sub-module, and a second obtaining sub-module.
And the second determination submodule is used for determining the space distance between the application transaction data to be detected and each target clustering center.
And the third determination submodule is used for determining the target classification to which the application transaction data to be detected belong according to each spatial distance.
And the second obtaining submodule is used for obtaining a detection result according to the space distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold value of the target classification.
Optionally, on the basis of the above technical solution, the second obtaining sub-module may include an eighth determining unit and a ninth determining unit.
And an eighth determining unit, configured to, if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, determine that the application transaction data to be detected is abnormal data.
And the ninth determining unit is used for determining that the application transaction data to be detected is normal data if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold value of the target classification.
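The detection logic described above can be illustrated with a short sketch. It assumes that the spatial distance is the Euclidean distance and that the target class of a record is the class of its nearest target cluster center; the function name `detect` and its return format are hypothetical.

```python
import math


def detect(record, centers, thresholds):
    """Classify one record against the trained centers and thresholds.

    Returns (target_class, spatial_distance, is_abnormal).
    """
    # Spatial distance from the record to every target cluster center.
    distances = [math.dist(record, center) for center in centers]
    # The target class is taken to be the class of the nearest center.
    label = min(range(len(centers)), key=distances.__getitem__)
    # Abnormal if the distance is greater than or equal to the class threshold.
    is_abnormal = distances[label] >= thresholds[label]
    return label, distances[label], is_abnormal
```

For example, with centers `[(0, 0), (10, 10)]` and thresholds `[2.0, 3.0]`, the record `(1, 0)` lies at distance 1.0 from the first center and is reported as normal, whereas `(10, 13)` lies at distance 3.0 from the second center and is reported as abnormal because the comparison is inclusive.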
Optionally, on the basis of the above technical solution, the application transaction data to be detected includes a plurality of dimensions.
The abnormality detection apparatus 400 may further include a first determining module and a second determining module.
A first determining module, configured to determine the degree of abnormality of each dimension of the application transaction data to be detected.
A second determining module, configured to determine an abnormal dimension according to the degrees of abnormality, where the abnormal dimension is a dimension that causes the application transaction data to be detected to be abnormal data.
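One possible way to quantify the degree of abnormality of each dimension is sketched below. This passage does not define the measure, so the sketch assumes that the degree of a dimension is its share of the squared Euclidean distance between the record and the target cluster center of its class, and that the abnormal dimensions are those with the largest shares. The names `dimension_abnormality`, `abnormal_dimensions`, and `top_n` are hypothetical.

```python
def dimension_abnormality(record, center):
    """Assumed measure: each dimension's share of the squared distance."""
    squared = [(x - c) ** 2 for x, c in zip(record, center)]
    total = sum(squared)
    if total == 0:
        # The record coincides with the center; no dimension deviates.
        return [0.0] * len(squared)
    return [s / total for s in squared]


def abnormal_dimensions(record, center, top_n=1):
    """Return the indices of the top_n most deviating dimensions and all degrees."""
    degrees = dimension_abnormality(record, center)
    ranked = sorted(range(len(degrees)), key=lambda i: degrees[i], reverse=True)
    return ranked[:top_n], degrees
```

With this measure the degrees of a record sum to 1, so they can be read directly as the relative contribution of each dimension to the record being flagged.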
Optionally, on the basis of the above technical solution, the acquisition module 410 may include a second acquisition sub-module and a processing sub-module.
A second acquisition sub-module, configured to acquire original application transaction data.
A processing sub-module, configured to preprocess the original application transaction data to obtain the application transaction data to be detected.
According to embodiments of the present disclosure, any number of the modules, sub-modules, and units, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, perform the corresponding functions.
For example, the acquisition module 410 and the processing module 420 may be combined and implemented in one module/unit, or either of them may be split into a plurality of modules/units. Alternatively, at least some of the functionality of one or more of these modules/units may be combined with at least some of the functionality of other modules/units and implemented in one module/unit. According to embodiments of the present disclosure, at least one of the acquisition module 410 and the processing module 420 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on a chip, a system on a substrate, a system on a package, or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging circuits, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of the acquisition module 410 and the processing module 420 may be at least partially implemented as a computer program module which, when executed, performs the corresponding functions.
It should be noted that the abnormality detection apparatus in the embodiments of the present disclosure corresponds to the abnormality detection method executed by the electronic device in the embodiments of the present disclosure. For details of the abnormality detection apparatus, refer to the description of the abnormality detection method, which is not repeated here.
Fig. 5 schematically shows a block diagram of an electronic device adapted to implement the method described above, according to an embodiment of the disclosure. The electronic device shown in fig. 5 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present disclosure includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage section 508 into a random access Memory (Random Access Memory, RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
According to embodiments of the present disclosure, the method flow may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the processor 501, the above-described functions defined in the system of the embodiments of the present disclosure are performed. The systems, devices, apparatuses, modules, units, and the like described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and their equivalents. Those skilled in the art can make various alternatives and modifications without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (8)

1. An anomaly detection method, comprising:
acquiring application transaction data to be detected; and
processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result indicates whether the application transaction data to be detected is abnormal data, and the anomaly detection model is generated by joint training based on a clustering algorithm and a probability density estimation algorithm;
wherein generating the anomaly detection model by joint training based on the clustering algorithm and the probability density estimation algorithm comprises:
acquiring a history detection set, wherein the history detection set comprises a plurality of historical application transaction data;
processing the historical application transaction data in the history detection set by using the clustering algorithm to obtain the target class to which each piece of historical application transaction data belongs and a target cluster center of each target class;
determining target distances between each historical application transaction data and a target cluster center of a target class to which the historical application transaction data belongs, and obtaining at least one target distance in each target class;
calculating a distance threshold value of each target class according to at least one target distance in each target class and the probability density estimation algorithm; and
generating the anomaly detection model according to each target cluster center and the distance threshold of each target class;
wherein the calculating a distance threshold value of each target class according to at least one target distance in each target class and the probability density estimation algorithm comprises the following steps:
determining target distances between the historical application transaction data and target clustering centers corresponding to the historical application transaction data;
determining a distance mean and a distance variance for each of the target distances in each of the target classifications; and
processing the distance mean and the distance variance of each target class by using a Gaussian distribution algorithm to obtain the distance threshold of each target class.
2. The method according to claim 1, wherein the processing of the historical application transaction data in the history detection set by using the clustering algorithm to obtain the target class to which each piece of historical application transaction data belongs and the target cluster center of each target class comprises:
determining an initial cluster center of each target class;
determining an initial distance between each of the historical application transaction data and each of the initial cluster centers;
determining the target classification to which each historical application transaction data belongs according to the initial distance;
determining a distance average value of the initial distances in each target classification, and taking the distance average value as a new initial clustering center of the target classification;
repeating the operation of determining the initial distance and determining a new initial cluster center of the target classification until a preset condition is met; and
taking the new initial cluster center of each target class obtained when the preset condition is met as the target cluster center of the corresponding target class.
3. The method according to claim 2, wherein the processing the application transaction data to be detected using an anomaly detection model to obtain a detection result includes:
determining the spatial distance between the application transaction data to be detected and each target cluster center;
determining, according to the spatial distances, the target class to which the application transaction data to be detected belongs; and
obtaining the detection result according to the spatial distance between the application transaction data to be detected and the target cluster center of the target class, and the distance threshold of the target class.
4. A method according to claim 3, wherein the obtaining a detection result according to a spatial distance between the application transaction data to be detected and the target cluster center of the target class and a distance threshold of the target class comprises:
if the spatial distance between the application transaction data to be detected and the target cluster center of the target class is greater than or equal to the distance threshold of the target class, the detection result is that the application transaction data to be detected is abnormal data; and
if the spatial distance between the application transaction data to be detected and the target cluster center of the target class is smaller than the distance threshold of the target class, the detection result is that the application transaction data to be detected is normal data.
5. The method of claim 4, wherein the application transaction data to be detected comprises a plurality of dimensions;
wherein, after the detection result is that the application transaction data to be detected is abnormal data because the spatial distance between the application transaction data to be detected and the target cluster center of the target class is greater than or equal to the distance threshold of the target class, the method further comprises:
determining the degree of abnormality of each dimension of the application transaction data to be detected; and
determining an abnormal dimension according to the degrees of abnormality, wherein the abnormal dimension is a dimension that causes the application transaction data to be detected to be abnormal data.
6. An abnormality detection apparatus comprising:
an acquisition module, configured to acquire application transaction data to be detected; and
a processing module, configured to process the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result indicates whether the application transaction data to be detected is abnormal data, and the anomaly detection model is generated by joint training based on a clustering algorithm and a probability density estimation algorithm;
wherein the processing module comprises:
a first acquisition sub-module, configured to acquire a history detection set, wherein the history detection set comprises a plurality of pieces of historical application transaction data;
a first obtaining sub-module, configured to process the historical application transaction data in the history detection set by using the clustering algorithm to obtain the target class to which each piece of historical application transaction data belongs and a target cluster center of each target class;
a first determining sub-module, configured to determine the target distance between each piece of historical application transaction data and the target cluster center of the target class to which it belongs, to obtain at least one target distance in each target class;
a calculation sub-module, configured to calculate a distance threshold of each target class according to the at least one target distance in each target class and the probability density estimation algorithm; and
a generation sub-module, configured to generate the anomaly detection model according to each target cluster center and the distance threshold of each target class;
wherein the calculation sub-module comprises:
a sixth determining unit, configured to determine the target distance between each piece of historical application transaction data and its corresponding target cluster center;
a seventh determining unit, configured to determine the distance mean and the distance variance of the target distances in each target class; and
an obtaining unit, configured to process the distance mean and the distance variance of each target class by using a Gaussian distribution algorithm to obtain the distance threshold of each target class.
7. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to implement the method of any of claims 1 to 5.
CN202010809725.0A 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium Active CN111814910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809725.0A CN111814910B (en) 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN111814910A CN111814910A (en) 2020-10-23
CN111814910B true CN111814910B (en) 2023-09-19

Family

ID=72860432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010809725.0A Active CN111814910B (en) 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN111814910B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591909A (en) * 2021-06-23 2021-11-02 北京智芯微电子科技有限公司 Abnormality detection method, abnormality detection device, and storage medium for power system
CN113486302A (en) * 2021-07-12 2021-10-08 浙江网商银行股份有限公司 Data processing method and device
CN113854990A (en) * 2021-10-27 2021-12-31 青岛海信日立空调系统有限公司 Heartbeat detection method and device
CN114298147A (en) * 2021-11-23 2022-04-08 深圳无域科技技术有限公司 Abnormal sample detection method and device, electronic equipment and storage medium
CN116185315B (en) * 2023-04-27 2023-07-14 美恒通智能电子(广州)股份有限公司 Hand-held printer data monitoring and early warning system and method based on artificial intelligence

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629593A (en) * 2018-04-28 2018-10-09 招商银行股份有限公司 Fraudulent trading recognition methods, system and storage medium based on deep learning
CN109558416A (en) * 2018-11-07 2019-04-02 北京先进数通信息技术股份公司 A kind of detection method traded extremely, device and storage medium
CN109919684A (en) * 2019-03-18 2019-06-21 上海盛付通电子支付服务有限公司 For generating method, electronic equipment and the computer readable storage medium of information prediction model
CN109978070A (en) * 2019-04-03 2019-07-05 北京市天元网络技术股份有限公司 A kind of improved K-means rejecting outliers method and device
CN110263827A (en) * 2019-05-31 2019-09-20 中国工商银行股份有限公司 Abnormal transaction detection method and device based on transaction rule identification

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200118136A1 (en) * 2018-10-16 2020-04-16 Mastercard International Incorporated Systems and methods for monitoring machine learning systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant