CN111814910A - Abnormality detection method, abnormality detection device, electronic apparatus, and storage medium - Google Patents

Info

Publication number
CN111814910A
Authority
CN
China
Prior art keywords
transaction data
application transaction
target
distance
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010809725.0A
Other languages
Chinese (zh)
Other versions
CN111814910B (en)
Inventor
李耕寅
吴声
常杰
熊慧君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010809725.0A
Publication of CN111814910A
Application granted
Publication of CN111814910B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques

Abstract

The embodiments of the disclosure provide an anomaly detection method and apparatus, an electronic device, and a storage medium, which can be applied in the fields of artificial intelligence, big data, and information security. The method comprises: acquiring application transaction data to be detected; and processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result indicates whether the application transaction data is anomalous data, and the anomaly detection model is generated by joint training based on a clustering algorithm and a probability density estimation algorithm.

Description

Abnormality detection method, abnormality detection device, electronic apparatus, and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an anomaly detection method and apparatus, an electronic device, and a storage medium.
Background
Application transactions generate a large amount of application transaction data. Abnormal data may occur in this data, where abnormal data refers to data that is inconsistent with the other application transaction data. Abnormal data is characterized by being few in number and outlying, and it is the basis for finding faults, so detecting abnormal data is of significant practical value in production.
In implementing the disclosed concept, the inventors found that the related art has at least the following problem: the accuracy of abnormal data detection using the related art is low.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide an abnormality detection method and apparatus, an electronic device, and a storage medium.
One aspect of the embodiments of the present disclosure provides an abnormality detection method, including: acquiring application transaction data to be detected; and processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result is used for representing whether the application transaction data is anomalous data, and the anomaly detection model is generated based on the combined training of a clustering algorithm and a probability density estimation algorithm.
According to an embodiment of the present disclosure, the generating of the anomaly detection model by joint training based on a clustering algorithm and a probability density estimation algorithm includes: acquiring a historical detection set, wherein the historical detection set comprises a plurality of historical application transaction data; processing the historical application transaction data in the historical detection set by using the clustering algorithm to obtain a target classification to which each historical application transaction data belongs and a target clustering center of each target classification; determining a target distance between each historical application transaction data and the target clustering center of the target classification to which the historical application transaction data belongs, to obtain at least one target distance in each target classification; calculating a distance threshold of each target classification according to the at least one target distance in each target classification and the probability density estimation algorithm; and generating the anomaly detection model according to each target clustering center and the distance threshold of each target classification.
According to an embodiment of the present disclosure, the processing, by using the clustering algorithm, the historical application transaction data in the historical detection set to obtain the target classification to which each of the historical application transaction data belongs and the target clustering center of each of the target classifications includes: determining an initial clustering center of each target classification; determining an initial distance between each of said historical application transaction data and each of said initial cluster centers; determining a target classification to which each historical application transaction data belongs according to the initial distance; determining a distance mean value of each initial distance in each target classification, and taking the distance mean value as a new initial clustering center of the target classification; repeatedly executing the operation of determining the initial distance and determining a new initial clustering center of the target classification until a preset condition is met; and taking the new initial clustering center of each target classification obtained when the preset condition is met as the target clustering center of the corresponding target classification.
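The iterative clustering step above can be sketched in plain Python as a minimal K-means loop. This is an illustrative sketch rather than the patented implementation: the function name `kmeans`, the random choice of initial centers, the Euclidean distance, and the stop condition (assignments no longer change, or `max_iter` is reached) are all assumptions. It also uses the standard K-means update, in which each new center is the mean of the records assigned to the classification.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` into `k` target classifications.

    Returns (centers, labels). All names here are hypothetical.
    """
    rng = random.Random(seed)
    # Initial clustering centers: k distinct records picked at random (assumed).
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every record to the nearest current center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Assumed preset condition: stop once assignments are stable.
        if new_labels == labels:
            break
        labels = new_labels
        # Standard K-means update: center = mean of the member records.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a classification is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Calling `kmeans(records, k=2)` on a list of equal-length numeric records returns the final centers, which play the role of the target clustering centers, together with one classification index per record.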
According to an embodiment of the present disclosure, the calculating a distance threshold of each of the target classifications according to at least one target distance in each of the target classifications and the probability density estimation algorithm includes: determining a target distance between each historical application transaction data and a target clustering center corresponding to the historical application transaction data; determining a distance mean and a distance variance of each of the target distances in each of the target classifications; and processing the distance mean and the distance variance of each target classification by using a Gaussian distribution algorithm to obtain the distance threshold of each target classification.
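As a hedged illustration of the threshold step, the sketch below fits a Gaussian to the target distances of each classification (mean and variance) and places the threshold a fixed number of standard deviations above the mean. The passage does not state how the threshold is derived from the mean and variance, so the `mean + z * std` rule and the default `z = 3.0` are assumptions, as are the function names.

```python
import math
import statistics
from collections import defaultdict


def distance_threshold(distances, z=3.0):
    """Threshold = mean + z * standard deviation of the target distances.

    The multiplier z (default 3.0) is an assumption, not taken from the text.
    """
    mean = statistics.fmean(distances)
    variance = statistics.pvariance(distances, mu=mean)
    return mean + z * math.sqrt(variance)


def thresholds_by_classification(points, labels, centers, z=3.0):
    """Compute one distance threshold per target classification."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        # Target distance: record to the center of its own classification.
        grouped[label].append(math.dist(point, centers[label]))
    return {label: distance_threshold(ds, z) for label, ds in grouped.items()}
```

A larger `z` makes the detector more tolerant of spread within a classification; a smaller `z` flags more records as abnormal.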
According to an embodiment of the present disclosure, the processing the application transaction data to be detected by using the anomaly detection model to obtain a detection result includes: determining a spatial distance between the application transaction data to be detected and each target clustering center; determining the target classification to which the application transaction data to be detected belongs according to each space distance; and obtaining a detection result according to the spatial distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold value of the target classification.
According to an embodiment of the present disclosure, the obtaining a detection result according to a spatial distance between the application transaction data to be detected and the target clustering center of the target classification and a distance threshold of the target classification includes: if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, the detection result indicates that the application transaction data to be detected is abnormal data; and if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold of the target classification, the detection result is that the application transaction data to be detected is normal data.
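The detection rule described above (nearest target clustering center, then a comparison with that classification's distance threshold) can be sketched as follows. The function name `detect`, the tuple it returns, and the use of Euclidean distance as the spatial distance are assumptions made for illustration.

```python
import math


def detect(record, centers, thresholds):
    """Return (classification_index, distance, is_abnormal) for one record."""
    # Spatial distance from the record to every target clustering center.
    distances = [math.dist(record, c) for c in centers]
    # The record belongs to the classification with the nearest center.
    cluster = min(range(len(centers)), key=distances.__getitem__)
    d = distances[cluster]
    # Abnormal when the distance is greater than or equal to the threshold.
    return cluster, d, d >= thresholds[cluster]
```

Note that a record lying exactly on the threshold is reported as abnormal, matching the "greater than or equal to" wording of the text.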
According to an embodiment of the present disclosure, the application transaction data to be detected includes a plurality of dimensions;
where the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, so that the detection result indicates that the application transaction data to be detected is abnormal data, the method further includes:
determining the abnormal degree of each dimension in the application transaction data to be detected;
and determining an abnormal dimension according to each abnormal degree, wherein the abnormal dimension is used for representing the dimension which causes the application transaction data to be detected to be abnormal data.
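This passage does not say how the abnormal degree of a dimension is computed. One possible reading, shown here purely as an assumption, is to take each dimension's share of the squared distance between the record and its target clustering center, and to report the dimension with the largest share. The function names and the dimension names in the usage note are hypothetical, and the sketch presumes the dimensions have been scaled to comparable ranges.

```python
def dimension_anomaly_degrees(record, center, names):
    """Share of the squared distance contributed by each dimension (assumed metric)."""
    squared = [(x - c) ** 2 for x, c in zip(record, center)]
    total = sum(squared)
    if total == 0:
        # The record coincides with the center: no dimension deviates.
        return {name: 0.0 for name in names}
    return {name: s / total for name, s in zip(names, squared)}


def abnormal_dimension(degrees):
    """Name of the dimension with the highest abnormal degree."""
    return max(degrees, key=degrees.get)
```

For a record `[1.0, 5.0, 2.0]` with center `[1.0, 1.0, 1.0]` and dimensions named `rate`, `response_time`, and `success_rate`, the second dimension accounts for 16/17 of the squared distance and is reported as the abnormal dimension.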
Another aspect of the disclosed embodiments provides an abnormality detection apparatus, including: the acquisition module is used for acquiring application transaction data to be detected; and the processing module is used for processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result is used for representing whether the application transaction data is anomalous data, and the anomaly detection model is generated based on the joint training of a clustering algorithm and a probability density estimation algorithm.
Another aspect of the disclosed embodiments provides an electronic device, including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium having stored thereon executable instructions, which when executed by a processor, cause the processor to implement the method as described above.
Another aspect of embodiments of the present disclosure provides a computer program comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the disclosure, application transaction data to be detected is acquired and processed by using an anomaly detection model generated by joint training based on a clustering algorithm and a probability density estimation algorithm, so as to obtain a detection result indicating whether the application transaction data is anomalous data. Because abnormal data is detected with a model trained jointly on the clustering algorithm and the probability density estimation algorithm, abnormal data can be accurately identified and located. This at least partially solves the problem of low abnormal data detection accuracy in the related art and improves the accuracy of abnormal data detection.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture for an anomaly detection method to which the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a method of anomaly detection according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of another anomaly detection method according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of an anomaly detection apparatus according to an embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of an electronic device adapted to implement an anomaly detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). Where a convention analogous to "at least one of A, B or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).
In the course of implementing the disclosed concept, the inventors discovered that unsupervised learning can be employed to solve the problem of low accuracy of abnormal data detection. The unsupervised learning is machine learning with no label on training data. The embodiment of the disclosure provides a technical scheme for anomaly detection based on combination of a clustering algorithm and a probability density estimation algorithm.
Clustering algorithms may be used for cluster analysis. Cluster analysis is an unsupervised machine learning technique and belongs to exploratory data analysis. It divides objects into a plurality of target classifications according to the distance or similarity between the objects, so that similar objects fall into the same target classification. A target classification refers to a collection of similar objects. A good clustering result has high similarity among objects of the same target classification and low similarity among objects of different target classifications. The clustering algorithm may include a K-means clustering algorithm, a K-center clustering algorithm, a CLARA (Clustering LARge Applications) algorithm, or a fuzzy C-means algorithm. A probability density estimation algorithm may be used to determine the anomaly boundary threshold, which may be used as the criterion for determining whether data is anomalous.
The technical scheme provided by the embodiments of the disclosure can be used for abnormal data detection in the full link of application transactions. The application transaction full link may refer to various types of transactions, which may include internet transactions, banking transactions, and the like. Accordingly, the object described above may refer to application transaction data, that is, data generated during a transaction. In the application transaction full link, the target classification may refer to an operation mode, and the operation modes may include transactions generated on weekdays, transactions generated on weekends, transactions generated in the morning, transactions generated in the afternoon, and the like. It should be noted that the embodiments of the present disclosure are mainly directed to abnormal data detection in the application transaction full link, which is described below with reference to specific embodiments.
The embodiments of the disclosure provide an anomaly detection method and apparatus, and an electronic device capable of applying the method. The anomaly detection method and apparatus and the electronic device can be used in the fields of artificial intelligence, big data, and information security. The method includes a detection process and a training process. In the detection process, application transaction data to be detected is acquired and processed by using an anomaly detection model generated by joint training based on a clustering algorithm and a probability density estimation algorithm to obtain a detection result, wherein the detection result indicates whether the application transaction data is anomalous data. In the training process, a historical detection set is acquired, wherein the historical detection set comprises a plurality of historical application transaction data. The historical application transaction data in the historical detection set is processed by using the clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target clustering center of each target classification. A target distance between each historical application transaction data and the target clustering center of the target classification to which it belongs is determined, yielding at least one target distance in each target classification. A distance threshold of each target classification is calculated according to the at least one target distance in that classification and the probability density estimation algorithm, and the anomaly detection model is generated according to each target clustering center and the distance threshold of each target classification.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which an anomaly detection method may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various messaging client applications installed thereon, such as a banking application, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that an anomaly detection method provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the abnormality detection apparatus provided by the embodiment of the present disclosure may be generally disposed in the server 105. The anomaly detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the abnormality detection apparatus provided in the embodiment of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 schematically shows a flow chart of a method of anomaly detection according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S220.
In operation S210, application transaction data to be detected is acquired.
In embodiments of the present disclosure, the application transaction data to be detected may refer to data generated during a transaction. The application transaction data to be detected may include multiple dimensions, i.e., the application transaction data to be detected may be multidimensional data.
In anomaly detection for the application transaction full link, the dimensions of the application transaction data to be detected may include at least one of the internet connection transaction rate, the Unionpay transaction rate, the internet connection transaction response time, the Unionpay transaction response time, the internet connection transaction success rate, and the Unionpay transaction success rate. In addition, to achieve a better anomaly detection effect, the dimensions may also include time dimensions, such as month, year, week number, and time period, where the time periods may include a morning period and an evening period. For example, the morning period may be 06:00-18:00 and the evening period 18:00-06:00.
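As a small illustration of the time dimensions, the sketch below derives them from a transaction timestamp using the example boundaries given above (morning 06:00-18:00, evening 18:00-06:00). The function name, the dictionary keys, and the reading of "week number" as the ISO week of the year are assumptions; the original may equally intend the day of the week.

```python
from datetime import datetime


def time_dimensions(ts):
    """Derive the time dimensions of one transaction timestamp (hypothetical helper)."""
    # Morning covers 06:00 up to, but not including, 18:00; the rest is evening.
    period = "morning" if 6 <= ts.hour < 18 else "evening"
    return {
        "year": ts.year,
        "month": ts.month,
        # Assumption: "week number" is read as the ISO week of the year.
        "week_number": ts.isocalendar()[1],
        "period": period,
    }
```

These values would still need a numeric encoding before being combined with the rate, response time, and success rate dimensions in a distance computation.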
The application transaction data to be detected may be data that the electronic device receives from other electronic devices, or data stored locally on the electronic device. The electronic device described in the embodiments of the present disclosure may refer to a server, and the other electronic devices may refer to terminals. For example, a user purchases a commodity on a platform by using a terminal and chooses to pay with a bank card. After the user completes the payment, the terminal generates application transaction data and sends it to the server.
In operation S220, the application transaction data to be detected is processed by using an anomaly detection model to obtain a detection result, where the detection result is used to characterize whether the application transaction data is anomalous data, and the anomaly detection model is generated based on a clustering algorithm and a probability density estimation algorithm in a joint training manner.
In the embodiments of the disclosure, in order to determine whether the application transaction data to be detected is abnormal data, the application transaction data to be detected may be processed by using an anomaly detection model, where the anomaly detection model is generated by joint training based on a clustering algorithm and a probability density estimation algorithm. The clustering algorithm may include a K-means clustering algorithm, a K-center clustering algorithm, a CLARA algorithm, or a fuzzy C-means algorithm. The probability density estimation algorithm may comprise a Gaussian distribution algorithm.
The application transaction data to be detected is input into the anomaly detection model, which outputs a detection result indicating whether the application transaction data is anomalous data. The specific form of the detection result may be set according to actual conditions and is not specifically limited herein.
It should be noted that the anomaly detection model may be a model generated by training historical application transaction data by using a clustering algorithm and a probability density estimation algorithm. The clustering result of the clustering algorithm is to determine the target clustering center of each target classification, and the probability density estimation algorithm can determine the distance threshold corresponding to the target classification according to the clustering result, so that the anomaly detection model can determine the target classification to which the application transaction data belongs, and determine whether the application transaction data is anomalous data according to the distance between the application transaction data and the target clustering center of the target classification to which the application transaction data belongs and the distance threshold.
It should be further noted that processing the application transaction data to be detected with the anomaly detection model generated by joint training based on the clustering algorithm and the probability density estimation algorithm makes the resulting detection result more accurate. In particular, it allows effective handling of application transaction data that the related art tends to misjudge as anomalous, such as the small amount of dense outlying data that occurs during traffic peak periods or traffic valley periods.
According to the technical scheme of the embodiments of the disclosure, application transaction data to be detected is acquired and processed by using an anomaly detection model generated by joint training based on a clustering algorithm and a probability density estimation algorithm, so as to obtain a detection result indicating whether the application transaction data is anomalous data. Because abnormal data is detected with a model trained jointly on the clustering algorithm and the probability density estimation algorithm, abnormal data can be accurately identified and located. This at least partially solves the problem of low abnormal data detection accuracy in the related art and improves the accuracy of abnormal data detection.
Optionally, on the basis of the foregoing technical solution, the anomaly detection model may be generated based on a joint training of a clustering algorithm and a probability density estimation algorithm, and may include: a historical detection set is obtained, wherein the historical detection set comprises a plurality of historical application transaction data. And processing the historical application transaction data in the historical detection set by using a clustering algorithm to obtain the target classification to which each piece of historical application transaction data belongs and the target clustering center of each target classification. And determining the target distance between each historical application transaction data and the target clustering center of the target classification to which the historical application transaction data belongs to obtain at least one target distance in each target classification. And calculating to obtain a distance threshold value of each target classification according to at least one target distance in each target classification and a probability density estimation algorithm. And generating an anomaly detection model according to the distance threshold of each target clustering center and each target classification.
In the embodiment of the present disclosure, in order to obtain the anomaly detection model, a clustering algorithm and a probability density estimation algorithm may be used to train on the historical detection set. The historical detection set may include a plurality of historical application transaction data. Historical application transaction data may refer to data generated during a historical transaction. The historical application transaction data may include historical application transaction data and/or historical application transaction data. The historical application transaction data may include multiple dimensions, i.e., the historical application transaction data may be multidimensional data.
In the anomaly detection of the full link of the application transaction, the dimension of the historical application transaction data can comprise at least one of the internet connection transaction rate, the Unionpay transaction rate, the internet connection transaction response time, the Unionpay transaction response time, the internet connection transaction success rate and the Unionpay transaction success rate. Furthermore, to achieve better anomaly detection effects, the dimension of the historical application transaction data may also be a time dimension, which may include months, years, weeks, and time periods.
It should be noted that the historical application transaction data may be historical application transaction data that is received by the electronic device from other electronic devices, or may be historical application transaction data that is locally stored by the electronic device. The electronic device described in the embodiments of the present disclosure may refer to a server, and the other electronic devices may refer to terminals.
After obtaining the historical application transaction data, a clustering algorithm may be used to process each historical application transaction data in the historical detection set to obtain a target classification to which each historical application transaction data belongs and a target clustering center for each target classification. The clustering algorithm may be a K-means clustering algorithm. The target category may refer to an operational mode, which may include transactions generated on weekdays, transactions generated on double holidays, transactions generated in the morning and transactions generated in the afternoon, and so on.
After obtaining the target classification to which each historical application transaction data belongs and the target clustering center corresponding to each target classification, for each historical application transaction data, a target distance between the historical application transaction data and the target clustering center of the target classification to which the historical application transaction data belongs may be determined. Based on this, the target distance of each historical application transaction data can be obtained. Accordingly, for each target classification, the respective target distance corresponding to the target classification may be obtained.
After obtaining the respective target distance of each target class, for each target class, a distance threshold of the target class may be determined according to a probability density estimation algorithm and the respective target distance of the target class. Based on this, a distance threshold for each target classification can be obtained.
After the distance threshold value of each target classification and each target clustering center are obtained, an anomaly detection model can be generated according to each target clustering center and the distance threshold value of each target classification. According to the embodiment of the disclosure, the anomaly detection model is used for determining whether the application transaction data to be detected is anomalous data or not through clustering and threshold detection. The distance threshold value can be used as a standard for determining whether the application transaction data to be detected is abnormal data.
According to the embodiment of the present disclosure, since the distance threshold, which is a criterion for determining whether the application transaction data to be detected is abnormal data, is determined based on the target distance corresponding to the historical application transaction data under the target classification, not based on expert experience, the distance threshold is more suitable for the actual situation. Because the distance threshold value is more in line with the actual situation, the accuracy rate of determining whether the application transaction data to be detected is abnormal data based on the distance threshold value is improved, namely the accuracy rate of detecting the abnormal data is improved.
It should be noted that the training interval time of the anomaly detection model and the number of the historical application transaction data included in the historical detection set can be flexibly adjusted according to the requirements of the user. The anomaly detection model can cluster frequently-occurring historical situations into historical normal patterns.
Optionally, on the basis of the above technical solution, processing the historical application transaction data in the historical detection set by using a clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target clustering center of each target classification, which may include: an initial cluster center for each target classification is determined. An initial distance between each historical application transaction data and each initial cluster center is determined. And determining the target classification to which each historical application transaction data belongs according to the initial distance. And determining the distance mean value of each initial distance in each target classification, and taking the distance mean value as a new initial clustering center of the target classification. And repeatedly executing the operations of determining the initial distance and determining a new initial clustering center of the target classification until a preset condition is met. And taking the new initial clustering center of each target classification obtained when the preset condition is met as the target clustering center of the corresponding target classification.
In the embodiment of the present disclosure, in order to determine the target classification to which the application transaction data to be detected belongs and the target clustering center of each target classification, a clustering algorithm may be employed.
In the data space T, the historical detection set X may include M pieces of historical application transaction data, where $X = (x_1, x_2, \ldots, x_i, \ldots, x_{M-1}, x_M)$. Each piece of historical application transaction data is $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{i(N-1)}, x_{iN})$, where $i = 1, 2, \ldots, M-1, M$ and $j = 1, 2, \ldots, N-1, N$; $j$ indexes the dimensions of the historical application transaction data. The historical detection set X is equivalent to an M × N matrix.
The clustering algorithm aims to divide the history detection set X into K target classifications, and determines a corresponding target clustering center for each target classification. The basis for the division may be a similarity between historical application transaction data. The index representing the similarity may include a similarity coefficient or a distance index, and the distance index may include a euclidean distance, a square of the euclidean distance, a manhattan distance, a chebyshev distance, a chi-square distance, or the like. The smaller the distance between different historical application transaction data, the higher the similarity between different historical application transaction data can be illustrated. The greater the correlation coefficient between different historical application transaction data, the higher the similarity between different historical application transaction data may also be. The clustering algorithm may include a K-means clustering algorithm. The K-means clustering algorithm is an iterative clustering algorithm that uses a distance index as an index of similarity. According to the technical scheme of the embodiment of the disclosure, a history detection set is processed by adopting a K-means clustering algorithm, and a target clustering center of each target classification is obtained. The following specifically describes a process of obtaining a target clustering center of each target classification by processing a history detection set by using a K-means clustering algorithm.
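The distance indexes named above can be illustrated with a short sketch. The function names below are chosen for illustration only and do not appear in the disclosure; the sketch covers the Euclidean, squared Euclidean, Manhattan, and Chebyshev distances between two data points of equal dimension.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Square of the Euclidean distance (avoids the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute per-dimension difference."""
    return max(abs(x - y) for x, y in zip(a, b))


a = (0.0, 0.0)
b = (3.0, 4.0)
print(euclidean(a, b), squared_euclidean(a, b), manhattan(a, b), chebyshev(a, b))
# prints: 5.0 25.0 7.0 4.0
```

In every case a smaller value indicates a higher similarity between the two pieces of data.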
K target classifications are preset. For each target classification, one piece of historical application transaction data is randomly selected from the historical detection set as the initial clustering center of the target classification. Based on this, K initial clustering centers can be obtained. It should be noted that the specific value of K may be determined according to the number of target classifications that the historical detection set may correspond to. The specific value of K may also be determined in an adjustable manner.
After the initial clustering centers of the K target classifications are determined, for each piece of historical application transaction data, an initial distance between the historical application transaction data and each initial clustering center is determined, so that each piece of historical application transaction data corresponds to K initial distances. The target classification to which each piece of historical application transaction data belongs is then determined based on the principle that the distance between the historical application transaction data and the initial clustering center is minimum.
After the target classification to which each historical application transaction data belongs is obtained, for each target classification, a distance mean value of each initial distance is determined according to the initial distance of each historical application transaction data belonging to the target classification, and the distance mean value is used as a new initial clustering center of the target classification. Based on this, K new initial cluster centers can be obtained.
The following operations are repeated until a preset condition is met: determining the initial distance between each piece of historical application transaction data and each initial clustering center; determining, according to the initial distance, the target classification to which each piece of historical application transaction data belongs; and determining the distance mean value of each initial distance in each target classification and taking the distance mean value as a new initial clustering center of the target classification. The preset condition may be that the change of each new initial clustering center between consecutive iterations falls within a preset range, or that a preset number of iterations is reached.
And when the preset conditions are met, acquiring current K new initial clustering centers, and taking each current new initial clustering center as a target clustering center of the corresponding target classification.
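The iteration described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure. It assumes the Euclidean distance as the distance index. It also assumes that the "distance mean value" corresponds to the standard K-means update, in which the new center is the per-dimension mean of the data assigned to the classification. The names `k_means`, `assign`, `max_iter`, `tol`, and `seed` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(data, centers):
    """Index of the nearest center for every data point."""
    return [min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in data]


def k_means(data, k, max_iter=100, tol=1e-9, seed=0):
    """Return (labels, centers) for K preset target classifications."""
    rng = random.Random(seed)
    # Randomly pick K pieces of data as the initial clustering centers.
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(max_iter):  # preset condition 2: iteration limit
        labels = assign(data, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                # Assumed update rule: per-dimension mean of the members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the center of an empty class
        shift = max(euclidean(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # preset condition 1: centers barely move
            break
    return assign(data, centers), centers


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(data, k=2)
print(labels, centers)
```

Both preset conditions from the text appear in the loop: the tolerance on the center shift and the upper bound on the number of iterations.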
Optionally, on the basis of the foregoing technical solution, calculating a distance threshold of each target classification according to at least one target distance in each target classification and a probability density estimation algorithm may include: determining the target distance between each historical application transaction data and the target clustering center corresponding to the historical application transaction data; determining a distance mean and a distance variance of the target distances in each target classification; and processing the distance mean and the distance variance of each target classification by using a Gaussian distribution algorithm to obtain the distance threshold of each target classification.
In an embodiment of the present disclosure, in order to obtain a distance threshold for each target classification, a gaussian distribution algorithm may be employed.
And determining the target distance between the historical application transaction data and the target cluster center corresponding to the historical application transaction data aiming at each historical application transaction data. Based on this, the target distance of each historical application transaction data can be obtained.
For each target classification, the distance mean and the distance variance of the target classification are determined according to the target distances under the target classification. The distance mean and the distance variance of the target classification are then processed by a Gaussian distribution algorithm to obtain the distance threshold of the target classification. That is, the distance mean and the distance variance of the target classification can be used as parameters of the Gaussian distribution algorithm, and the distance threshold of the target classification is obtained through calculation.
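The text does not state how the distance threshold is derived from the Gaussian parameters. One possible reading, shown here purely as an assumption, is to treat the target distances of a classification as Gaussian and place the threshold a fixed number of standard deviations above the distance mean. The function name `distance_threshold` and the parameter `n_sigma` (default 3) are hypothetical.

```python
import math


def distance_threshold(target_distances, n_sigma=3.0):
    """Threshold = distance mean + n_sigma * distance standard deviation.

    The n_sigma rule is an assumption; the disclosure only says that the
    distance mean and distance variance are parameters of a Gaussian.
    """
    n = len(target_distances)
    mean = sum(target_distances) / n
    variance = sum((d - mean) ** 2 for d in target_distances) / n
    return mean + n_sigma * math.sqrt(variance)


# Target distances grouped by target classification (illustrative values).
distances_by_class = {0: [1.0, 3.0], 1: [2.0, 2.0, 2.0]}
thresholds = {c: distance_threshold(d, n_sigma=2.0) for c, d in distances_by_class.items()}
print(thresholds)  # prints: {0: 4.0, 1: 2.0}
```

Because the threshold is computed per classification, a tightly grouped operating mode receives a smaller threshold than a widely scattered one.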
Optionally, on the basis of the above technical solution, processing the application transaction data to be detected by using the anomaly detection model to obtain a detection result, which may include: and determining the spatial distance between the application transaction data to be detected and each target cluster center. And determining the target classification to which the application transaction data to be detected belongs according to each spatial distance. And obtaining a detection result according to the spatial distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold value of the target classification.
In an embodiment of the present disclosure, the anomaly detection model is generated from a target cluster center for each target class and a distance threshold for each target class. After the application transaction data to be detected is acquired, the spatial distance between the application transaction data to be detected and the target clustering center of each target classification in the anomaly detection model can be determined. Based on this, the spatial distance corresponding to the target cluster center of each target classification can be obtained. And determining the minimum spatial distance from the spatial distances, and attributing the application transaction data to be detected to the target classification to which the target clustering center with the minimum spatial distance to the application transaction data to be detected belongs.
After the target classification to which the application transaction data to be detected belongs is obtained, the spatial distance between the application transaction data to be detected and the target clustering center of the target classification can be compared with the distance threshold of the target classification, and according to the comparison result, a detection result used for representing whether the application transaction data is abnormal data is determined.
Optionally, on the basis of the above technical solution, obtaining a detection result according to a spatial distance between the application transaction data to be detected and a target clustering center of the target classification and a distance threshold of the target classification, may include: and if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, determining that the application transaction data to be detected is abnormal data. And if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold of the target classification, the detection result is that the application transaction data to be detected is normal data.
In an embodiment of the disclosure, after obtaining a spatial distance between the application transaction data to be detected and a target clustering center of the target classification and a distance threshold of the target classification, the spatial distance is compared with the distance threshold.
If the spatial distance is greater than or equal to the distance threshold, the application transaction data to be detected can be indicated as abnormal data. If the spatial distance is smaller than the distance threshold, the application transaction data to be detected can be indicated as normal data.
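The detection step can be sketched as follows, assuming the Euclidean distance as the spatial distance and a model represented simply as a list of target clustering centers plus a list of distance thresholds in the same order. The function name `detect` and this representation are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect(sample, centers, thresholds):
    """Return (target classification, spatial distance, is_abnormal)."""
    distances = [euclidean(sample, c) for c in centers]
    # The sample belongs to the classification with the nearest center.
    target = min(range(len(centers)), key=distances.__getitem__)
    # Greater than or equal to the threshold means abnormal data.
    is_abnormal = distances[target] >= thresholds[target]
    return target, distances[target], is_abnormal


centers = [(0.0, 0.0), (10.0, 10.0)]
thresholds = [2.0, 3.0]
print(detect((1.0, 0.0), centers, thresholds))    # prints: (0, 1.0, False)
print(detect((10.0, 14.0), centers, thresholds))  # prints: (1, 4.0, True)
```

A sample whose distance exactly equals the threshold is reported as abnormal, matching the "greater than or equal to" condition in the text.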
Optionally, on the basis of the above technical solution, the application transaction data to be detected includes multiple dimensions. After the detection result is determined to be that the application transaction data to be detected is abnormal data, because the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, the method may further include: determining the abnormal degree of each dimension in the application transaction data to be detected; and determining an abnormal dimension according to each abnormal degree, wherein the abnormal dimension is used for representing the dimension which causes the application transaction data to be detected to be abnormal data.
In the embodiment of the present disclosure, since the application transaction data to be detected may include a plurality of dimensions, a main reason that the application transaction data to be detected is abnormal data may be one of the dimensions. In order to determine the dimension causing the application transaction data to be detected to be abnormal data, a manner of determining the degree of abnormality of each dimension may be adopted.
And determining the abnormal degree of each dimension in the application transaction data to be detected aiming at the application transaction data to be detected which is determined to be abnormal data, wherein the abnormal degree can be used for representing the deviation degree of the dimension. And determining the dimension with the highest abnormal degree as a target dimension which causes the application transaction data to be detected to be abnormal data.
Optionally, determining the degree of anomaly of each dimension in the application transaction data to be detected may include: mapping the application transaction data to be detected to a preset coordinate system to obtain the projection length corresponding to each dimension. The projection length corresponding to each dimension is taken as the degree of abnormality of each dimension. Correspondingly, determining the dimension with the highest degree of abnormality as the target dimension causing the application transaction data to be detected to be abnormal data may include: and determining the dimension with the maximum projection length as a target dimension, wherein the target dimension is the dimension which causes the application transaction data to be detected to be abnormal data.
Because the abnormal dimension, which represents the dimension causing the application transaction data to be detected to be abnormal data, is determined according to the abnormal degree, the cause of the data abnormality can be accurately located.
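The text does not define the preset coordinate system. The sketch below assumes one whose origin is the target clustering center and whose axes are the data dimensions, so that the projection length of a dimension is the absolute offset of the data from the center along that axis. The function name `abnormal_dimension` and the dimension names are hypothetical.

```python
def abnormal_dimension(sample, center, dimension_names):
    """Return the dimension with the largest projection length.

    Assumption: the preset coordinate system is centered on the target
    clustering center, with one axis per (standardized) dimension.
    """
    degrees = {
        name: abs(x - c)
        for name, x, c in zip(dimension_names, sample, center)
    }
    target = max(degrees, key=degrees.get)
    return target, degrees


names = ["transaction_rate", "response_time", "success_rate"]
target, degrees = abnormal_dimension((0.5, 4.0, -1.0), (0.0, 0.0, 0.0), names)
print(target)   # prints: response_time
print(degrees)  # prints: {'transaction_rate': 0.5, 'response_time': 4.0, 'success_rate': 1.0}
```

Comparing projection lengths across dimensions is only meaningful when the dimensions share a common scale, for example after standardization.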
Optionally, on the basis of the above technical solution, acquiring the application transaction data to be detected may include: raw application transaction data is obtained. And preprocessing the original application transaction data to obtain application transaction data to be detected.
In an embodiment of the present disclosure, the pre-processing may include at least one of data cleansing, data integration, data reduction, and data transformation. Wherein the data transformation may comprise a normalization process. The normalization process may achieve a consistent weight for each dimension in the raw application transaction data. Namely, a standardization algorithm is adopted to convert data of each dimension in the original application transaction data into standard data with the mean value of 0 and the variance of 1 so as to obtain the application transaction data to be detected.
It should be noted that each historical application transaction data in the history detection set may be the historical application transaction data after being preprocessed. The data may be pre-processed in the same manner as the raw application transaction data to obtain a historical detection set.
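The standardization step can be sketched as follows. The statistics are fitted on the historical detection set and then reused for the data to be detected, so that both are preprocessed in the same manner. The function names `fit_standardizer` and `standardize` are illustrative, and mapping a constant dimension to 0 is an assumption made to avoid division by zero.

```python
import math


def fit_standardizer(rows):
    """Per-dimension mean and standard deviation of the historical data."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    return means, stds


def standardize(row, means, stds):
    """Convert each dimension to mean 0 and variance 1 (z-score)."""
    # Assumption: a dimension with zero spread is mapped to 0.0.
    return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]


history = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardizer(history)
print(standardize([3.0, 30.0], means, stds))  # prints: [1.0, 1.0]
```

After standardization, each dimension contributes with the same weight to the distance computations used in clustering and detection.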
FIG. 3 schematically illustrates a flow chart of another anomaly detection method according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S301 to S320.
In operation S301, a history detection set is acquired, wherein the history detection set includes a plurality of history application transaction data.
In operation S302, an initial cluster center for each target classification is determined.
In operation S303, an initial distance between each historical application transaction data and each initial cluster center is determined.
In operation S304, a target class to which each historical application transaction data belongs is determined according to the initial distance.
In operation S305, a distance mean of the respective initial distances in each target classification is determined, and the distance mean is used as a new initial cluster center of the target classification.
In operation S306, it is determined whether a preset condition is satisfied; if not, the method returns to operation S303; if yes, operation S307 is performed.
In operation S307, the new initial cluster center of each target classification obtained when the preset condition is satisfied is used as the target cluster center of the corresponding target classification.
In operation S308, a target distance between each historical application transaction data and a target cluster center of a target class to which the historical application transaction data belongs is determined, and at least one target distance in each target class is obtained.
In operation S309, a target distance of each historical application transaction data from a target cluster center corresponding to the historical application transaction data is determined.
In operation S310, a distance mean and a distance variance of respective target distances in each target classification are determined.
In operation S311, the distance mean and the distance variance of each target class are processed by using a gaussian distribution algorithm to obtain a distance threshold of each target class.
In operation S312, an anomaly detection model is generated according to the distance thresholds of the respective target cluster centers and the respective target classifications.
In operation S313, application transaction data to be detected is acquired, wherein the application transaction data to be detected includes a plurality of dimensions.
In operation S314, a spatial distance between the application transaction data to be detected and each target cluster center is determined.
In operation S315, a target classification to which the application transaction data to be detected belongs is determined according to the respective spatial distances.
In operation S316, it is determined whether the spatial distance between the application transaction data to be detected and the target cluster center of the target classification is greater than or equal to the distance threshold of the target classification; if yes, operation S317 is performed; if not, operation S318 is performed.
In operation S317, the detection result is that the application transaction data to be detected is abnormal data, and operation S319 is performed.
In operation S318, the detection result is that the application transaction data to be detected is normal data.
In operation S319, the degree of abnormality of each dimension in the application transaction data to be detected is determined.
In operation S320, an anomaly dimension is determined according to each anomaly degree, where the anomaly dimension is used to characterize a dimension that causes the application transaction data to be detected to be the anomaly data.
By adopting the technical scheme provided by the embodiment of the disclosure, the hit rate of abnormal data detection can reach more than 90% and the accuracy rate can reach more than 95%, which greatly assists in locating problems and making decisions during production emergencies.
According to the technical scheme of the embodiment of the disclosure, the anomaly detection model is a model generated by training historical application transaction data by adopting a clustering algorithm and a probability density estimation algorithm, wherein the clustering result of the clustering algorithm is to determine the target clustering center of each target classification, and the probability density estimation algorithm is to determine the distance threshold corresponding to the target classification according to the clustering result, so that the anomaly detection model can determine the target classification to which the application transaction data belongs, and determine whether the application transaction data is abnormal data according to the distance between the application transaction data and the target clustering center of the target classification to which the application transaction data belongs and the distance threshold. Since the distance threshold, which is a criterion for determining whether the application transaction data to be detected is abnormal data, is determined based on the target distance corresponding to the historical application transaction data under the target classification, rather than based on expert experience, the distance threshold is more in line with the actual situation. Because the distance threshold value is more in line with the actual situation, the accuracy rate of determining whether the application transaction data to be detected is abnormal data based on the distance threshold value is improved, namely the accuracy rate of detecting the abnormal data is improved. 
In addition, the abnormal degree of each dimension in the application transaction data to be detected is determined, and the abnormal dimension, which represents the dimension causing the application transaction data to be detected to be abnormal data, is determined according to the abnormal degrees, so that the cause of the data abnormality is located more accurately.
Fig. 4 schematically shows a block diagram of an abnormality detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the abnormality detection apparatus 400 may include an acquisition module 410 and a processing module 420.
The acquisition module 410 is communicatively coupled to the processing module 420.
The acquisition module 410 is configured to obtain the application transaction data to be detected.
The processing module 420 is configured to process the application transaction data to be detected by using an anomaly detection model to obtain a detection result, where the detection result is used to represent whether the application transaction data is anomalous data, and the anomaly detection model is generated based on a clustering algorithm and a probability density estimation algorithm in a joint training manner.
According to the technical scheme of the embodiment of the disclosure, the application transaction data to be detected is obtained, and the application transaction data to be detected is processed by using the anomaly detection model generated based on the clustering algorithm and the probability density estimation algorithm joint training, so that the detection result for representing whether the application transaction data is the anomaly data is obtained. Due to the fact that the abnormal data are detected through the abnormal detection model generated through combined training based on the clustering algorithm and the probability density estimation algorithm, accurate identification and positioning of the abnormal data are achieved, the problem that the accuracy of abnormal data detection in the related technology is low is at least partially solved, and the accuracy of abnormal data detection is improved.
Optionally, on the basis of the foregoing technical solution, the processing module 420 may include a first obtaining sub-module, a first determining sub-module, a calculating sub-module, and a generating sub-module.
The first obtaining submodule is used for obtaining a historical detection set, wherein the historical detection set comprises a plurality of historical application transaction data.
And the first obtaining submodule is used for processing the historical application transaction data in the historical detection set by using a clustering algorithm to obtain the target classification to which each historical application transaction data belongs and the target clustering center of each target classification.
The first determining submodule is used for determining the target distance between each historical application transaction data and the target clustering center of the target classification to which the historical application transaction data belongs, and obtaining at least one target distance in each target classification.
And the calculation submodule is used for calculating the distance threshold of each target classification according to at least one target distance in each target classification and a probability density estimation algorithm.
And the generation submodule is used for generating an abnormal detection model according to the distance threshold of each target clustering center and each target classification.
Optionally, on the basis of the above technical solution, the first obtaining sub-module may include a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, a repeated executing unit, and a fifth determining unit.
A first determining unit for determining an initial cluster center for each target classification.
A second determining unit for determining an initial distance between each historical application transaction data and each initial cluster center.
And the third determining unit is used for determining the target classification to which each historical application transaction data belongs according to the initial distance.
And the fourth determining unit is used for determining the distance mean value of each initial distance in each target classification, and taking the distance mean value as a new initial clustering center of the target classification.
And the repeated execution unit is used for repeatedly executing the operations of determining the initial distance and determining a new initial clustering center of the target classification until a preset condition is met.
And the fifth determining unit is used for taking the new initial clustering center of each target classification obtained when the preset condition is met as the target clustering center of the corresponding target classification.
Optionally, on the basis of the above technical solution, the calculation submodule may include a sixth determining unit, a seventh determining unit, and an obtaining unit.
A sixth determining unit for determining the target distance between each historical application transaction data and the target clustering center corresponding to that historical application transaction data.
A seventh determining unit for determining the distance mean and the distance variance of the target distances in each target classification.
An obtaining unit for processing the distance mean and the distance variance of each target classification by using a Gaussian distribution algorithm to obtain the distance threshold of each target classification.
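One way to read the threshold calculation is sketched here. The text above only states that the distance mean and distance variance are processed with a Gaussian distribution algorithm; the specific rule used in this sketch, threshold = mean + `n_sigma` × standard deviation with `n_sigma` defaulting to 3, is an assumption chosen for illustration, as are the function and parameter names.

```python
import math
import statistics


def distance_thresholds(points, labels, centers, n_sigma=3.0):
    """Map each classification index to an assumed Gaussian distance threshold."""
    per_class = {j: [] for j in range(len(centers))}
    for p, lab in zip(points, labels):
        # Target distance: record to the center of its own classification.
        per_class[lab].append(math.dist(p, centers[lab]))
    thresholds = {}
    for j, dists in per_class.items():
        if not dists:
            continue  # no historical record in this classification
        mean = statistics.fmean(dists)
        variance = statistics.pvariance(dists, mu=mean)
        # Assumed rule: mean plus n_sigma standard deviations.
        thresholds[j] = mean + n_sigma * math.sqrt(variance)
    return thresholds
```

Under a Gaussian assumption, a larger `n_sigma` flags fewer records as abnormal; the value would normally be tuned against historical data.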
Optionally, on the basis of the above technical solution, the processing module 420 may include a second determining sub-module, a third determining sub-module, and a second obtaining sub-module.
A second determining submodule for determining the spatial distance between the application transaction data to be detected and each target clustering center.
A third determining submodule for determining, according to the spatial distances, the target classification to which the application transaction data to be detected belongs.
A second obtaining submodule for obtaining the detection result according to the distance threshold of that target classification and the spatial distance between the application transaction data to be detected and the target clustering center of that target classification.
Optionally, on the basis of the above technical solution, the second obtaining submodule may include an eighth determining unit and a ninth determining unit.
An eighth determining unit for determining that the application transaction data to be detected is abnormal data if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification.
A ninth determining unit for determining that the application transaction data to be detected is normal data if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold of the target classification.
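The detection step described by these submodules and units can be sketched as follows. The sketch assumes Euclidean distance and assumes that the nearest target clustering center determines the target classification; the function name `detect` and the shape of its return value are hypothetical.

```python
import math


def detect(record, centers, thresholds):
    """Return (is_abnormal, classification_index, spatial_distance)."""
    # Spatial distance from the record to every target clustering center.
    distances = [math.dist(record, c) for c in centers]
    # Assumption: the nearest center determines the target classification.
    label = min(range(len(centers)), key=distances.__getitem__)
    distance = distances[label]
    # Abnormal when the distance is greater than or equal to the threshold.
    return distance >= thresholds[label], label, distance
```

For example, with centers `[[0, 0], [10, 10]]` and thresholds `[2.0, 1.0]`, the record `[1, 0]` lies within the threshold of the first classification and is reported as normal, while `[3, 0]` is reported as abnormal.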
Optionally, on the basis of the above technical solution, the application transaction data to be detected includes multiple dimensions.
The abnormality detection apparatus 400 may further include a first determination module and a second determination module.
A first determining module for determining the degree of abnormality of each dimension in the application transaction data to be detected.
A second determining module for determining an abnormal dimension according to the degrees of abnormality, wherein the abnormal dimension represents the dimension that causes the application transaction data to be detected to be abnormal data.
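How the degree of abnormality of each dimension is computed is not specified in this passage. The sketch below assumes, purely for illustration, that the degree is the share of the squared Euclidean distance to the target clustering center contributed by that dimension, and that the abnormal dimension is the one with the largest share. The function name and the default dimension names (`dim_0`, `dim_1`, ...) are hypothetical.

```python
def abnormal_dimension(record, center, names=None):
    """Return (most_abnormal_dimension_name, degrees_by_dimension)."""
    # Assumed degree of abnormality: per-dimension share of squared distance.
    contributions = [(x - c) ** 2 for x, c in zip(record, center)]
    total = sum(contributions)
    degrees = [v / total if total else 0.0 for v in contributions]
    names = names or [f"dim_{i}" for i in range(len(record))]
    worst = max(range(len(degrees)), key=degrees.__getitem__)
    return names[worst], dict(zip(names, degrees))
```

With this definition the degrees sum to 1 for any record that does not coincide with the center, which makes them easy to rank.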
Optionally, on the basis of the above technical solution, the obtaining module 410 may include a second acquisition sub-module and a processing sub-module.
A second acquisition submodule for acquiring original application transaction data.
A processing submodule for preprocessing the original application transaction data to obtain the application transaction data to be detected.
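The preprocessing performed by the processing submodule is not detailed in this passage. As one hedged illustration, the sketch below scales each dimension of a raw record relative to bounds taken from historical data, so that dimensions with large numeric ranges do not dominate the distance calculation. Min-max scaling, the function name, and the `lows`/`highs` parameters are all assumptions, not the preprocessing prescribed by the disclosure.

```python
def min_max_scale(record, lows, highs):
    """Scale each dimension relative to historical lower and upper bounds."""
    scaled = []
    for x, lo, hi in zip(record, lows, highs):
        if hi == lo:
            # A constant historical dimension carries no scale information.
            scaled.append(0.0)
        else:
            # Values outside the historical range are deliberately not clipped,
            # so unusually large or small values stay visible to the detector.
            scaled.append((x - lo) / (hi - lo))
    return scaled
```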
Any number of the modules, sub-modules, and units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module. Any one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be split into a plurality of modules for implementation. Any one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or may be implemented in any one of software, hardware, and firmware, or in any suitable combination of them. Alternatively, one or more of the modules, sub-modules, and units according to embodiments of the present disclosure may be implemented at least partly as computer program modules which, when executed, perform the corresponding functions.
For example, any number of the obtaining module 410 and the processing module 420 may be combined and implemented in one module/unit, or any one of them may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units and implemented in one module/unit. According to an embodiment of the present disclosure, at least one of the obtaining module 410 and the processing module 420 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of software, hardware, and firmware, or in any suitable combination of them. Alternatively, at least one of the obtaining module 410 and the processing module 420 may be implemented at least partly as a computer program module which, when executed, performs the corresponding function.
It should be noted that the abnormality detection apparatus in the embodiments of the present disclosure corresponds to the abnormality detection method executed by the electronic device in the embodiments of the present disclosure. For details of the apparatus, refer to the description of the method; they are not repeated here.
Fig. 5 schematically shows a block diagram of an electronic device adapted to implement the above-described method according to an embodiment of the present disclosure. The electronic device shown in fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present disclosure includes a processor 501, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it is installed into the storage section 508 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 502 and/or the RAM 503 and/or one or more memories other than the ROM 502 and the RAM 503 described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, such combinations and/or incorporations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. An anomaly detection method comprising:
acquiring application transaction data to be detected; and
processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result represents whether the application transaction data to be detected is abnormal data, and the anomaly detection model is generated based on joint training of a clustering algorithm and a probability density estimation algorithm.
2. The method of claim 1, wherein generating the anomaly detection model based on joint training of the clustering algorithm and the probability density estimation algorithm comprises:
obtaining a historical detection set, wherein the historical detection set comprises a plurality of historical application transaction data;
processing the historical application transaction data in the historical detection set by using the clustering algorithm to obtain a target classification to which each historical application transaction data belongs and a target clustering center of each target classification;
determining a target distance between each historical application transaction data and a target clustering center of a target classification to which the historical application transaction data belongs to obtain at least one target distance in each target classification;
calculating a distance threshold value of each target classification according to at least one target distance in each target classification and the probability density estimation algorithm; and
generating the anomaly detection model according to each target clustering center and the distance threshold of each target classification.
3. The method of claim 2, wherein said processing historical application transaction data in said historical detection set using said clustering algorithm to obtain target classifications to which respective historical application transaction data belongs and a target clustering center for each of said target classifications comprises:
determining an initial clustering center of each target classification;
determining an initial distance between each of the historical application transaction data and each of the initial cluster centers;
determining a target classification to which each historical application transaction data belongs according to the initial distance;
determining a distance mean value of each initial distance in each target classification, and taking the distance mean value as a new initial clustering center of the target classification;
repeatedly executing the operations of determining the initial distance and determining a new initial clustering center of the target classification until a preset condition is met; and
taking the new initial clustering center of each target classification obtained when the preset condition is met as the target clustering center of the corresponding target classification.
4. The method of claim 2, wherein said calculating a distance threshold for each of said target classifications based on at least one target distance in each of said target classifications and said probability density estimation algorithm comprises:
determining a target distance between each historical application transaction data and a target clustering center corresponding to the historical application transaction data;
determining a distance mean and a distance variance of each of the target distances in each of the target classifications; and
processing the distance mean and the distance variance of each target classification by using a Gaussian distribution algorithm to obtain the distance threshold of each target classification.
5. The method according to claim 2, wherein the processing the application transaction data to be detected by using the anomaly detection model to obtain a detection result comprises:
determining a spatial distance between the application transaction data to be detected and each target clustering center;
determining the target classification to which the application transaction data to be detected belongs according to the spatial distances; and
obtaining a detection result according to the spatial distance between the application transaction data to be detected and the target clustering center of the target classification and the distance threshold of the target classification.
6. The method according to claim 5, wherein the obtaining a detection result according to a spatial distance between the application transaction data to be detected and a target cluster center of the target classification and a distance threshold of the target classification comprises:
if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification, the detection result indicates that the application transaction data to be detected is abnormal data; and
if the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is smaller than the distance threshold of the target classification, the detection result indicates that the application transaction data to be detected is normal data.
7. The method of claim 6, wherein the application transaction data to be detected comprises a plurality of dimensions; and
in a case where the spatial distance between the application transaction data to be detected and the target clustering center of the target classification is greater than or equal to the distance threshold of the target classification and the detection result indicates that the application transaction data to be detected is abnormal data, the method further comprises:
determining the degree of abnormality of each dimension in the application transaction data to be detected; and
determining an abnormal dimension according to the degrees of abnormality, wherein the abnormal dimension represents the dimension that causes the application transaction data to be detected to be abnormal data.
8. An abnormality detection device comprising:
an acquisition module for acquiring application transaction data to be detected; and
a processing module for processing the application transaction data to be detected by using an anomaly detection model to obtain a detection result, wherein the detection result represents whether the application transaction data to be detected is abnormal data, and the anomaly detection model is generated based on joint training of a clustering algorithm and a probability density estimation algorithm.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.
CN202010809725.0A 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium Active CN111814910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809725.0A CN111814910B (en) 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN111814910A true CN111814910A (en) 2020-10-23
CN111814910B CN111814910B (en) 2023-09-19

Family

ID=72860432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010809725.0A Active CN111814910B (en) 2020-08-12 2020-08-12 Abnormality detection method, abnormality detection device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN111814910B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486302A (en) * 2021-07-12 2021-10-08 浙江网商银行股份有限公司 Data processing method and device
CN113591909A (en) * 2021-06-23 2021-11-02 北京智芯微电子科技有限公司 Abnormality detection method, abnormality detection device, and storage medium for power system
CN113854990A (en) * 2021-10-27 2021-12-31 青岛海信日立空调系统有限公司 Heartbeat detection method and device
CN114298147A (en) * 2021-11-23 2022-04-08 深圳无域科技技术有限公司 Abnormal sample detection method and device, electronic equipment and storage medium
CN116185315A (en) * 2023-04-27 2023-05-30 美恒通智能电子(广州)股份有限公司 Hand-held printer data monitoring and early warning system and method based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629593A (en) * 2018-04-28 2018-10-09 招商银行股份有限公司 Fraudulent trading recognition methods, system and storage medium based on deep learning
CN109558416A (en) * 2018-11-07 2019-04-02 北京先进数通信息技术股份公司 A kind of detection method traded extremely, device and storage medium
CN109919684A (en) * 2019-03-18 2019-06-21 上海盛付通电子支付服务有限公司 For generating method, electronic equipment and the computer readable storage medium of information prediction model
CN109978070A (en) * 2019-04-03 2019-07-05 北京市天元网络技术股份有限公司 A kind of improved K-means rejecting outliers method and device
CN110263827A (en) * 2019-05-31 2019-09-20 中国工商银行股份有限公司 Abnormal transaction detection method and device based on transaction rule identification
US20200118136A1 (en) * 2018-10-16 2020-04-16 Mastercard International Incorporated Systems and methods for monitoring machine learning systems

Also Published As

Publication number Publication date
CN111814910B (en) 2023-09-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant