CN114186626A - Abnormity detection method and device, electronic equipment and computer readable medium - Google Patents

Abnormity detection method and device, electronic equipment and computer readable medium

Info

Publication number
CN114186626A
Authority
CN
China
Prior art keywords
abnormal
cluster
determining
clustering
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111498620.9A
Other languages
Chinese (zh)
Inventor
李腾
杨诚骜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111498620.9A
Publication of CN114186626A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an anomaly detection method, an anomaly detection apparatus, an electronic device and a computer-readable medium, relating to the technical field of artificial intelligence recognition and classification. The method comprises: receiving a user application request and determining the corresponding scene identifier; acquiring user scene data for a preset historical time period based on the scene identifier; calling a clustering model to determine clusters based on the user scene data; determining abnormal screening indexes and, based on them, determining the abnormal clusters among the clusters; and intercepting the user application request in response to determining that the user scene data corresponding to the request matches an abnormal cluster. Because similar patterns among users are identified from scene data, abnormal applications that share a pattern can be identified even when the users have no actual association with one another, so that new fraudulent behavior is discovered promptly and property loss is effectively avoided.

Description

Abnormity detection method and device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of artificial intelligence recognition and classification technologies, and in particular, to an anomaly detection method and apparatus, an electronic device, and a computer-readable medium.
Background
Driven jointly by consumption upgrading, policy support and the development of financial technology, traditional financial institutions, represented by commercial banks, are accelerating their integration with the Internet, innovating around it, stepping up their Internet finance strategy, and providing customers with diversified online personal credit loan services through self-service channels such as online banking and mobile banking. A rich range of credit loan products makes personal loans far more convenient, but products differ greatly in target customers, data used and so on, so lending institutions need to formulate different risk control strategies for different business scenarios. One difficulty of credit risk control is anti-fraud: fraud takes many forms and is difficult to label. Even if labeled historical fraud data can be used for supervised learning, the fraud patterns learned only resemble historical ones and are of no use against new fraud types. Across multiple scenarios, fraud patterns differ widely, and users with little or no historical behavior, such as newly registered users and silent users, cannot be identified, so the discovery of new fraud is delayed.
In the process of implementing the present application, the inventor found that at least the following problem exists in the prior art:
across multiple scenarios, fraud patterns differ widely, and users with little or no historical behavior, such as newly registered users and silent users, cannot be identified, so the discovery of new fraud is delayed.
Disclosure of Invention
In view of this, embodiments of the present application provide an anomaly detection method, an anomaly detection apparatus, an electronic device and a computer-readable medium, which can solve the existing problem that, across multiple scenarios, fraud patterns differ widely and users with little or no historical behavior, such as newly registered users and silent users, cannot be identified, so that the discovery of new fraud is delayed.
To achieve the above object, according to one aspect of the embodiments of the present application, there is provided an anomaly detection method including:
receiving a user application request, and determining the corresponding scene identifier;
acquiring user scene data for a preset historical time period based on the scene identifier;
calling a clustering model to determine clusters based on the user scene data;
determining abnormal screening indexes, and determining abnormal clusters among the clusters based on the abnormal screening indexes;
and intercepting the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
Optionally, before calling the clustering model, the anomaly detection method further includes:
acquiring a training sample set, wherein the training sample set comprises user application data that corresponds to the same scene identifier and contains abnormal applications concentrated in time;
performing feature engineering on the user application data, and screening clustering indexes based on a decision tree;
and updating the nodes of the decision tree based on the screened clustering indexes, and generating the clustering model based on the updated decision tree.
Optionally, screening the clustering indexes based on the decision tree includes:
pre-classifying the user application data of all time periods using the decision tree, and determining the clustering indexes according to the pre-classification result.
Optionally, determining the clustering indexes according to the pre-classification result includes:
determining the proportion of abnormal samples at each node of the decision tree;
and sorting the nodes by their proportion of abnormal samples, selecting a preset number of nodes based on the sorting, and determining the labels corresponding to the selected nodes as the clustering indexes.
Optionally, determining the abnormal clusters among the clusters based on the abnormal screening indexes includes:
determining the mean and standard deviation of each abnormal screening index in each cluster;
and determining the abnormal clusters among the clusters based on the mean and the standard deviation.
Optionally, determining the abnormal clusters among the clusters based on the mean and the standard deviation includes:
generating an abnormal reference value based on the mean and the standard deviation;
and determining any cluster whose index value is greater than the abnormal reference value to be an abnormal cluster.
Optionally, the anomaly detection method further includes:
automatically updating the user scene data of the preset historical time period every day, and thereby automatically updating the clusters.
In addition, the present application also provides an anomaly detection apparatus, including:
a receiving unit configured to receive a user application request and determine the corresponding scene identifier;
an acquisition unit configured to acquire user scene data for a preset historical time period based on the scene identifier;
a cluster determining unit configured to call a clustering model to determine clusters based on the user scene data;
an abnormal cluster determining unit configured to determine abnormal screening indexes and determine abnormal clusters among the clusters based on the abnormal screening indexes;
and an anomaly detection unit configured to intercept the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
Optionally, the anomaly detection apparatus further includes a training unit configured to:
acquire a training sample set, wherein the training sample set comprises user application data that corresponds to the same scene identifier and contains abnormal applications concentrated in time;
perform feature engineering on the user application data, and screen clustering indexes based on a decision tree;
and update the nodes of the decision tree based on the screened clustering indexes, and generate the clustering model based on the updated decision tree.
Optionally, the training unit is further configured to:
pre-classify the user application data of all time periods using the decision tree, and determine the clustering indexes according to the pre-classification result.
Optionally, the training unit is further configured to:
determine the proportion of abnormal samples at each node of the decision tree;
and sort the nodes by their proportion of abnormal samples, select a preset number of nodes based on the sorting, and determine the labels corresponding to the selected nodes as the clustering indexes.
Optionally, the abnormal cluster determining unit is further configured to:
determine the mean and standard deviation of each abnormal screening index in each cluster;
and determine the abnormal clusters among the clusters based on the mean and the standard deviation.
Optionally, the abnormal cluster determining unit is further configured to:
generate an abnormal reference value based on the mean and the standard deviation;
and determine any cluster whose index value is greater than the abnormal reference value to be an abnormal cluster.
Optionally, the anomaly detection apparatus further includes an update unit configured to:
automatically update the user scene data of the preset historical time period every day, and thereby automatically update the clusters.
In addition, the present application also provides an abnormality detection electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement the anomaly detection method as described above.
In addition, the present application also provides a computer readable medium, on which a computer program is stored, which when executed by a processor implements the abnormality detection method as described above.
To achieve the above object, according to still another aspect of embodiments of the present application, there is provided a computer program product.
A computer program product according to an embodiment of the present application includes a computer program, and when the computer program is executed by a processor, the computer program implements the abnormality detection method according to the embodiment of the present application.
One embodiment of the above invention has the following advantages or beneficial effects: a user application request is received and the corresponding scene identifier is determined; user scene data for a preset historical time period is acquired based on the scene identifier; a clustering model is called to determine clusters based on the user scene data; abnormal screening indexes are determined, and abnormal clusters among the clusters are determined based on them; and the user application request is intercepted in response to determining that the user scene data corresponding to the request matches an abnormal cluster. Because similar patterns among users are identified from scene data, abnormal applications that share a pattern can be identified even when the users have no actual association with one another, so that new fraudulent behavior is discovered promptly and property loss is effectively avoided.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a further understanding of the application and are not to be construed as limiting the application. Wherein:
Fig. 1 is a schematic diagram of the main flow of an anomaly detection method according to a first embodiment of the present application;
Fig. 2 is a schematic diagram of the main flow of an anomaly detection method according to a second embodiment of the present application;
Fig. 3 is a schematic diagram of an application scenario of an anomaly detection method according to a third embodiment of the present application;
Fig. 4 is a schematic diagram of a model training process according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the main units of an anomaly detection apparatus according to an embodiment of the present application;
Fig. 6 is an exemplary system architecture diagram to which embodiments of the present application may be applied;
Fig. 7 is a schematic structural diagram of a computer system suitable for implementing the terminal device or the server of the embodiments of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. According to the technical scheme, the data acquisition, storage, use, processing and the like meet relevant regulations of national laws and regulations.
Fig. 1 is a schematic diagram of a main flow of an abnormality detection method according to a first embodiment of the present application, and as shown in fig. 1, the abnormality detection method includes:
Step S101, receiving a user application request, and determining the corresponding scene identifier.
In this embodiment, the execution subject of the anomaly detection method (for example, a server) may receive the user application request through a wired or wireless connection. The user application request may be a loan request. It may also be, for example, an obstacle identification request that a request generation unit is called to generate from an obstacle picture after the execution subject receives the obstacle picture captured by an autonomous driving device. The embodiments of the present application do not limit the specific content of the user application request. The scene identifier may be an identifier corresponding to a business scenario or an identifier corresponding to a driving road scenario; the embodiments of the present application do not limit the specific content corresponding to the scene identifier. For example, the scene identifier may be GJJ, representing a housing provident fund (public accumulation fund) scenario, or SX, representing a credit-granting scenario.
Specifically, scene data refers to the user data used for calculating the credit line. A credit-granting institution uses different data to grant credit to a user depending on what data the user holds or has authorized. For example, if a user authorizes the credit-granting institution to query the user's provident fund data, the institution grants credit according to the user's provident fund contributions; in this case the provident fund scenario is the credit-granting scenario, and the provident fund data is the credit-granting data. If a user authorizes the credit-granting institution to query the user's tax payment data, the institution grants credit according to the user's tax payments; in this case the tax payment scenario is the credit-granting scenario, and the tax payment data is the credit-granting data.
Step S102, based on the scene identification, user scene data of a historical preset time period is obtained.
For example, the execution subject may obtain n consecutive days of user context data before the current time based on the context identification.
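The acquisition of the previous n days of scene data can be sketched as follows. This is a minimal illustration only: the record layout (`scene`, `apply_date`) and the function name `window_applications` are assumptions introduced here and do not come from the application.

```python
from datetime import date, timedelta

def window_applications(applications, scene_id, cluster_day, n_days):
    """Return applications of one scene filed in the n_days days before cluster_day."""
    start = cluster_day - timedelta(days=n_days)
    return [
        a for a in applications
        if a["scene"] == scene_id and start <= a["apply_date"] < cluster_day
    ]

# Hypothetical application records; field names are illustrative.
apps = [
    {"id": "a1", "scene": "GJJ", "apply_date": date(2021, 12, 1)},
    {"id": "a2", "scene": "GJJ", "apply_date": date(2021, 12, 5)},
    {"id": "a3", "scene": "GJJ", "apply_date": date(2021, 12, 7)},
    {"id": "a4", "scene": "SX", "apply_date": date(2021, 12, 7)},
]

# Three-day window ending the day before 2021-12-08, for the GJJ scene only.
recent = window_applications(apps, "GJJ", date(2021, 12, 8), 3)
recent_ids = [a["id"] for a in recent]
```

Re-running the same call each day with a new `cluster_day` gives the rolling window on which the daily clustering operates.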
Step S103, calling a clustering model to determine each clustering cluster based on the user scene data.
The acquired user scene data of the preset historical time period is input into the clustering model, the module in the clustering model that has been trained by unsupervised learning is called to cluster the input user scene data, and the clusters are determined.
Specifically, the abnormality detection method further includes:
and automatically updating the user scene data in the historical preset time period every day, and further automatically updating each cluster.
Specifically, before invoking the clustering model, the method further comprises:
acquiring a training sample set, wherein the training sample set comprises user application data which correspond to the same scene identifier and have time-aggregated (for example, some continuous n days) abnormal applications;
As shown in Fig. 4, the execution subject performs feature engineering on the user application data (i.e., the extracted scene data) and then screens the clustering indexes based on a decision tree for use in clustering. Specifically, after extracting the corresponding scene data, the execution subject performs feature engineering, including feature cleaning, feature derivation and the like. The clustering indexes are screened based on a decision tree whose label is the marked fraud label and whose variables are the features produced by feature engineering. Leaf nodes with a high proportion of abnormal samples are selected, every branch path from such a leaf node back to the root node is traced, and the features used by all nodes on those paths are extracted. In one example, the extracted features are the personal contribution base, the account balance, the number of contributing employers, the longest continuous provident fund contribution period and the time at which provident fund contributions stopped; these are used in the subsequent clustering. The extracted features are normalized before clustering. Normalization maps the features uniformly into a small numerical range; in the embodiments of the present application it mainly prevents differences in feature scale from distorting the relative weights of the features. Clustering is performed daily in a rolling manner: each day, the application samples of the previous n days are clustered. Depending on the data volume and business experience, n may take a value between 1 and 7. The larger n is, the greater the dependence on historical samples and the larger the sample size, but the later anomalies are discovered; a smaller n has the opposite effect.
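The normalization step can be illustrated with a short sketch. The application does not name a particular normalization method; min-max scaling to [0, 1] is assumed here purely for illustration, and the function name and sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]

# Hypothetical samples: [personal contribution base, months of continuous contribution].
samples = [[5000.0, 12], [15000.0, 36], [10000.0, 24]]
scaled = min_max_normalize(samples)
```

After scaling, both columns span the same range, so the contribution base no longer dominates a distance computation merely because it is expressed in larger units.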
An enumeration method is used, in combination with the unsupervised part, to determine n and the number of clusters. As shown in Fig. 4, the supervised learning part uses institution data and credit bureau data to set early-warning rules. Institution data and credit bureau data are general customer data that are not affected by the scenario and are therefore universally applicable. Feature engineering is performed on the extracted institution data and credit bureau data, binning and IV calculation are performed on the resulting features to screen monitoring indexes, early-warning thresholds are set, and finally abnormal classes are identified on the basis of the clustering result obtained by unsupervised learning. The institution data may be, for example, data of company X or bank X; the embodiments of the present application do not specifically limit the institution data.
The nodes of the decision tree are updated based on the screened clustering indexes, and the clustering model, obtained by unsupervised learning, is generated based on the updated decision tree. As shown in Fig. 4, model training comprises sample selection, classification by unsupervised learning, and early-warning rule setting by supervised learning, after which the abnormal classes are identified.
Specifically, screening the clustering index based on the decision tree includes:
pre-classifying the user application data of all time periods using the decision tree, and determining the clustering indexes according to the pre-classification result. The pre-classification may be a classification based on preset classification indexes.
Specifically, determining the clustering indexes according to the pre-classification result includes:
determining the proportion of abnormal samples at each node of the decision tree; and sorting the nodes by their proportion of abnormal samples, selecting a preset number of nodes based on the sorting, and determining the labels corresponding to the selected nodes as the clustering indexes.
Specifically, the execution subject may select the leaf nodes with a high proportion of abnormal samples in the decision tree, trace every branch path from them back to the root node, and extract the features used by all nodes on those paths. The nodes are then sorted by their proportion of abnormal samples, and the labels corresponding to the top n nodes are determined as the clustering indexes, which makes clustering based on these indexes more accurate.
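The backtracking from high-risk leaf nodes to the root can be sketched as below. The nested-dictionary tree, the feature names and the 0.5 cut-off are all hypothetical; the application does not specify a tree representation or a numeric threshold for what counts as a high proportion of abnormal samples.

```python
def high_risk_features(node, min_fraud_ratio, path=()):
    """Collect split features on root-to-leaf paths that end in a high-fraud leaf."""
    if "feature" not in node:  # a leaf stores only its sample counts
        ratio = node["fraud"] / node["total"] if node["total"] else 0.0
        return set(path) if ratio >= min_fraud_ratio else set()
    found = set()
    for child in (node["left"], node["right"]):
        found |= high_risk_features(child, min_fraud_ratio, path + (node["feature"],))
    return found

# Hypothetical fitted tree: internal nodes name the split feature,
# leaves hold the number of fraud-labeled samples and the total.
tree = {
    "feature": "account_balance",
    "left": {
        "feature": "months_since_contribution_stopped",
        "left": {"fraud": 1, "total": 100},
        "right": {"fraud": 30, "total": 50},
    },
    "right": {
        "feature": "contribution_base",
        "left": {"fraud": 2, "total": 200},
        "right": {"fraud": 0, "total": 150},
    },
}

selected = sorted(high_risk_features(tree, 0.5))
```

Only one leaf (30 of 50 samples fraudulent) passes the cut-off, so the two features on its path are kept and `contribution_base` is dropped.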
Step S104, determining abnormal screening indexes, and determining abnormal clusters among the clusters based on the abnormal screening indexes.
Determining the abnormal screening indexes includes: performing feature engineering on the data of the training sample set associated with the time of the user's application, such as account information, historical transaction records, historical loan information and credit bureau information. (Feature engineering is the process of converting raw data into training data for a model, with the aim of obtaining better training features.) The feature engineering includes feature cleaning, feature derivation and the like. The Information Value (IV) and the Population Stability Index (PSI) of each feature are calculated. The higher the IV, the stronger the association between the feature and whether fraud occurs; the lower the PSI, the less the feature's distribution shifts over time. Features with high IV and good stability are selected. Decision-tree binning is then applied to these features to determine the single indexes used for screening abnormal clusters (i.e., the abnormal screening indexes), and a threshold is set for each abnormal screening index to delimit the abnormal interval of that index.
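For reference, IV and PSI can be computed from per-bin counts as in the following sketch. It uses the commonly cited textbook formulas; the application does not give its own formulas, and the function names, the `eps` floor for empty bins and the sample counts are assumptions made for illustration.

```python
import math

def information_value(good_counts, bad_counts, eps=1e-6):
    """IV = sum over bins of (good% - bad%) * ln(good% / bad%)."""
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        p_good = max(good / total_good, eps)  # eps avoids log(0) for empty bins
        p_bad = max(bad / total_bad, eps)
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv

def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%)."""
    total_expected, total_actual = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        p_expected = max(expected / total_expected, eps)
        p_actual = max(actual / total_actual, eps)
        psi += (p_actual - p_expected) * math.log(p_actual / p_expected)
    return psi

# Hypothetical two-bin feature: normal vs. fraud-labeled sample counts per bin.
iv = information_value([80, 20], [20, 80])

# Hypothetical three-bin feature observed in two time periods.
psi_same = population_stability_index([50, 30, 20], [50, 30, 20])
psi_shifted = population_stability_index([50, 30, 20], [20, 30, 50])
```

An unchanged distribution gives a PSI of zero, and the PSI grows as the bin shares drift apart, which is why a low PSI indicates a stable feature.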
Identifying abnormal clusters: the clustering result is combined with the abnormal screening indexes. For the clustered samples, the number of samples in each cluster is counted and small clusters (those containing fewer than 2% of all samples) are removed. For each remaining cluster, the proportion of samples falling in the abnormal interval is calculated for each single index. The mean and standard deviation of this abnormal-interval sample proportion are then calculated for each single index across the clusters. Each class whose index value is greater than the mean plus 2 standard deviations is recorded. If m or more index values of a class are greater than the mean plus 2 standard deviations, the class is marked as an abnormal cluster. The embodiments of the present application do not limit the specific value of m.
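The identification rule above can be sketched as follows. Several points are assumptions rather than statements of the application: the mean and standard deviation are taken across the remaining clusters for each index, the population standard deviation is used, and the cluster names, index names and counts are invented for the example.

```python
from statistics import mean, pstdev

def find_abnormal_clusters(sizes, ratios, m, min_share=0.02, k=2.0):
    """Flag clusters exceeding mean + k * std on at least m screening indexes.

    sizes:  cluster id -> number of samples in the cluster
    ratios: cluster id -> {index name: share of samples in the abnormal interval}
    """
    total = sum(sizes.values())
    # Drop small clusters holding less than min_share of all samples.
    kept = [c for c, size in sizes.items() if size >= min_share * total]
    if not kept:
        return []
    hits = {c: 0 for c in kept}
    for index in ratios[kept[0]]:
        values = [ratios[c][index] for c in kept]
        reference = mean(values) + k * pstdev(values)  # abnormal reference value
        for c in kept:
            if ratios[c][index] > reference:
                hits[c] += 1
    return [c for c in kept if hits[c] >= m]

# Hypothetical clustering result: seven regular clusters and one tiny cluster.
sizes = {f"c{i}": 140 for i in range(7)}
sizes["c7"] = 10
ratios = {f"c{i}": {"overdue": 0.10, "inquiries": 0.05} for i in range(6)}
ratios["c6"] = {"overdue": 0.60, "inquiries": 0.50}
ratios["c7"] = {"overdue": 0.90, "inquiries": 0.90}

abnormal = find_abnormal_clusters(sizes, ratios, m=2)
```

Note that with a population standard deviation, a single cluster can exceed the mean by more than 2 standard deviations only when at least six clusters remain, so the rule is meaningful only for a sufficiently large number of clusters.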
Step S105, intercepting the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
For example, when the clustering model is applied, the class division produced by unsupervised learning is used to determine whether the user scene data corresponding to the user application request falls into an abnormal cluster. Evaluation indexes are then calculated, including the proportion of samples in abnormal clusters, the proportion of the day's real abnormal samples that fall in abnormal clusters, the proportion of samples in abnormal clusters that are real abnormal samples, the number of early-warning days, the early-warning rate, the false-alarm rate and the like. The clustering features, clustering parameters, indexes and thresholds are applied to the model validation samples. The abnormal cluster types in the validation samples and the training samples may differ. In the validation part, abnormal clusters are identified in the validation samples using the trained clustering features, clustering parameters, indexes and thresholds, and the evaluation indexes are calculated. The clustering features, clustering parameters, monitoring indexes and their thresholds are then adjusted toward the set early-warning target, taking into account the evaluation indexes on both the training set and the validation set.
The model parameters, namely the number of days n preceding the clustering day, the number of clusters of the clustering model, and the number of early-warning indexes m, are adjusted according to the model's performance on the training and validation samples. To determine n, the numbers of accurate early-warning days and false-alarm days are counted for n from 1 to 7, and the n giving the most accurate early-warning days and the fewest false-alarm days is chosen. To determine the number of clusters, different cluster numbers are enumerated in each day's clustering; the proportion of fraudulent samples within the early-warning classes and the proportion of all fraudulent samples that are covered by early warnings are counted; and the cluster number is chosen at which the most fraudulent samples are covered, the proportion of fraudulent samples in the abnormal clusters is highest, and the monitoring indexes of the abnormal clusters are most abnormal. The number of early-warning indexes m is chosen, by counting accurate early-warning days and false-alarm days, as the m giving the most accurate early-warning days and the fewest false-alarm days.
In one example, when the clustering model is applied, the users who applied under the same scenario in the last n days are clustered every day using the clustering features and clustering parameters determined during model training, and the abnormal cluster numbers are determined using the indexes and thresholds determined during model training. If abnormal clusters are captured in the daily offline training, then, because fraudulent users usually appear concentrated in time, online real-time monitoring is performed on the second day based on the result of the offline model.
On the second day, the distance between the clustering features of each new application's loan in the corresponding scenario and the centroid of each class from the previous clustering day is calculated in real time to determine the class to which the loan belongs, and whether that class is an abnormal cluster is judged in real time. If the application is assigned to an abnormal cluster, the user application request is intercepted and sent directly to a manual approval node for manual review.
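The real-time assignment on the second day can be sketched as a nearest-centroid lookup. Euclidean distance is an assumption (the application only speaks of a distance to the centroids), and the function names, centroid values and routing labels are hypothetical.

```python
import math

def nearest_cluster(features, centroids):
    """Return the id of the centroid closest to the feature vector (Euclidean)."""
    return min(centroids, key=lambda c: math.dist(features, centroids[c]))

def route_application(features, centroids, abnormal_clusters):
    """Send an application to manual review if it falls into an abnormal cluster."""
    cluster = nearest_cluster(features, centroids)
    return "manual_review" if cluster in abnormal_clusters else "continue"

# Hypothetical centroids from the previous clustering day, on normalized features.
centroids = {"c0": [0.1, 0.2], "c6": [0.9, 0.8]}
abnormal_clusters = {"c6"}

decision_a = route_application([0.85, 0.75], centroids, abnormal_clusters)
decision_b = route_application([0.20, 0.10], centroids, abnormal_clusters)
```

The new application must be normalized with the same scaling as the previous day's clustering data for the distances to be comparable.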
The embodiments of the present application use unsupervised learning, so the classification adjusts automatically when abnormal patterns change. Daily automatic iterative clustering quickly identifies loan applications with similar patterns that are initiated frequently within a short period. The method identifies sudden surges of group abnormal applications without relying on the historical application behavior of the applicants or on actual associations among abnormal samples, and it is also effective for new users.
In this embodiment, a user application request is received and the corresponding scene identifier is determined; user scene data for a preset historical time period is acquired based on the scene identifier; a clustering model is called to determine clusters based on the user scene data; abnormal screening indexes are determined, and abnormal clusters among the clusters are determined based on them; and the user application request is intercepted in response to determining that the user scene data corresponding to the request matches an abnormal cluster. Because similar patterns among users are identified from scene data, abnormal applications that share a pattern can be identified even when the users have no actual association with one another, so that new fraudulent behavior is discovered promptly and property loss is effectively avoided.
Fig. 2 is a schematic main flow chart of an abnormality detection method according to a second embodiment of the present application, and as shown in fig. 2, the abnormality detection method includes:
Step S201, receiving a user application request, and determining a corresponding scene identifier.
Step S202, based on the scene identification, user scene data of a historical preset time period is obtained.
Step S203, calling a clustering model to determine each clustering cluster based on the user scene data.
Step S204, determining abnormal screening indexes, and determining the mean and standard deviation of each abnormal screening index in each cluster.
Step S205, determining the abnormal clusters among the clusters based on the mean and the standard deviation.
Specifically, determining the abnormal clusters among the clusters based on the mean and the standard deviation includes:
generating an abnormal reference value based on the mean and the standard deviation; and determining any cluster whose index value is larger than the abnormal reference value to be an abnormal cluster.
More specifically, the classes whose index value is larger than the mean plus 2 standard deviations are recorded. If m or more index values of a class are larger than the mean plus 2 standard deviations, the class is marked as an abnormal cluster.
Step S206, in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster, intercepting the user application request.
The distance from the centroid of the cluster corresponding to the user scene data of the user application request to the centroid of each abnormal cluster is calculated, and the abnormal cluster with the minimum distance is taken as the classification of that user scene data.
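The screening rule of step S205 (mean plus 2 standard deviations, with at least m indexes exceeding it) can be sketched as follows. The text leaves the exact population over which the mean and standard deviation are taken open to interpretation; this sketch assumes that each index has one value per cluster and that the mean and population standard deviation are computed across clusters. The index names and values are hypothetical.

```python
from statistics import mean, pstdev


def find_abnormal_clusters(index_values, m):
    """Flag clusters with at least m indexes above mean + 2 * std.

    index_values: {index_name: {cluster_id: value}}
    Assumption: mean and standard deviation are taken across clusters
    for each index (population standard deviation).
    """
    hits = {}
    for per_cluster in index_values.values():
        values = list(per_cluster.values())
        threshold = mean(values) + 2 * pstdev(values)  # the abnormal reference value
        for cluster_id, value in per_cluster.items():
            if value > threshold:
                hits[cluster_id] = hits.get(cluster_id, 0) + 1
    return {cid for cid, count in hits.items() if count >= m}


# Hypothetical index values for 8 clusters (ids 0..7); cluster 7 stands out on both.
index_values = {
    "night_application_ratio": dict(enumerate([0.10, 0.11, 0.09, 0.10, 0.12, 0.10, 0.11, 0.90])),
    "shared_device_ratio": dict(enumerate([0.05, 0.04, 0.06, 0.05, 0.05, 0.04, 0.06, 0.70])),
}
abnormal = find_abnormal_clusters(index_values, m=2)
```

Under this reading, a single cluster can exceed mean plus 2 standard deviations only when there are at least 6 clusters, so the rule would never fire for very small cluster counts.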
In this embodiment, similar patterns among users are identified based on the scene data, so abnormal applications that have no actual association but share similar patterns can be identified without requiring any actual association among the users.
Fig. 3 is a schematic view of an application scenario of an anomaly detection method according to a third embodiment of the present application. The anomaly detection method of the embodiment of the application can be applied to a scenario in which a user applies for a loan. As shown in fig. 3, a server 302 receives a user application request 301 and determines a corresponding scene identifier 303. The server 302 obtains user scene data 304 of a historical preset time period based on the scene identifier 303. The server 302 invokes a clustering model 305 to determine clusters 306 (which may include cluster 1, cluster 2, …, cluster n) based on the user scene data 304. The server 302 determines an abnormal screening index 307, and further determines an abnormal cluster 308 among the clusters 306 based on the abnormal screening index 307. The server 302 intercepts the user application request 301 in response to determining that the user scene data corresponding to the user application request 301 matches the abnormal cluster 308. In fig. 3, the process of determining the clusters 306 and the process of determining the abnormal screening index 307 may be performed simultaneously. After the clusters 306 and the abnormal screening index 307 are determined, the execution body may determine the abnormal cluster 308 from them. The user scene data 309 corresponding to the user application request is then matched against the abnormal cluster 308; a successful match indicates that the received user application request is close to an abnormal application pattern, and the user application request is intercepted.
The machine-learning-based anomaly detection method in the embodiment of the application combines supervised learning and unsupervised learning and distinguishes between service scenes. It quickly identifies and intercepts abnormal loan applications with similar patterns that are frequently initiated within a short period, and sends the customers of those applications, together with the patterns that triggered the interception, to manual review. Abnormal applications are identified from the similar patterns of different users within the same time period rather than from historical behavior, and because the similar patterns are identified from the scene data, abnormal applications that have no actual association but share similar patterns can be identified without requiring any actual association among the users.
Different clustering models are established for different credit granting scenes. This has the following advantages. First, the dimensionality of the clustering data is reduced: the clustering data contains only the data used by the respective credit granting scene. Second, computing performance is improved: the lower the dimensionality, the faster the calculation. Third, the number of clusters is reduced: most customers have, or authorize, only one or a few kinds of credit granting data; if several kinds of scene data were used in a single clustering and the number of clusters were too small, customers missing one or more kinds of scene data would be grouped into one class without further subdivision, and that class could not be used to identify abnormal customers. Fourth, clustering precision is improved: clustering within a single scene better identifies customers who are abnormal in that scene.
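A minimal sketch of the per-scene split is given below. It only shows how records might be grouped by scene identifier and reduced to the features that scene uses, so that each scene's clustering model sees a lower-dimensional input. The scene identifiers, feature names, and record layout are hypothetical and are not taken from the source.

```python
from collections import defaultdict

# Hypothetical mapping from scene identifier to the features that scene uses.
SCENE_FEATURES = {
    "payroll": ["monthly_salary", "employer_tenure_months"],
    "tax": ["annual_tax_paid", "filing_years"],
}


def split_by_scene(records):
    """Group records by scene and project each onto that scene's own features."""
    per_scene = defaultdict(list)
    for record in records:
        features = SCENE_FEATURES[record["scene_id"]]
        per_scene[record["scene_id"]].append(tuple(record[name] for name in features))
    return dict(per_scene)


records = [
    {"scene_id": "payroll", "monthly_salary": 8000.0, "employer_tenure_months": 24},
    {"scene_id": "tax", "annual_tax_paid": 12000.0, "filing_years": 5},
    {"scene_id": "payroll", "monthly_salary": 5000.0, "employer_tenure_months": 6},
]
scene_inputs = split_by_scene(records)
```

Each entry of `scene_inputs` would then be clustered by its own model, so a customer who lacks data for one scene never appears as a row of missing values in that scene's input.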
In the embodiment of the application, supervised learning and unsupervised learning are defined as follows. Supervised learning trains on labeled samples, and the association between features and labels is found through training. Unsupervised learning trains on unlabeled samples, and the training reveals the intrinsic properties and regularities of the data. Clustering is a class of unsupervised learning algorithms. Clustering divides the samples in a data set into a plurality of clusters such that the similarity among samples within a cluster is high and the similarity among samples in different clusters is low. Clustering algorithms include hierarchical clustering, the k-means algorithm, density-based clustering, and the like. Time series analysis assumes that the occurrence of an anomaly is time dependent.
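As an illustration of the clustering idea, the following is a plain k-means sketch written with the standard library only. It is not the clustering model of the application: the initial centroids are supplied by the caller to keep the run deterministic, Euclidean distance is assumed, and the sample points are made up.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means on tuples of floats, with caller-supplied initial centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
```

Points that end up with the same label are close to one another and far from the other centroid, which is the high within-cluster and low between-cluster similarity described above.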
Fig. 5 is a schematic diagram of main units of an abnormality detection apparatus according to an embodiment of the present application. As shown in fig. 5, the abnormality detection apparatus includes a receiving unit 501, an acquisition unit 502, a cluster determination unit 503, an abnormal cluster determination unit 504, and an abnormality detection unit 505.
The receiving unit 501 is configured to receive a user application request and determine a corresponding scene identifier.
An obtaining unit 502 configured to obtain user scene data of a historical preset time period based on the scene identification.
A cluster determining unit 503 configured to invoke a cluster model to determine each cluster based on the user scene data.
An abnormal cluster determining unit 504 configured to determine an abnormal screening index, and further determine the abnormal clusters among the clusters based on the abnormal screening index.
An anomaly detection unit 505 configured to intercept the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
In some embodiments, the anomaly detection apparatus further comprises a training unit, not shown in fig. 5, configured to: acquire a training sample set, wherein the training sample set comprises user application data that corresponds to the same scene identifier and includes abnormal applications that are concentrated in time; perform feature engineering on the user application data, and further screen clustering indexes based on a decision tree; and update the nodes of the decision tree based on the clustering indexes obtained by screening, and further generate a clustering model based on the updated decision tree.
In some embodiments, the training unit is further configured to: pre-classify the user application data of all time periods using the decision tree, and further determine the clustering indexes according to the pre-classification result.
In some embodiments, the training unit is further configured to: determine the abnormal sample proportion of each node in the decision tree; and sort the nodes by abnormal sample proportion, select a preset number of nodes based on the sorting, and determine the labels corresponding to the preset number of nodes as clustering indexes.
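The node-ranking step can be sketched as below, assuming the decision tree has already been fitted and that each node has been summarized by its label and its sample counts. The dictionary keys, the feature names, and the choice to collapse repeated labels are assumptions made for illustration.

```python
def select_clustering_indexes(nodes, top_k):
    """Rank decision-tree nodes by abnormal-sample proportion and return the
    labels of the top_k nodes (duplicates collapsed, order preserved).

    nodes: list of dicts with hypothetical keys 'label', 'abnormal', 'total';
    every node is assumed to hold at least one sample.
    """
    ranked = sorted(nodes, key=lambda n: n["abnormal"] / n["total"], reverse=True)
    top_nodes = ranked[:top_k]
    return list(dict.fromkeys(node["label"] for node in top_nodes))


# Hypothetical node summaries from a pre-classification tree.
nodes = [
    {"label": "applications_per_device", "abnormal": 40, "total": 50},
    {"label": "night_application", "abnormal": 30, "total": 100},
    {"label": "account_age_days", "abnormal": 45, "total": 90},
    {"label": "declared_income", "abnormal": 5, "total": 200},
]
indexes = select_clustering_indexes(nodes, top_k=2)
```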
In some embodiments, the abnormal cluster determining unit 504 is further configured to: determine the mean and standard deviation of each abnormal screening index in each cluster; and determine the abnormal clusters among the clusters based on the mean and the standard deviation.
In some embodiments, the abnormal cluster determining unit 504 is further configured to: generate an abnormal reference value based on the mean and the standard deviation; and determine any cluster whose index value is larger than the abnormal reference value to be an abnormal cluster.
In some embodiments, the anomaly detection apparatus further comprises an updating unit, not shown in fig. 5, configured to: automatically update the user scene data of the historical preset time period every day, and further automatically update the clusters.
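The daily update can be pictured as a rolling window over application dates, after which the clusters are recomputed on the retained records. The sketch below assumes the window covers the n days before the clustering day and excludes the clustering day itself; the text does not state the boundary convention, and the field names are hypothetical.

```python
from datetime import date, timedelta


def rolling_window(applications, today, n_days):
    """Keep applications submitted in the n_days before `today` (today excluded)."""
    start = today - timedelta(days=n_days)
    return [a for a in applications if start <= a["applied_on"] < today]


applications = [
    {"id": "a1", "applied_on": date(2021, 12, 5)},
    {"id": "a2", "applied_on": date(2021, 12, 6)},
    {"id": "a3", "applied_on": date(2021, 12, 8)},
    {"id": "a4", "applied_on": date(2021, 12, 9)},
]
window = rolling_window(applications, today=date(2021, 12, 9), n_days=3)
```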
In the present application, the specific implementation of the anomaly detection apparatus corresponds to that of the anomaly detection method, so the repeated content is not described again.
Fig. 6 shows an exemplary system architecture 600 to which the anomaly detection method or anomaly detection apparatus of the embodiments of the present application may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 601, 602, 603 may be various electronic devices having an abnormality detection processing screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (for example only) providing support for user application requests submitted by users using the terminal devices 601, 602, 603. The background management server can receive a user application request and determine a corresponding scene identifier; acquire user scene data of a historical preset time period based on the scene identifier; call a clustering model to determine the clusters based on the user scene data; determine an abnormal screening index, and further determine the abnormal clusters among the clusters based on the abnormal screening index; and intercept the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster. Because similar patterns among users are identified from the scene data, abnormal applications that have no actual association but share similar patterns can be identified without requiring any actual association among the users, so that new fraudulent behavior can be found in time and property loss is effectively avoided.
It should be noted that the abnormality detection method provided in the embodiment of the present application is generally executed by the server 605, and accordingly, the abnormality detection apparatus is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing a terminal device of an embodiment of the present application. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the computer system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.
In particular, according to embodiments disclosed herein, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments disclosed herein include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program executes the above-described functions defined in the system of the present application when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a receiving unit, an obtaining unit, a cluster determining unit, an abnormal cluster determining unit, and an anomaly detection unit. The names of these units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: receive a user application request and determine a corresponding scene identifier; acquire user scene data of a historical preset time period based on the scene identifier; call a clustering model to determine the clusters based on the user scene data; determine an abnormal screening index, and further determine the abnormal clusters among the clusters based on the abnormal screening index; and intercept the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
The computer program product of the present application comprises a computer program which, when executed by a processor, implements the anomaly detection method in the embodiments of the present application.
According to the technical solution of the embodiments of the application, similar patterns among users are identified based on the scene data, so abnormal applications that have no actual association but share similar patterns can be identified without requiring any actual association among the users; new fraudulent behavior can thus be found in time, and property loss is effectively avoided.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. An abnormality detection method characterized by comprising:
receiving a user application request, and determining a corresponding scene identifier;
acquiring user scene data of a historical preset time period based on the scene identification;
calling a clustering model to determine each clustering cluster based on the user scene data;
determining an abnormal screening index, and further determining abnormal clusters among the clusters based on the abnormal screening index;
and intercepting the user application request in response to determining that the user scene data corresponding to the user application request matches an abnormal cluster.
2. The method of claim 1, wherein prior to said invoking the clustering model, the method further comprises:
acquiring a training sample set, wherein the training sample set comprises user application data that corresponds to the same scene identifier and includes abnormal applications that are concentrated in time;
performing feature engineering on the user application data, and further screening clustering indexes based on a decision tree;
and updating the nodes of the decision tree based on the clustering indexes obtained by screening, and further generating a clustering model based on the updated decision tree.
3. The method of claim 2, wherein the screening of clustering indexes based on a decision tree comprises:
pre-classifying the user application data of all time periods using the decision tree, and determining the clustering indexes according to the pre-classification result.
4. The method of claim 3, wherein determining a clustering index based on the pre-classification result comprises:
determining the abnormal sample proportion of each node in the decision tree;
and sorting the nodes by abnormal sample proportion, selecting a preset number of nodes based on the sorting, and determining the labels corresponding to the preset number of nodes as clustering indexes.
5. The method of claim 1, wherein said determining abnormal clusters of said clusters based on said abnormal screening metric comprises:
determining the mean value and standard deviation of each abnormal screening index in each cluster;
and determining abnormal cluster in each cluster based on the mean value and the standard deviation.
6. The method of claim 5, wherein said determining abnormal clusters of said clusters based on said mean and said standard deviation comprises:
generating an abnormal reference value based on the mean and the standard deviation;
and determining the cluster where the index value is larger than the abnormal reference value, and further determining the cluster as an abnormal cluster.
7. The method of claim 1, further comprising:
and automatically updating the user scene data of the historical preset time period every day, and further automatically updating each cluster.
8. An abnormality detection device characterized by comprising:
the receiving unit is configured to receive a user application request and determine a corresponding scene identifier;
an obtaining unit configured to obtain user scene data of a historical preset time period based on the scene identification;
a cluster determination unit configured to invoke a cluster model to determine clusters based on the user scene data;
an abnormal cluster determining unit configured to determine an abnormal screening index, and further determine an abnormal cluster in the clusters based on the abnormal screening index;
an anomaly detection unit configured to intercept the user application request in response to determining that user scene data corresponding to the user application request matches the anomalous cluster.
9. The apparatus of claim 8, wherein the anomaly detection apparatus further comprises a training unit configured to:
acquiring a training sample set, wherein the training sample set comprises user application data that corresponds to the same scene identifier and includes abnormal applications that are concentrated in time;
performing feature engineering on the user application data, and further screening clustering indexes based on a decision tree;
and updating the nodes of the decision tree based on the clustering indexes obtained by screening, and further generating a clustering model based on the updated decision tree.
10. The apparatus of claim 9, wherein the training unit is further configured to:
and pre-classifying the user application data in all time periods by using a decision tree, and determining a clustering index according to a pre-classification result.
11. The apparatus of claim 10, wherein the training unit is further configured to:
determining the abnormal sample proportion of each node in the decision tree;
and sorting the nodes by abnormal sample proportion, selecting a preset number of nodes based on the sorting, and determining the labels corresponding to the preset number of nodes as clustering indexes.
12. The apparatus of claim 8, wherein the anomalous cluster determination unit is further configured to:
determining the mean value and standard deviation of each abnormal screening index in each cluster;
and determining abnormal cluster in each cluster based on the mean value and the standard deviation.
13. The apparatus of claim 12, wherein the anomalous cluster determination unit is further configured to:
generating an abnormal reference value based on the mean and the standard deviation;
and determining the cluster where the index value is larger than the abnormal reference value, and further determining the cluster as an abnormal cluster.
14. An abnormality detection electronic device characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
15. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1-7.
16. A computer program product comprising a computer program, characterized in that the computer program realizes the method according to any of claims 1-7 when executed by a processor.
CN202111498620.9A 2021-12-09 2021-12-09 Abnormity detection method and device, electronic equipment and computer readable medium Pending CN114186626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111498620.9A CN114186626A (en) 2021-12-09 2021-12-09 Abnormity detection method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111498620.9A CN114186626A (en) 2021-12-09 2021-12-09 Abnormity detection method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN114186626A true CN114186626A (en) 2022-03-15

Family

ID=80603989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111498620.9A Pending CN114186626A (en) 2021-12-09 2021-12-09 Abnormity detection method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN114186626A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708003A (en) * 2022-04-27 2022-07-05 西南交通大学 Abnormal data detection method, device and equipment and readable storage medium
CN114708003B (en) * 2022-04-27 2023-11-10 西南交通大学 Abnormal data detection method, device, equipment and readable storage medium
CN115022083A (en) * 2022-07-12 2022-09-06 中国人民银行清算总中心 Abnormal delimitation method and device
CN115022083B (en) * 2022-07-12 2024-05-10 中国人民银行清算总中心 Abnormal delimitation method and device
CN115981910A (en) * 2023-03-20 2023-04-18 建信金融科技有限责任公司 Method, device, electronic equipment and computer readable medium for processing exception request
CN116150861A (en) * 2023-04-20 2023-05-23 陕西建工集团股份有限公司 Intelligent processing system for construction data of shear wall in high-intensity area
CN116150861B (en) * 2023-04-20 2023-07-18 陕西建工集团股份有限公司 Intelligent processing system for construction data of shear wall in high-intensity area
CN117221241A (en) * 2023-11-08 2023-12-12 杭州鸿世电器股份有限公司 Intelligent switch control process data transmission method and system
CN117221241B (en) * 2023-11-08 2024-01-26 杭州鸿世电器股份有限公司 Intelligent switch control process data transmission method and system

Similar Documents

Publication Publication Date Title
CN114186626A (en) Abnormity detection method and device, electronic equipment and computer readable medium
CA3063580A1 (en) Classifier training method and apparatus, electronic device and computer readable medium
CN111507470A (en) Abnormal account identification method and device
US20220366488A1 (en) Transmitting proactive notifications based on machine learning model predictions
CN112561685B (en) Customer classification method and device
CN113627566A (en) Early warning method and device for phishing and computer equipment
CN109615389A (en) Electronic-payment transaction risk control method, device, server and storage medium
CN111598713B (en) Cluster recognition method and device based on similarity weight updating and electronic equipment
CN111798047A (en) Wind control prediction method and device, electronic equipment and storage medium
US20230419402A1 (en) Systems and methods of optimizing machine learning models for automated anomaly detection
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN112950359B (en) User identification method and device
CN114978877A (en) Exception handling method and device, electronic equipment and computer readable medium
CN113298121B (en) Message sending method and device based on multi-data source modeling and electronic equipment
CN111245815B (en) Data processing method and device, storage medium and electronic equipment
CN117196630A (en) Transaction risk prediction method, device, terminal equipment and storage medium
CN111429257A (en) Transaction monitoring method and device
CN114880369A (en) Risk credit granting method and system based on weak data technology
CN114189585A (en) Crank call abnormity detection method and device and computing equipment
CN112734352A (en) Document auditing method and device based on data dimensionality
CN117114858B (en) Collocation realization method of calculation checking formula based on Aviator expression
CN113837764B (en) Risk early warning method, risk early warning device, electronic equipment and storage medium
TWI657393B (en) Marketing customer group prediction system and method
CN117670503A (en) Service application data processing method, device and server
CN116911910A (en) Integrated model for predicting purchasing behavior of bank customer products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination