CN111833171B - Abnormal operation detection and model training method, device and readable storage medium - Google Patents

Publication number
CN111833171B
CN111833171B CN202010151773.5A CN202010151773A CN111833171B CN 111833171 B CN111833171 B CN 111833171B CN 202010151773 A CN202010151773 A CN 202010151773A CN 111833171 B CN111833171 B CN 111833171B
Authority
CN
China
Prior art keywords
user
abnormal
data
node
abnormal operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010151773.5A
Other languages
Chinese (zh)
Other versions
CN111833171A (en
Inventor
刘博文
郭豪
李晨阳
蔡准
孙悦
郭晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Trusfort Technology Co ltd
Original Assignee
Beijing Trusfort Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Trusfort Technology Co ltd
Priority to CN202010151773.5A
Publication of CN111833171A
Application granted
Publication of CN111833171B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for abnormal operation detection and model training, and a readable storage medium. The method comprises: obtaining a user feature vector; inputting the obtained user feature vector into an unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result; calculating, according to the prediction result, an evaluation score that characterizes the user's abnormal operation behavior; and, if the obtained evaluation score exceeds a threshold score, determining that the current user exhibits abnormal operation behavior. The method and the device thus use an unsupervised model to identify abnormal operation behavior from the user's operation features, which achieves higher identification accuracy and reduces unauthorized transfers and fraudulent card use.

Description

Abnormal operation detection and model training method, device and readable storage medium
Technical Field
The invention relates to the technical field of financial transaction risks, in particular to an abnormal operation detection and model training method, an abnormal operation detection and model training device and a readable storage medium.
Background
The wide adoption of smartphones and the rise of the mobile internet have brought great convenience to daily life: through a terminal device, users can complete online the traditional services of a bank branch, such as transfers, remittances, payments and wealth management. In any setting, a single terminal is enough to access any financial service or consumption scenario at any time and place, so that needs are met quickly and conveniently. This convenience, however, also allows a large number of black-market (underground-industry) groups to profit by various illegal means.
To counter the ways in which black-market groups profit, the currently effective approach is to use machine learning to judge whether a user's behavior is fraudulent. At present, however, because the volume of user data is huge and its quality is uneven, machine learning cannot completely distinguish fraudulent behavior from normal behavior.
Disclosure of Invention
Embodiments of the invention provide a method and a device for abnormal operation detection and model training, and a readable storage medium, which can identify abnormal operation behavior with higher identification accuracy and thereby reduce unauthorized transfers and fraudulent card use.
One aspect of the present invention provides an abnormal operation detection method, including: acquiring a user feature vector; inputting the obtained user characteristic vector into an unsupervised model to predict abnormal operation behaviors of the user to obtain a prediction result; calculating an evaluation score for representing the abnormal operation behavior of the user according to the prediction result; and if the obtained evaluation score exceeds the threshold score, determining that the current user has abnormal operation behaviors.
In an embodiment, obtaining the user feature vector includes: acquiring operation data of a user; generating corresponding user features according to the acquired operation data; performing at least same-trend processing and dimensionless processing on the generated user features; and encoding the processed user features by one-hot encoding to obtain the user feature vector.
In one embodiment, the user feature vector is a combined feature vector composed of a user basic feature vector, a user habitual behavior feature vector and a user time behavior feature vector.
In one embodiment, the unsupervised model is an unsupervised model based on the Isolation Forest algorithm and the K-means++ algorithm. Inputting the obtained user feature vector into the unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result includes: taking the user feature vector as the input of the Isolation Forest algorithm and of the K-means++ algorithm, to obtain a first anomaly value s1 and a second anomaly value s2 respectively. Correspondingly, calculating the evaluation score that characterizes the user's abnormal operation behavior according to the prediction result includes: multiplying the first anomaly value s1 by a corresponding first weight coefficient w1 to obtain an Isolation Forest performance score; multiplying the second anomaly value s2 by a corresponding second weight coefficient w2 to obtain a K-means++ performance score; and determining the evaluation score from the Isolation Forest performance score and the K-means++ performance score; wherein the first weight coefficient w1 and the second weight coefficient w2 are dynamic values that satisfy w1 + w2 = 1.0.
In an embodiment, after determining that the current user exhibits abnormal operation behavior, the method further includes: intercepting the operation request of the current user; or storing all historical operation information of the current user in a black-market database.
In another aspect, the present invention provides an unsupervised model training method, including: acquiring operation data of a user; generating corresponding user features according to the acquired operation data; screening the generated user features; performing at least same-trend processing and dimensionless processing on the screened user features; encoding the processed user features by one-hot encoding to obtain user feature vectors; taking the obtained user feature vectors as the input of an unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result; and optimizing the unsupervised model according to the prediction result.
In an embodiment, the process of acquiring the operation data of the user further includes: processing the operation data to delete or fill in abnormal data generated during acquisition.
In one embodiment, filling in the abnormal data includes: if the abnormal data is judged to be continuous data, filling the abnormal data positions with the mean of the corresponding valid values; if the abnormal data is judged to be discrete data, filling the abnormal data positions with the mode of the corresponding valid values.
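The filling rule above can be illustrated with a minimal Python sketch. It assumes that abnormal entries have already been marked as `None` and that the caller knows whether a column is continuous; the function name `fill_abnormal` is hypothetical and not part of the described method.

```python
from statistics import mean, mode


def fill_abnormal(values, is_continuous):
    """Fill None entries with the mean (continuous) or mode (discrete) of the valid entries."""
    valid = [v for v in values if v is not None]
    if not valid:
        # Nothing valid to derive a fill value from; return the column unchanged.
        return list(values)
    fill = mean(valid) if is_continuous else mode(valid)
    return [fill if v is None else v for v in values]


amounts = fill_abnormal([1.0, None, 3.0], is_continuous=True)
channels = fill_abnormal(["app", "web", None, "app"], is_continuous=False)
```

Here `amounts` has its gap filled with the mean of the valid amounts, and `channels` has its gap filled with the most frequent valid channel.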
Another aspect of the present invention provides an abnormal operation detecting apparatus, the apparatus including: the characteristic vector acquisition module is used for acquiring a user characteristic vector; the prediction result acquisition module is used for inputting the acquired user characteristic vector into the unsupervised model to predict the abnormal operation behavior of the user to obtain a prediction result; the evaluation score calculation module is used for calculating an evaluation score for representing the abnormal operation behavior of the user according to the prediction result; and the abnormal operation behavior determining module is used for determining that the current user has the abnormal operation behavior if the obtained evaluation score exceeds the threshold score.
In another aspect, the present invention provides an unsupervised model training device, including: an operation data acquisition module for acquiring operation data of a user; a feature generation module for generating corresponding user features according to the acquired operation data; a feature screening module for screening the generated user features; a feature processing module for performing at least same-trend processing and dimensionless processing on the screened user features; a feature encoding module for encoding the processed user features by one-hot encoding to obtain user feature vectors; a model training module for taking the obtained user feature vectors as the input of the unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result; and a model updating module for optimizing the unsupervised model according to the prediction result.
Another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions which, when executed, perform any of the above-described abnormal operation detection methods.
In another aspect, the present invention provides a computer-readable storage medium comprising a set of computer-executable instructions, which when executed, perform any one of the above-described unsupervised model training methods.
In the embodiments of the invention, an unsupervised model is used to identify abnormal operation behavior from the user's operation features, which achieves higher identification accuracy and reduces unauthorized transfers and fraudulent card use.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart of an implementation of an abnormal operation detection method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a bank unauthorized-transfer and fraudulent-card-use model based on unsupervised learning according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an Isolation Tree in the abnormal operation detection method according to an embodiment of the present invention;
Fig. 4 is a schematic overall flow chart of an abnormal operation detection method according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart of an implementation of an unsupervised model training method according to an embodiment of the present invention;
Fig. 6 is a flow chart of a specific implementation of an unsupervised model training method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of feature engineering in an unsupervised model training method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an abnormal operation detection apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an unsupervised model training device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of an implementation of an abnormal operation detection method according to an embodiment of the present invention.
As shown in Fig. 1, an aspect of the present invention provides an abnormal operation detection method, including:
step 101, obtaining a user feature vector;
step 102, inputting the obtained user feature vector into an unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result;
step 103, calculating, according to the prediction result, an evaluation score that characterizes the user's abnormal operation behavior;
step 104, if the obtained evaluation score exceeds a threshold score, determining that the current user exhibits abnormal operation behavior.
In this embodiment, the method can be applied to a banking system to detect whether a user is making unauthorized transfers or fraudulent card payments.
First, a user feature vector is obtained. The user feature vector is a combination of a user basic feature vector, a user habitual behavior feature vector and a user time behavior feature vector. The user basic features are the personal information the user provides when handling related services, together with unique, unchanging features generated by the user's banking activity, such as the user's account number, registration time, registration place, registered name and user identity. The user habitual behavior features are attributes that reflect the user's behavior each time the user operates through online banking or mobile banking, such as the IP address, MAC address, operating device serial number, operating account, remitting account and receiving account of a transaction. The user time behavior features are extracted through time correlation and reflect the user's behavior within a short period, for example: the number of transfers by the same user within 1 hour, the user's cumulative transfer amount within 1 day, and the number of login IP addresses of the same user within 1 day.
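As an illustration of the time behavior features listed above, the following is a minimal Python sketch. The record layouts (`(timestamp, amount)` for transfers, `(timestamp, ip)` for logins), the half-open windows and the function name `time_behavior_features` are assumptions made for this example only.

```python
from datetime import datetime, timedelta


def time_behavior_features(transfers, logins, now):
    """Compute three windowed features for one user.

    transfers: list of (timestamp, amount); logins: list of (timestamp, ip).
    A record is inside a window when window_start < timestamp <= now.
    """
    hour_ago = now - timedelta(hours=1)
    day_ago = now - timedelta(days=1)
    return {
        # number of transfers by the user within 1 hour
        "transfers_1h": sum(1 for t, _ in transfers if hour_ago < t <= now),
        # cumulative transfer amount within 1 day
        "amount_1d": sum(a for t, a in transfers if day_ago < t <= now),
        # number of distinct login IP addresses within 1 day
        "login_ips_1d": len({ip for t, ip in logins if day_ago < t <= now}),
    }


now = datetime(2020, 3, 1, 12, 0)
transfers = [
    (now - timedelta(minutes=10), 100.0),
    (now - timedelta(hours=5), 50.0),
    (now - timedelta(days=2), 999.0),
]
logins = [
    (now - timedelta(hours=1), "10.0.0.1"),
    (now - timedelta(hours=2), "10.0.0.1"),
    (now - timedelta(hours=3), "10.0.0.2"),
]
features = time_behavior_features(transfers, logins, now)
```

Only the first transfer falls in the 1-hour window, the two-day-old transfer is excluded from the daily amount, and the two repeated logins count as one IP address.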
Next, the obtained user feature vector is input into the unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result; an evaluation score that characterizes the user's abnormal operation behavior is then calculated according to the prediction result.
If the obtained evaluation score exceeds the threshold score, the current user is determined to exhibit abnormal operation behavior; otherwise, the current user is determined not to exhibit abnormal operation behavior. Specifically, an expert rule engine is used to determine whether the obtained evaluation score exceeds the threshold score, where the threshold is typically set in advance by a business expert based on experience.
If the obtained evaluation score exceeds the threshold score, the current user is determined to exhibit abnormal operation behavior, and the expert rule engine sends the banking system a return value indicating a fraudulent operation; in this embodiment, the return value corresponding to a fraudulent operation is 1.
If the obtained evaluation score does not exceed the threshold score, the current user's behavior is determined to be normal, and the expert rule engine sends the banking system a return value indicating normal operation; in this embodiment, the return value corresponding to normal operation is 0.
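The threshold comparison performed by the expert rule engine can be sketched as follows. This is a simplified illustration in Python; the function name `rule_engine_decision` and the example threshold of 0.7 are hypothetical, while the return values 1 and 0 follow the embodiment.

```python
FRAUDULENT = 1  # return value for a fraudulent operation in this embodiment
NORMAL = 0      # return value for a normal operation in this embodiment


def rule_engine_decision(evaluation_score, threshold):
    """Return 1 if the score exceeds the expert-set threshold, else 0."""
    return FRAUDULENT if evaluation_score > threshold else NORMAL


decisions = [rule_engine_decision(s, threshold=0.7) for s in (0.9, 0.7, 0.1)]
```

A score equal to the threshold does not "exceed" it, so it is treated as normal here.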
The method and the device thus use an unsupervised model to identify abnormal operation behavior from the user's operation features, which achieves higher identification accuracy and reduces unauthorized transfers and fraudulent card use.
In one embodiment, obtaining the user feature vector includes:
acquiring operation data of a user;
generating corresponding user characteristics according to the acquired operation data;
performing at least same-trend processing and dimensionless processing on the generated user features;
and coding the processed user characteristics by utilizing a one-hot coding mode to obtain a user characteristic vector.
In this embodiment, operation data of a user is obtained. The operation data includes: registration operation requests, login operation requests, transfer operation requests, remittance operation requests, payment operation requests, and other operation requests.
Corresponding user features are then generated from the acquired operation data. Specifically, all historical operation data of the current user, or the historical operation data within a short period, are integrated, and features strongly correlated with the final result are constructed from the historical operation features. For example, in transaction data analysis, MAC address data and time information are combined to construct a feature giving the number of transactions made by the same user at different MAC addresses within a period of time; a large number of features are generated in the same way.
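The MAC-address example above can be sketched in Python as follows. The record layout `(timestamp, mac)` and the function name `transactions_per_mac` are assumptions for illustration; the embodiment does not prescribe a data format.

```python
from collections import Counter
from datetime import datetime, timedelta


def transactions_per_mac(transactions, now, window):
    """Count one user's transactions per MAC address inside (now - window, now].

    transactions: iterable of (timestamp, mac) pairs for a single user.
    """
    start = now - window
    return Counter(mac for t, mac in transactions if start < t <= now)


now = datetime(2020, 3, 1, 12, 0)
records = [
    (now - timedelta(minutes=5), "AA:AA"),
    (now - timedelta(minutes=20), "AA:AA"),
    (now - timedelta(minutes=30), "BB:BB"),
    (now - timedelta(hours=3), "CC:CC"),
]
per_mac = transactions_per_mac(records, now, timedelta(hours=1))
distinct_macs = len(per_mac)
```

Both the per-address counts and the number of distinct addresses can then serve as candidate features.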
Next, at least same-trend processing and dimensionless processing are performed on the generated user features. Same-trend processing addresses data of differing nature: directly summing indicators of different nature cannot correctly reflect their combined effect, so the nature of inverse indicators must be changed first, so that all indicators act on the evaluation in the same direction and a correct result is obtained. Dimensionless processing addresses the comparability of the data.
The processed user features are then encoded by one-hot encoding to obtain the user feature vector. Specifically, one-hot encoding is applied to each feature, and the resulting one-hot feature vectors are combined to obtain the user feature vector.
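The three processing steps can be sketched in Python as below. The concrete choices are assumptions, since the embodiment does not fix them: same-trend processing is shown as subtracting an inverse indicator from its maximum, dimensionless processing as min-max scaling, and one-hot encoding is applied to a categorical value with a known category list. All function names are hypothetical.

```python
def same_trend(values, inverse=False):
    """Flip an inverse indicator so that larger values point in the same direction."""
    if not inverse:
        return list(values)
    highest = max(values)
    return [highest - v for v in values]


def min_max(values):
    """Dimensionless processing by min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over the known categories."""
    return [1 if value == c else 0 for c in categories]


amounts = min_max(same_trend([100.0, 300.0, 500.0]))
# Assumed inverse indicator: an older account is taken to be less risky.
account_age = same_trend([10, 400, 1000], inverse=True)
channel = one_hot("mobile", ["web", "mobile"])
# Combined vector for the first user in the columns above.
user_vector = [amounts[0], min_max(account_age)[0]] + channel
```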
Fig. 2 is a structural diagram of a bank unauthorized-transfer and fraudulent-card-use model based on unsupervised learning according to an embodiment of the invention.
As shown in Fig. 2, in one possible embodiment, the unsupervised model is an unsupervised model based on the Isolation Forest algorithm and the K-means++ algorithm;
inputting the obtained user feature vector into the unsupervised model to predict abnormal operation behavior of the user and obtain a prediction result includes:
taking the user feature vector as the input of the Isolation Forest algorithm and of the K-means++ algorithm, to obtain a first anomaly value s1 and a second anomaly value s2 respectively;
correspondingly, calculating the evaluation score that characterizes the user's abnormal operation behavior according to the prediction result includes:
multiplying the first anomaly value s1 by a corresponding first weight coefficient w1 to obtain an Isolation Forest performance score;
multiplying the second anomaly value s2 by a corresponding second weight coefficient w2 to obtain a K-means++ performance score;
determining the evaluation score from the Isolation Forest performance score and the K-means++ performance score;
wherein the first weight coefficient w1 and the second weight coefficient w2 are dynamic values that satisfy w1 + w2 = 1.0.
In this embodiment, the unsupervised model is an unsupervised model based on the Isolation Forest algorithm and the K-means++ algorithm. Both algorithms perform anomaly detection and cluster analysis without supervision; they are used to separate abnormal data from normal data in the data features and to output the evaluation score corresponding to the user feature vector.
The following is a detailed description of Isolation Forest:
An Isolation Forest is composed of a plurality of Isolation Trees (iTrees). The structure is shown in Fig. 3, in which a1, a2, ..., an at layer 0 are data randomly drawn from the original data. Constructing an iTree requires randomly selecting a feature q and a split value p for it, and recursively splitting the root-node data until any one of three conditions is met: (1) the tree has reached the limit height l; (2) there is only one sample on a node; (3) all features of the samples on a node are identical.
The limit height l is calculated as:
$l = \log_2 n$ (where n is the total number of samples used to construct the iTree)
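The iTree construction described above can be sketched in plain Python. This is a simplified illustration, not a full implementation: the dictionary-based tree layout and the function names are hypothetical, the height limit is rounded up to an integer, and the random feature is drawn only from features that still vary on the node (an assumption that avoids empty splits).

```python
import math
import random


def build_itree(samples, height, limit, rng):
    """Build one iTree over `samples` (a list of equal-length numeric lists)."""
    first = samples[0] if samples else None
    # Stop when the height limit is reached, at most one sample remains,
    # or all samples on the node are identical.
    if height >= limit or len(samples) <= 1 or all(s == first for s in samples):
        return {"size": len(samples)}
    # Assumption: only features that still vary on this node are candidates.
    varying = [q for q in range(len(first))
               if min(s[q] for s in samples) < max(s[q] for s in samples)]
    q = rng.choice(varying)
    lo = min(s[q] for s in samples)
    hi = max(s[q] for s in samples)
    p = rng.uniform(lo, hi)  # random split value p for feature q
    left = [s for s in samples if s[q] < p]
    right = [s for s in samples if s[q] >= p]
    return {"q": q, "p": p,
            "left": build_itree(left, height + 1, limit, rng),
            "right": build_itree(right, height + 1, limit, rng)}


def path_length(tree, x):
    """Number of edges from the root to the leaf that x falls into."""
    depth = 0
    while "size" not in tree:
        tree = tree["left"] if x[tree["q"]] < tree["p"] else tree["right"]
        depth += 1
    return depth


rng = random.Random(0)
data = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [9.0, 9.5]]
limit = math.ceil(math.log2(len(data)))  # l = log2(n), rounded up
tree = build_itree(data, 0, limit, rng)
depths = [path_length(tree, x) for x in data]
```

Samples that are far from the rest, such as the last point in `data`, tend to be isolated after fewer splits, which is what the anomaly score later exploits.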
Samples are drawn at random from the original data t times, and t iTrees are generated by the above method; these t trees as a whole are called an Isolation Forest (iForest). After the iForest is generated, the anomaly score of each sample in the data is calculated by the following formula:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where h(x) is the height of x in each tree, E(h(x)) is the average of h(x) over the t trees, and c(n) is the average path length for a given number of samples n, which is used to normalize the path length h(x) of the sample x. It is calculated as:
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where H(i) is the harmonic number, which may be estimated as ln(i) + 0.5772156649.
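As a numerical illustration, the sketch below assumes the standard Isolation Forest score $s(x, n) = 2^{-E(h(x))/c(n)}$ with $c(n) = 2H(n-1) - 2(n-1)/n$ and the harmonic-number estimate given in the text. The function names are hypothetical, and returning 0 for n ≤ 1 is an assumption made to keep the logarithm defined.

```python
import math

EULER_GAMMA = 0.5772156649


def harmonic(i):
    """Estimate of the harmonic number H(i) as ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c(n):
    """Average path length used to normalize h(x) for n samples."""
    if n <= 1:
        return 0.0  # assumption: no normalization is defined for a single sample
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n)), with E taken over the trees."""
    mean_height = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_height / c(n))


short_path = anomaly_score([2.0, 3.0, 2.0], n=256)
long_path = anomaly_score([10.0, 11.0, 12.0], n=256)
average_path = anomaly_score([c(256)] * 3, n=256)
```

A sample whose average height equals c(n) scores 0.5; shorter paths push the score toward 1 (more anomalous) and longer paths push it toward 0.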
The following is a detailed description of K-means++:
K-means++ improves the selection of initial cluster centers in the K-means clustering algorithm. Its basic steps are:
1. randomly select one sample from the original data as the initial cluster center c1;
2. for each sample, compute the shortest distance to the current cluster centers (that is, the distance to the nearest cluster center), denoted D(x);
3. compute the probability of each sample being selected as the next cluster center as
$$P(x) = \frac{D(x)^2}{\sum_{x'} D(x')^2}$$
where the sum runs over all samples, and select the next cluster center by the roulette-wheel method;
4. repeat steps 2 and 3 until k cluster centers have been selected.
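The seeding steps above can be sketched in plain Python, assuming the usual K-means++ selection probability proportional to D(x)². The function names are hypothetical; `random.Random.choices` with weights plays the role of the roulette wheel, and the fallback for all-zero distances is an assumption for degenerate inputs.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """Select k initial cluster centers following steps 1 to 4."""
    centers = [rng.choice(points)]  # step 1: random first center
    while len(centers) < k:
        # Step 2: squared distance from each sample to its nearest center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Assumption: every sample coincides with a center; pick uniformly.
            centers.append(rng.choice(points))
            continue
        # Step 3: roulette-wheel selection with probability D(x)^2 / sum D(x)^2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers  # step 4 is the loop condition above


rng = random.Random(1)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = kmeanspp_init(points, 2, rng)
```

Because an already chosen center has D(x) = 0, it has zero probability of being selected again, and distant samples are favored as the next center.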
The number of cluster centers (the K value) can be chosen in two ways. The first is the elbow method: plot the K-means cost function against the number of clusters K and select the K value at the elbow (inflection point) of the curve as the optimal number of cluster centers. The second is to choose K according to actual business needs, through expert judgment or manual inspection.
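The elbow method is normally applied by inspecting the plot, but a rough automatic variant can be sketched in Python. The heuristic used here, picking the K with the largest discrete second difference of the cost curve, is an assumption for illustration and not something the embodiment specifies; the function name `elbow_k` and the cost values are hypothetical.

```python
def elbow_k(costs):
    """Return the K at the sharpest bend of the cost curve.

    costs[i] is the K-means cost for K = i + 1. The bend is located with the
    largest second difference, which is only one possible heuristic.
    """
    if len(costs) < 3:
        raise ValueError("need costs for at least three values of K")
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(costs) - 1):
        bend = costs[i - 1] - 2 * costs[i] + costs[i + 1]
        if bend > best_bend:
            best_bend, best_k = bend, i + 1
    return best_k


# Illustrative costs for K = 1..5: the curve flattens after K = 3.
chosen_k = elbow_k([100, 60, 20, 18, 17])
```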
After all data are clustered by K-means++, every data point belongs to some cluster, and the clusters differ only in the number of data points they contain. Because normal data make up the majority, they correspond to the large clusters after clustering; abnormal data are few and correspond to the small clusters. A threshold N is therefore set: if a cluster contains fewer than N data points, all points in that cluster are regarded as abnormal nodes; conversely, a cluster with more than N data points is regarded as a normal cluster, and all data points in it are normal nodes.
The distance attribute ds is then computed for all nodes: ds is 0 for every normal node, and for an abnormal node ds is the sum of the distances between that node and the center nodes of the normal clusters. After ds has been computed for all nodes, the nodes are divided evenly into buckets by ds, and a corresponding anomaly score s2 is assigned to each bucket according to actual business conditions.
Figure BDA0002402701130000092
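The cluster-size rule and the distance attribute ds can be sketched in Python as follows. Several details are assumptions, because the text leaves them open: a cluster with exactly N points is treated as normal, the buckets are equal-width over the range of ds, and the per-bucket scores are placeholder values that a business expert would supply. The function names are hypothetical.

```python
import math
from collections import Counter


def distance_attribute(points, labels, centers, min_size):
    """ds is 0 for points in normal clusters; otherwise the sum of distances
    to the centers of all normal clusters."""
    sizes = Counter(labels)
    # Assumption: a cluster with exactly min_size points counts as normal.
    normal = [label for label, size in sizes.items() if size >= min_size]
    ds = []
    for point, label in zip(points, labels):
        if label in normal:
            ds.append(0.0)
        else:
            ds.append(sum(math.dist(point, centers[n]) for n in normal))
    return ds


def bucket_scores(ds, scores):
    """Map each ds to a score using equal-width buckets (an assumption)."""
    highest = max(ds)
    result = []
    for d in ds:
        if d == 0 or highest == 0:
            result.append(0.0)  # normal nodes get no anomaly score
            continue
        index = min(int(d / highest * len(scores)), len(scores) - 1)
        result.append(scores[index])
    return result


points = [(0, 0), (0, 1), (1, 0), (10, 10)]
labels = [0, 0, 0, 1]
centers = {0: (1 / 3, 1 / 3), 1: (10, 10)}
ds = distance_attribute(points, labels, centers, min_size=2)
s2 = bucket_scores(ds, scores=[0.3, 0.6, 1.0])  # placeholder bucket scores
```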
Thus, the specific process of step 102 is:
taking the user feature vector as the input of the Isolation Forest algorithm and of the K-means++ algorithm, to obtain a first anomaly value s1 and a second anomaly value s2 respectively.
Correspondingly, the specific process of step 103 is:
multiplying the first anomaly value s1 by the corresponding first weight coefficient w1 to obtain the Isolation Forest performance score, namely w1 × s1;
multiplying the second anomaly value s2 by the corresponding second weight coefficient w2 to obtain the K-means++ performance score, namely w2 × s2;
determining the evaluation score from the Isolation Forest performance score and the K-means++ performance score. In this embodiment, the evaluation score is obtained by adding the two performance scores, i.e., v = w1 × s1 + w2 × s2. The evaluation score may of course also be obtained by another operation on the two, such as multiplication or subtraction.
The first weight coefficient w1 and the second weight coefficient w2 are dynamic values that satisfy w1 + w2 = 1.0. They can be adjusted according to the actual production conditions of the system, so that the evaluation score remains timely and accurate.
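The weighted combination in step 103 can be sketched in Python as below. The function name `evaluation_score` and the example values are hypothetical; the second weight is derived from the first so that the two always sum to 1.0.

```python
def evaluation_score(s1, s2, w1):
    """Additive evaluation score v = w1 * s1 + w2 * s2 with w1 + w2 = 1.0."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w2 = 1 - w1 is valid")
    w2 = 1.0 - w1
    return w1 * s1 + w2 * s2


# Illustrative values: s1 from Isolation Forest, s2 from K-means++.
v = evaluation_score(s1=0.8, s2=0.4, w1=0.25)
```

Adjusting `w1` at run time shifts the emphasis between the two detectors without changing the scale of the score.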
In an embodiment, after determining that the current user exhibits abnormal operation behavior, the method further includes:
intercepting the operation request of the current user;
or storing all historical operation information of the current user in a black-market database.
In this embodiment, after it is determined that the current user exhibits abnormal operation behavior, the banking system is instructed to issue an interception instruction for the user's operation request; alternatively, the interception information and all historical operation information related to the user that is recorded within the mobile banking or online banking system, such as IP and MAC addresses, are written into the black-market database of the corresponding electronic bank as accumulated data.
Further, if the user's operation is normal, the corresponding operation is released, the business operation is completed, and the result is returned to the user.
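The handling after the decision can be sketched in Python as follows. This sketch performs both the interception and the database write for an abnormal user, whereas the embodiment presents them as alternatives; the in-memory dictionary standing in for the black-market database and the function name `handle_operation` are assumptions for illustration.

```python
def handle_operation(user_id, history, is_abnormal, black_market_db):
    """Intercept and record an abnormal user's history, or release the operation."""
    if is_abnormal:
        # Accumulate the user's historical operation records (IP, MAC, ...).
        black_market_db.setdefault(user_id, []).extend(history)
        return "intercepted"
    return "released"


black_market_db = {}
blocked = handle_operation(
    "u1", [{"ip": "10.0.0.1", "mac": "AA:AA"}], True, black_market_db)
passed = handle_operation("u2", [], False, black_market_db)
```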
Fig. 4 is a schematic overall implementation flow diagram of an abnormal operation detection method according to an embodiment of the present invention.
As shown in Fig. 4, the user first operates his or her own account through online banking or another channel; the banking system receives the user's operation data and imports it into the abnormal operation detection system.
The abnormal operation detection system computes evaluation data, i.e., an evaluation score, from the user's operation data and then transmits it to the expert rule engine.
The expert rule engine judges from the evaluation score whether the user's operation is abnormal operation behavior and returns the result to the banking system.
If the expert rule engine judges that the user's operation is normal, the banking system processes the operation; if the expert rule engine judges that the operation is an unauthorized transfer or fraudulent card payment, the banking system intercepts it.
FIG. 5 is a schematic diagram of an implementation flow of an unsupervised model training method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a specific implementation of an unsupervised model training method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of feature engineering in an unsupervised model training method according to an embodiment of the present invention.
As shown in fig. 5 and fig. 6, another aspect of the present invention provides an unsupervised model training method, including:
step 201, acquiring operation data of a user;
step 202, generating corresponding user characteristics according to the acquired operation data;
step 203, screening the generated user characteristics;
step 204, performing at least data co-trending (same-direction) processing and dimensionless processing on the screened user characteristics;
step 205, encoding the processed user features by using a one-hot encoding mode to obtain user feature vectors;
step 206, the obtained user characteristic vector is used as the input of an unsupervised model to predict the abnormal operation behavior of the user, and a prediction result is obtained;
and step 207, optimizing the unsupervised model according to the prediction result.
In this embodiment, model training may be performed periodically by means of a time switch, which starts training once the training data reaches a certain amount. The unsupervised model training process specifically includes:
First, operation data of a user is acquired. The operation data serves as training data for the model and may come from online or offline sources. The operation data includes the following: registration operation requests, login operation requests, transfer operation requests, remittance operation requests, payment operation requests, and other operation requests.
Then a feature engineering step is performed, as shown in fig. 7, to generate corresponding user features from the acquired operation data. Specifically, all historical operation data of the current user, or the historical operation data within a short period, is integrated, and features strongly correlated with the final result are constructed from the historical operation features. For example, in transaction data analysis, the MAC address data and time information are used to construct a feature counting the transactions of the same user at different MAC addresses within a period of time; a large number of features are generated in a similar way.
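The MAC-address example above can be sketched as follows. This is only an illustration under assumptions: the record layout (`user`, `mac`, `time` keys), the function name `transactions_per_mac`, and the one-hour window are hypothetical and are not specified by the embodiment.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def transactions_per_mac(records, user_id, end_time, window):
    """Count one user's transactions per MAC address in (end_time - window, end_time]."""
    counts = defaultdict(int)
    start = end_time - window
    for rec in records:
        if rec["user"] == user_id and start < rec["time"] <= end_time:
            counts[rec["mac"]] += 1
    return dict(counts)


# Hypothetical operation records; only the fields needed for this feature.
records = [
    {"user": "u1", "mac": "aa:aa", "time": datetime(2020, 3, 6, 10, 5)},
    {"user": "u1", "mac": "aa:aa", "time": datetime(2020, 3, 6, 10, 20)},
    {"user": "u1", "mac": "bb:bb", "time": datetime(2020, 3, 6, 10, 40)},
    {"user": "u2", "mac": "cc:cc", "time": datetime(2020, 3, 6, 10, 45)},
    {"user": "u1", "mac": "dd:dd", "time": datetime(2020, 3, 5, 9, 0)},  # outside window
]

per_mac = transactions_per_mac(
    records, "u1", datetime(2020, 3, 6, 11, 0), timedelta(hours=1)
)
distinct_macs = len(per_mac)  # a second derived feature: number of MAC addresses used
```

Varying the window length or the grouping key (IP address, device serial number) yields further features of the same kind.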
Feature screening is then performed on the generated user features. With a large number of features, selecting appropriate ones facilitates modeling. A redundant feature brings no usable information to the model, for example a feature that takes the same value across all data, or a transfer feature when consumption is being modeled; such features are deleted. A correlated feature may strongly affect the modeling result, but when it is highly correlated with other features it amounts to reusing the same data. In this case, screening can be performed using the Pearson correlation coefficient, one of the simplest methods for understanding the relationship between features and response variables; it measures the linear correlation between variables, and the correlated features are deleted on this basis.
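A minimal sketch of this screening step, assuming features are held as named columns of numbers. The 0.95 cutoff, the function names, and the policy of keeping the first feature of a correlated pair are assumptions for illustration; the embodiment does not state them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # undefined for constant columns; treated as uncorrelated here
    return cov / math.sqrt(vx * vy)


def screen_features(columns, max_abs_corr=0.95):
    """Drop constant features, then features highly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # redundant: same value for every sample
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # near-duplicate of a feature that is already kept
        kept.append(name)
    return kept


# Hypothetical feature columns, one value per sample.
columns = {
    "channel_flag": [1, 1, 1, 1, 1],
    "transfer_count": [1, 2, 3, 4, 5],
    "transfer_count_x2": [2, 4, 6, 8, 10],
    "login_ip_count": [3, 1, 4, 1, 5],
}
kept = screen_features(columns)
```

Here `channel_flag` is dropped as constant and `transfer_count_x2` as a linear copy of `transfer_count`.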
Then at least data co-trending (same-direction) processing and dimensionless processing are performed on the screened user features. Data co-trending processing addresses data of different natures: directly summing indicators of different natures cannot correctly reflect the combined result of their different effects, so the data of inverse indicators must first be transformed so that all indicators act on the evaluation scheme in the same direction, which yields a correct result. Dimensionless processing mainly addresses the comparability of data.
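The two preprocessing steps can be illustrated as below. The concrete transforms are assumptions, since the text does not name them: an inverse indicator is flipped by reflecting it about its maximum, and dimensionless processing is shown as min-max scaling to [0, 1]. The example features and the choice of which one is an inverse indicator are also hypothetical.

```python
def align_direction(values, inverse):
    """Make larger values always mean 'more of the effect'.

    An inverse indicator is reflected about its maximum (an assumed transform).
    """
    if not inverse:
        return list(values)
    top = max(values)
    return [top - v for v in values]


def min_max(values):
    """Rescale values to [0, 1] so features with different units are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Assumed inverse indicator: a newer account is treated as riskier.
account_age_days = [10, 400, 30]
transfer_amount = [100.0, 5000.0, 2500.0]

age_scaled = min_max(align_direction(account_age_days, inverse=True))
amount_scaled = min_max(align_direction(transfer_amount, inverse=False))
```

After both steps, the 10-day-old account and the 5000.0 transfer each map to 1.0, so the two indicators point in the same direction and share a scale.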
Then the processed user features are encoded using one-hot encoding to obtain the user feature vector. Specifically, each feature is one-hot encoded, and the resulting vectors are then combined to obtain the user feature vector.
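As a small illustration of encoding each feature and then combining the results, the sketch below concatenates per-feature one-hot vectors in a fixed feature order. The feature names, category lists, and the choice to encode unseen values as all zeros are hypothetical.

```python
def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


def encode_user(features, vocab):
    """One-hot encode each feature and concatenate in sorted feature-name order."""
    vector = []
    for name in sorted(vocab):
        vector.extend(one_hot(features[name], vocab[name]))
    return vector


# Hypothetical categorical features and their known categories.
vocab = {
    "channel": ["app", "web"],
    "device_type": ["android", "ios", "pc"],
}
user_vector = encode_user({"channel": "web", "device_type": "ios"}, vocab)
```

The resulting vector has one position per known category, so every user is encoded with the same length.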
Then the obtained user feature vector is used as the input of the unsupervised model for training, and a prediction result is obtained.
Finally, the weight parameters of the unsupervised model are updated according to the prediction result using existing model updating means.
In an implementation manner, in the process of acquiring the operation data of the user, the method further includes:
processing the operation data to delete or fill abnormal data generated in the acquisition process.
In this embodiment, in the process of acquiring the operation data of the user, a situation of data missing or data error may occur, and these abnormal data are not favorable for subsequent modeling. Therefore, the operation data needs to be processed to delete the erroneous data generated in the acquisition process and to fill up the missing data.
By processing the operation data, the subsequent modeling deviation can be controlled in a smaller range.
In one embodiment, filling the abnormal data further includes:
if the abnormal data is judged to be continuous data, filling the abnormal data position with the mean of the corresponding valid data;
if the abnormal data is judged to be discrete data, filling the abnormal data position with the mode of the corresponding valid data.
In this embodiment, abnormal data is filled according to the following general principle: if the abnormal data is judged to be measurable continuous data, a mean is computed from the existing valid data and used to fill the missing data; if the abnormal data is judged to be non-measurable discrete data, the mode is computed from the existing valid data and used to fill the missing data.
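The filling rule above can be sketched in a few lines. Representing a missing or erroneous entry as `None`, and the function name `fill_missing`, are assumptions made for this example.

```python
from statistics import mean, mode


def fill_missing(values, continuous):
    """Fill None entries with the mean (continuous) or mode (discrete) of valid data."""
    valid = [v for v in values if v is not None]
    if not valid:
        return list(values)  # nothing to estimate from; leave the column unchanged
    fill = mean(valid) if continuous else mode(valid)
    return [fill if v is None else v for v in values]


amounts = fill_missing([100.0, None, 300.0], continuous=True)
channels = fill_missing(["app", "web", None, "app"], continuous=False)
```

The missing amount is replaced by the mean of the two valid amounts, and the missing channel by the most frequent valid channel.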
Fig. 8 is a schematic structural diagram of an abnormal operation detection apparatus according to an embodiment of the present invention.
As shown in fig. 8, another aspect of the present invention provides an abnormal operation detecting apparatus, including:
a feature vector obtaining module 301, configured to obtain a user feature vector;
a prediction result obtaining module 302, configured to input the obtained user feature vector into an unsupervised model to perform prediction of abnormal operation behavior of the user, so as to obtain a prediction result;
an evaluation score calculation module 303, configured to calculate an evaluation score for representing an abnormal operation behavior of the user according to the prediction result;
an abnormal operation behavior determining module 304, configured to determine that the current user has an abnormal operation behavior if the obtained evaluation score exceeds the threshold score.
In this embodiment, the feature vector obtaining module 301 first obtains a user feature vector, which is a combined feature vector formed from a user basic feature vector, a user habit behavior feature vector, and a user time behavior feature vector. The user basic features are the personal information provided when the user handles related services and the unique, unchanging features generated through the user's banking activities, such as the user's account number, registration time, registration place, registered name, and user identity. The user habit behavior features are attributes that reflect the user's behavior each time the user operates through online banking or mobile banking, such as the IP address, MAC address, operating device serial number, operating account, remitting account, and receiving account of a transaction. The user time behavior features are extracted through time correlation and reflect the user's behavior within a short period, for example the number of transfers by the same user within 1 hour, the user's cumulative transfer amount within 1 day, and the number of login IP addresses of the same user within 1 day.
Then the prediction result obtaining module 302 inputs the obtained user feature vector into the unsupervised model to predict the user's abnormal operation behavior and obtain a prediction result, and the evaluation score calculation module 303 calculates, from the prediction result, an evaluation score representing the user's abnormal operation behavior.
Finally, the abnormal operation behavior determining module 304 determines that the current user has an abnormal operation behavior if the obtained evaluation score exceeds the threshold score, and that the current user has no abnormal operation behavior otherwise. Specifically, an expert rule engine determines whether the evaluation score exceeds the threshold score, where the threshold is typically set in advance by a business expert based on experience.
If the obtained evaluation score exceeds the threshold score, it is determined that the current user has an abnormal operation behavior, and the expert rule engine sends a return value representing a fraudulent (stolen-account) operation to the banking system; in this embodiment, the return value corresponding to a fraudulent operation is 1.
If the obtained evaluation score does not exceed the threshold score, it is determined that the current user's behavior is a normal operation, and the expert rule engine sends a return value representing normal operation to the banking system; in this embodiment, the return value corresponding to normal operation is 0.
Therefore, the method and the device use the unsupervised model to identify abnormal operation behaviors from the user's operation features, achieve higher identification accuracy, and thereby reduce fraudulent transfers and fraudulent payments.
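The scoring and decision flow can be sketched as follows, combining the threshold rule above with the weighted scores recited in the claims. Several points are assumptions rather than statements of the patented method: the Isolation Forest value s1 is supplied as a ready-made number, s2 is read as the sum of distances to the centers of all normal clusters, the evaluation score is taken as the plain sum of the two weighted scores, and all names, centers, sizes, and thresholds are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_anomaly(point, centers, cluster_sizes, assigned, size_threshold):
    """s2: 0 for a node in a normal cluster, else summed distance to normal centers.

    A cluster counts as normal when it holds more than `size_threshold` nodes.
    """
    normal = [i for i, n in enumerate(cluster_sizes) if n > size_threshold]
    if assigned in normal:
        return 0.0
    return sum(euclidean(point, centers[i]) for i in normal)


def evaluation_score(s1, s2, w1):
    """Weighted combination with w1 + w2 = 1.0 (summing the two is an assumption)."""
    w2 = 1.0 - w1
    return w1 * s1 + w2 * s2


def expert_rule(score, threshold):
    """Return 1 (fraudulent operation) if the score exceeds the threshold, else 0."""
    return 1 if score > threshold else 0


# Hypothetical clustering result: two large (normal) clusters and one tiny cluster.
centers = [(0.0, 0.0), (6.0, 8.0), (3.0, 4.0)]
cluster_sizes = [40, 35, 2]

s2_outlier = kmeans_anomaly((3.0, 4.0), centers, cluster_sizes, assigned=2, size_threshold=5)
s2_normal = kmeans_anomaly((0.5, 0.5), centers, cluster_sizes, assigned=0, size_threshold=5)

score_outlier = evaluation_score(s1=0.8, s2=s2_outlier, w1=0.5)
score_normal = evaluation_score(s1=0.8, s2=s2_normal, w1=0.5)

flag_outlier = expert_rule(score_outlier, threshold=1.0)
flag_normal = expert_rule(score_normal, threshold=1.0)
```

In this toy setup s1 and s2 are on different scales; a real deployment would likely normalize both before weighting, which the text leaves open.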
Fig. 9 is a schematic structural diagram of an unsupervised model training device according to an embodiment of the present invention.
As shown in fig. 9, another aspect of the present invention provides an unsupervised model training apparatus, comprising:
an operation data obtaining module 401, configured to obtain operation data of a user;
a feature generation module 402, configured to generate a corresponding user feature according to the acquired operation data;
a feature screening module 403, configured to screen the generated user features;
a feature processing module 404, configured to perform at least data co-trending (same-direction) processing and dimensionless processing on the screened user features;
a feature encoding module 405, configured to encode the processed user features using one-hot encoding to obtain a user feature vector;
a model training module 406, configured to use the obtained user feature vector as the input of the unsupervised model to predict the user's abnormal operation behavior and obtain a prediction result;
a model updating module 407, configured to optimize the unsupervised model according to the prediction result.
In this embodiment, model training may be performed periodically by means of a time switch, which starts training once the training data reaches a certain amount. The unsupervised model training process specifically includes:
First, the operation data obtaining module 401 obtains operation data of a user. The operation data serves as training data for the model and may come from online or offline sources. The operation data includes the following: registration operation requests, login operation requests, transfer operation requests, remittance operation requests, payment operation requests, and other operation requests.
Then the feature engineering step is performed, as shown in fig. 7: the feature generation module 402 generates corresponding user features from the obtained operation data. Specifically, all historical operation data of the current user, or the historical operation data within a short period, is integrated, and features strongly correlated with the final result are constructed from the historical operation features. For example, in transaction data analysis, the MAC address data and time information are used to construct a feature counting the transactions of the same user at different MAC addresses within a period of time; a large number of features are generated in a similar way.
The feature screening module 403 then screens the generated user features. With a large number of features, selecting appropriate ones facilitates modeling. A redundant feature brings no usable information to the model, for example a feature that takes the same value across all data, or a transfer feature when consumption is being modeled; such features are deleted. A correlated feature may strongly affect the modeling result, but when it is highly correlated with other features it amounts to reusing the same data. In this case, screening can be performed using the Pearson correlation coefficient, one of the simplest methods for understanding the relationship between features and response variables; it measures the linear correlation between variables, and the correlated features are deleted on this basis.
The feature processing module 404 then performs at least data co-trending (same-direction) processing and dimensionless processing on the screened user features. Data co-trending processing addresses data of different natures: directly summing indicators of different natures cannot correctly reflect the combined result of their different effects, so the data of inverse indicators must first be transformed so that all indicators act on the evaluation scheme in the same direction, which yields a correct result. Dimensionless processing mainly addresses the comparability of data.
Then the feature encoding module 405 encodes the processed user features using one-hot encoding to obtain the user feature vector. Specifically, each feature is one-hot encoded, and the resulting vectors are then combined to obtain the user feature vector.
Then the model training module 406 uses the obtained user feature vector as the input of the unsupervised model for training, and a prediction result is obtained.
Finally, the model updating module 407 updates the weight parameters of the unsupervised model according to the prediction result using existing model updating means.
Based on the above-provided abnormal operation detection method, another aspect of the present invention provides a computer-readable storage medium, which includes a set of computer-executable instructions, when executed, for performing any one of the above-mentioned abnormal operation detection methods.
In an embodiment of the present invention, a computer-readable storage medium comprises a set of computer-executable instructions that, when executed, are used for: obtaining a user feature vector; inputting the obtained user feature vector into an unsupervised model to predict abnormal operation behaviors of the user and obtain a prediction result; calculating, according to the prediction result, an evaluation score for representing the abnormal operation behavior of the user; and, if the obtained evaluation score exceeds the threshold score, determining that the current user has an abnormal operation behavior.
Therefore, the method and the device use the unsupervised model to identify abnormal operation behaviors from the user's operation features, achieve higher identification accuracy, and thereby reduce fraudulent transfers and fraudulent payments.
Based on the above-provided unsupervised model training method, another aspect of the invention provides a computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform any of the above-described unsupervised model training methods.
In an embodiment of the present invention, a computer-readable storage medium includes a set of computer-executable instructions that, when executed, are used for: obtaining operation data of a user; generating corresponding user features according to the obtained operation data; screening the generated user features; performing at least data co-trending (same-direction) processing and dimensionless processing on the screened user features; encoding the processed user features using one-hot encoding to obtain a user feature vector; using the obtained user feature vector as the input of an unsupervised model to predict the abnormal operation behavior of the user and obtain a prediction result; and optimizing the unsupervised model according to the prediction result.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples described in this specification, can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. An abnormal operation detection method, characterized in that the method comprises:
acquiring a user feature vector;
inputting the obtained user feature vector into an unsupervised model to predict abnormal operation behaviors of the user and obtain a prediction result, wherein the unsupervised model is based on an Isolation Forest algorithm and a K-means++ algorithm, comprising:
taking the user feature vector as the input of the Isolation Forest algorithm and of the K-means++ algorithm to obtain a first abnormal value s1 and a second abnormal value s2, respectively, wherein the second abnormal value s2 is the distance between a node and the normal cluster center nodes: s2 is 0 when the node is a normal node, and s2 is the sum of the distances between the node and the normal cluster center nodes when the node is an abnormal node, a normal cluster being a cluster in the K-means++ algorithm whose number of nodes is greater than a threshold;
calculating, according to the prediction result, an evaluation score for representing the abnormal operation behavior of the user, comprising:
multiplying the obtained first abnormal value s1 by a corresponding first weight coefficient w1 to obtain an Isolation Forest performance score; multiplying the obtained second abnormal value s2 by a corresponding second weight coefficient w2 to obtain a K-means++ performance score; determining the evaluation score according to the obtained Isolation Forest performance score and the obtained K-means++ performance score, wherein the first weight coefficient w1 and the second weight coefficient w2 are dynamic values satisfying w1 + w2 = 1.0;
and if the obtained evaluation score exceeds the threshold score, determining that the current user has abnormal operation behaviors.
2. The method of claim 1, wherein the obtaining the user feature vector comprises:
acquiring operation data of a user;
generating corresponding user characteristics according to the acquired operation data;
performing at least data co-trending (same-direction) processing and dimensionless processing on the generated user characteristics;
and coding the processed user characteristics by utilizing a one-hot coding mode to obtain a user characteristic vector.
3. The method of claim 1, wherein the user feature vector is a combined feature vector of: a user basic feature vector, a user habit behavior feature vector, and a user time behavior feature vector.
4. The method of any of claims 1 to 3, wherein after determining that the current user has abnormal operating behavior, the method further comprises:
intercepting an operation request of a current user;
or storing all historical operation information of the current user into the blacklist database.
5. An abnormal operation detection apparatus, characterized in that the apparatus comprises:
the characteristic vector acquisition module is used for acquiring a user characteristic vector;
the prediction result obtaining module is used for inputting the obtained user feature vector into an unsupervised model to predict the abnormal operation behavior of the user and obtain a prediction result, wherein the unsupervised model is based on an Isolation Forest algorithm and a K-means++ algorithm, and the module is specifically used for:
taking the user feature vector as the input of the Isolation Forest algorithm and of the K-means++ algorithm to obtain a first abnormal value s1 and a second abnormal value s2, respectively, wherein the second abnormal value s2 is the distance between a node and the normal cluster center nodes: s2 is 0 when the node is a normal node, and s2 is the sum of the distances between the node and the normal cluster center nodes when the node is an abnormal node, a normal cluster being a cluster in the K-means++ algorithm whose number of nodes is greater than a threshold;
an evaluation score calculation module, configured to calculate, according to the prediction result, an evaluation score for characterizing an abnormal operation behavior of the user, and specifically configured to:
multiplying the obtained first abnormal value s1 by a corresponding first weight coefficient w1 to obtain an Isolation Forest performance score; multiplying the obtained second abnormal value s2 by a corresponding second weight coefficient w2 to obtain a K-means++ performance score; determining the evaluation score according to the obtained Isolation Forest performance score and the obtained K-means++ performance score, wherein the first weight coefficient w1 and the second weight coefficient w2 are dynamic values satisfying w1 + w2 = 1.0;
and the abnormal operation behavior determining module is used for determining that the current user has the abnormal operation behavior if the obtained evaluation score exceeds the threshold score.
6. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the abnormal operation detection method of any one of claims 1 to 4.
CN202010151773.5A 2020-03-06 2020-03-06 Abnormal operation detection and model training method, device and readable storage medium Active CN111833171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010151773.5A CN111833171B (en) 2020-03-06 2020-03-06 Abnormal operation detection and model training method, device and readable storage medium


Publications (2)

Publication Number Publication Date
CN111833171A CN111833171A (en) 2020-10-27
CN111833171B true CN111833171B (en) 2021-06-25

Family

ID=72913502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010151773.5A Active CN111833171B (en) 2020-03-06 2020-03-06 Abnormal operation detection and model training method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111833171B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant