CN115409518A - User transaction risk early warning method and device - Google Patents


Publication number
CN115409518A
Authority
CN
China
Prior art keywords
transaction
risk
abnormal
payment
early warning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211225975.5A
Other languages
Chinese (zh)
Inventor
李维志
罗伟
胡兴源
陈立宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202211225975.5A
Publication of CN115409518A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a user transaction risk early warning method and device, relates to the field of data security, and can be applied in the financial field and other fields. The method comprises the following steps: acquiring payment information of a user, and determining the corresponding transaction category from the payment information; analyzing the payment information with an isolation forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal predicted value and a risk predicted value; comparing the abnormal predicted value with a preset abnormal threshold to obtain a first comparison result, and comparing the risk predicted value with a preset risk threshold to obtain a second comparison result; and generating early warning information according to the first comparison result and the second comparison result.

Description

User transaction risk early warning method and device
Technical Field
The application relates to the field of data security, can be applied in the financial field and other fields, and in particular relates to a user transaction risk early warning method and device.
Background
With the widespread adoption of electronic mobile payment, everyday consumption and fund transfers have become simple and fast, but this has also opened a channel for criminal activity. Fraud arising from user information loss, online scams and the like is widespread: criminals fabricate false information by telephone, over the network, by short message and through similar channels, set up scams, and defraud users remotely and without contact, inducing them to make payments or transfer money to the criminals and causing them to lose large sums. Criminals continually update their techniques and invent all kinds of false facts, so people who have not previously encountered fraud information are easily deceived. The development of internet computing makes it difficult to detect and combat the loss of network user information, and criminals rely on technical means and have strong counter-investigation capabilities. Fraudsters often operate through anonymous, impersonated or public telephones, and the information they leave behind is layered by design, making it difficult to trace specific clues. Victims may come from every social stratum, industry and group, and users who are insufficiently sensitive to fund risk are easily deceived into excessive credit consumption or large cash transfers, resulting in property loss. On the one hand, such criminal activity is widespread and the number of victims is large, which seriously harms society as a whole; on the other hand, the amounts involved are large, reaching tens of millions, which seriously disturbs social order.
At present, effective early warning measures are lacking that can warn users about abnormal fund transactions and quickly and effectively identify cases of user information loss.
Disclosure of Invention
The application aims to provide a user transaction risk early warning method and device that make separate predictions with the isolation forest and random forest algorithms according to the user's payment type, effectively judge the user's risk condition from the prediction results, and issue an early warning.
To achieve the above object, the user transaction risk early warning method provided by the present application specifically includes: acquiring payment information of a user, and determining the corresponding transaction category from the payment information; analyzing the payment information with an isolation forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal predicted value and a risk predicted value; comparing the abnormal predicted value with a preset abnormal threshold to obtain a first comparison result, and comparing the risk predicted value with a preset risk threshold to obtain a second comparison result; and generating early warning information according to the first comparison result and the second comparison result.
In the above method for warning the transaction risk of the user, optionally, the obtaining payment information of the user includes: monitoring the payment transaction of a user, and analyzing and obtaining payment amount and corresponding account information according to the payment transaction; obtaining historical payment amount and historical income amount of a user in a preset period according to the account information; and when the sum of the payment amount and the historical payment amount is higher than the historical income amount and/or the payment amount is higher than the historical income amount, generating payment information according to the payment transaction.
In the above user transaction risk early warning method, optionally, analyzing the payment information with an isolation forest algorithm according to the transaction category to obtain an abnormal predicted value includes: when the transaction category is a non-consumption type, analyzing the full/local features of the payment information with the isolation forest algorithm and constructing the corresponding decision trees; and generating the corresponding isolation forest from the decision trees, and analyzing the isolation forest to obtain the abnormal predicted value corresponding to the payment information.
In the above user transaction risk early warning method, optionally, the payment information includes a transaction amount, the age of the payer, an investment amount, an equity investment amount, a financing amount, production and operation, a historical average transaction amount, a historical fund transfer standard deviation, and the number of fund transfers in a preset period.
In the above user transaction risk early warning method, optionally, analyzing the payment information with a random forest algorithm according to the transaction category to obtain a risk predicted value includes: when the transaction category is a consumption type, analyzing the full/local features of historical abnormal consumption data with the random forest algorithm to obtain strong anomaly features; generating a random forest with the random forest algorithm from the historical abnormal consumption data, the payment information and the strong anomaly features; and analyzing the similarity of the payment information with the random forest to obtain the risk predicted value.
In the above-mentioned user transaction risk early warning method, optionally, the payment information includes a transaction amount, an age and an occupation of the payer, a matching result of the payment card number and the attribution of the received card number, a transaction record between the payer and the receiver, a historical highest transaction amount of the payment card number, a historical average transaction amount of the payment card number, a historical transaction amount standard deviation of the payment card number, a revenue sum of the received card number in a preset period, and transaction times in the preset period.
In the above method for warning the risk of user transaction, optionally, the generating warning information according to the first comparison result and the second comparison result includes: when the abnormal predicted value is larger than a preset abnormal threshold value and the risk predicted value is larger than a preset risk threshold value, generating early warning information and carrying out risk early warning through a first warning strategy; when the abnormal predicted value is smaller than a preset abnormal threshold value and the risk predicted value is larger than a preset risk threshold value, generating early warning information and carrying out risk early warning through a second warning strategy; and when the abnormal predicted value is greater than or equal to the preset abnormal threshold value and the risk predicted value is smaller than the preset risk threshold value, generating early warning information and carrying out risk early warning through a third warning strategy.
The application also provides a user transaction risk early warning device which comprises an acquisition module, an analysis module, a comparison module and an early warning module; the acquisition module is used for acquiring payment information of a user and analyzing and acquiring corresponding transaction types according to the payment information; the analysis module is used for analyzing the payment information through an isolated forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal prediction value and a risk prediction value; the comparison module is used for comparing the abnormal predicted value with a preset abnormal threshold value to obtain a first comparison result, and comparing the risk predicted value with the preset risk threshold value to obtain a second comparison result; the early warning module is used for generating early warning information according to the first comparison result and the second comparison result.
The application also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the method.
The present application also provides a computer-readable storage medium storing a computer program for executing the above method.
The present application also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above-described method.
The beneficial technical effects of this application are as follows: an anomaly value for non-consumption transaction behavior and a threshold value for consumption behavior are obtained based on the isolation forest algorithm and the random forest algorithm, and by comprehensively analyzing the scores of the two values the user fund risk early warning system judges whether the user is engaging in high-risk behavior or behavior that warrants early warning; the class imbalance problem caused by concentrated transaction samples in user information loss cases is effectively addressed, the accuracy of transfer risk judgment is improved through dual-algorithm analysis, and the limitation of a single algorithm in predicting different risks is overcome.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, are incorporated in and constitute a part of this application, and are not intended to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of a user transaction risk early warning method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of acquiring payment information according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of obtaining an abnormal predicted value according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an isolation forest algorithm according to an embodiment of the present application;
FIG. 5 is a logic diagram of an isolation forest algorithm according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an application flow of an isolation forest algorithm according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of obtaining a risk predicted value according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an application flow of a random forest algorithm according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a user transaction risk early warning device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following detailed description will be provided with reference to the drawings and examples to explain how to apply the technical means to solve the technical problems and to achieve the technical effects. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments in the present application may be combined with each other, and the technical solutions formed are all within the scope of the present application.
Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and although a logical order is illustrated in the flow charts, in some cases the steps illustrated or described may be performed in an order different from the one given here.
Referring to FIG. 1, the present application provides a user transaction risk early warning method, which includes:
S101, acquiring payment information of a user, and determining the corresponding transaction category from the payment information;
S102, analyzing the payment information with an isolation forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal predicted value and a risk predicted value;
S103, comparing the abnormal predicted value with a preset abnormal threshold to obtain a first comparison result, and comparing the risk predicted value with a preset risk threshold to obtain a second comparison result;
S104, generating early warning information according to the first comparison result and the second comparison result.
In this way, an anomaly value for non-consumption behavior is obtained through the isolation forest, and a score for consumption behavior is obtained through the random forest. Behavior for which both values are below their thresholds is defined as safe; behavior for which both values exceed their thresholds triggers a high-risk alarm; and behavior for which only one of the two values exceeds its threshold triggers an early warning of potentially dangerous behavior. Comprehensive analysis then yields a fund risk prediction for the user, so that a prompt can be given in time and unnecessary transaction risk for the user is reduced.
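The comparison logic of steps S103 and S104 can be sketched in Python as follows. This is only an illustrative sketch: the names `WarningStrategy` and `choose_warning` are hypothetical, the three branches follow the first, second and third warning strategies as stated in the disclosure, and returning `None` for every remaining case (including the boundary cases the text does not cover) is an assumption.

```python
from enum import Enum
from typing import Optional


class WarningStrategy(Enum):
    """Hypothetical labels for the three warning strategies in the text."""
    FIRST = "first"    # both predicted values exceed their thresholds
    SECOND = "second"  # only the risk predicted value exceeds its threshold
    THIRD = "third"    # only the abnormal predicted value reaches its threshold


def choose_warning(abnormal_value: float, risk_value: float,
                   abnormal_threshold: float,
                   risk_threshold: float) -> Optional[WarningStrategy]:
    """Map the two comparison results to a warning strategy.

    Returns None when no branch of the text applies (treated here,
    by assumption, as "no warning").
    """
    if abnormal_value > abnormal_threshold and risk_value > risk_threshold:
        return WarningStrategy.FIRST
    if abnormal_value < abnormal_threshold and risk_value > risk_threshold:
        return WarningStrategy.SECOND
    if abnormal_value >= abnormal_threshold and risk_value < risk_threshold:
        return WarningStrategy.THIRD
    return None
```

For example, with an abnormal threshold of 0.9 and a risk threshold of 0.5 (both illustrative), `choose_warning(0.95, 0.8, 0.9, 0.5)` selects the first strategy, while `choose_warning(0.3, 0.2, 0.9, 0.5)` yields no warning.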
Referring to FIG. 2, in an embodiment of the present application, acquiring the payment information of the user includes:
S201, monitoring the payment transactions of a user, and obtaining the payment amount and the corresponding account information from the payment transaction;
S202, obtaining the user's historical payment amount and historical income amount within a preset period according to the account information;
S203, when the sum of the payment amount and the historical payment amount is higher than the historical income amount and/or the payment amount is higher than the historical income amount, generating payment information from the payment transaction.
Specifically, in practice, to avoid issuing unnecessary risk warnings for clearly normal transactions, this embodiment may perform a preliminary risk check: when the sum of the payment amount and the historical payment amount is higher than the historical income amount and/or the payment amount is higher than the historical income amount, the payment transaction is judged to be potentially risky and is passed on for further identification. Of course, other methods may be used in practice for this preliminary anomaly identification, as long as potentially risky payment transactions are extracted for in-depth analysis; the preliminary identification method is not limited here.
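The preliminary check in S203 amounts to a simple predicate, sketched below in Python. The function name `needs_risk_analysis` and its parameter names are hypothetical, and amounts are assumed to be non-negative numbers in a single currency.

```python
def needs_risk_analysis(payment_amount: float,
                        historical_payment_amount: float,
                        historical_income_amount: float) -> bool:
    """Return True when a payment should be passed on for risk analysis.

    Implements the "and/or" condition of S203: either the cumulative
    outflow (current plus historical payments in the preset period)
    or the current payment alone exceeds the historical income.
    """
    cumulative_outflow = payment_amount + historical_payment_amount
    return (cumulative_outflow > historical_income_amount
            or payment_amount > historical_income_amount)
```

Under the non-negativity assumption the second condition is implied by the first, but both are kept so that the sketch mirrors the wording of the step.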
Referring to FIG. 3, in an embodiment of the present application, analyzing the payment information with an isolation forest algorithm according to the transaction category to obtain an abnormal predicted value includes:
S301, when the transaction category is a non-consumption type, analyzing the full/local features of the payment information with the isolation forest algorithm and constructing the corresponding decision trees;
S302, generating the corresponding isolation forest from the decision trees, and analyzing the isolation forest to obtain the abnormal predicted value corresponding to the payment information.
The payment information includes the transaction amount, the payer's age, the investment amount, the equity investment amount, the financing amount, production and operation, the historical average transaction amount, the historical fund transfer standard deviation, and the number of fund transfers in a preset period. In practice, for non-consumption transaction expenditure, the method analyzes the full/local features of the user's transaction to be predicted with the isolation forest algorithm and constructs the corresponding decision trees to obtain an isolation forest; the average path length for the non-consumption type is computed with the isolation forest algorithm to obtain the anomaly score for the non-consumption type; and transactions whose anomaly score is close to a threshold are determined to be risk transactions.
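As a minimal sketch, the nine non-consumption features listed above could be carried in a small record type and flattened into a numeric vector before being handed to an isolation forest. The class name, the field names and the choice of representing every field as a float (including "production and operation", whose exact encoding the text does not specify) are assumptions made for illustration.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass(frozen=True)
class NonConsumptionFeatures:
    """Hypothetical container for the non-consumption payment features."""
    transaction_amount: float
    payer_age: float
    investment_amount: float
    equity_investment_amount: float
    financing_amount: float
    production_and_operation: float      # encoding is an assumption
    historical_avg_transaction_amount: float
    historical_transfer_std: float
    transfer_count_in_period: float

    def as_vector(self) -> List[float]:
        """Flatten the record into a fixed-order numeric feature vector."""
        return [float(value) for value in astuple(self)]
```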
In the above embodiment, analyzing the full/local features of the user's transaction to be predicted with the isolation forest algorithm and constructing the corresponding decision trees to obtain an isolation forest may include: building decision trees from a plurality of features of the transaction to be predicted and generating the isolation forest from all the decision trees, wherein the features appear in each decision tree; and calculating the anomaly score of the transaction to be predicted from the isolation forest.
With this technical scheme, the isolation forest algorithm identifies abnormal transactions with distinctive and relatively rare features among the user's transactions to be predicted, the anomaly scores of those transactions are calculated, and the transaction risk is finally predicted from the anomaly scores. Risk transactions can thus be effectively predicted and identified, users are reminded to confirm their transactions, fund losses caused by impulsive spending or fraud are reduced, and the security of the user's card use is improved.
Referring to FIG. 4, in practice, the analysis of non-consumption payment information mainly includes the following three steps:
1. Analyze the full/local features of the user's transaction to be predicted with the isolation forest algorithm, and construct the corresponding decision trees to generate an isolation forest.
The transaction data to be predicted include the transaction amount, user age, investment amount, equity investment amount, financing amount, production and operation, historical average transaction amount, historical fund transfer standard deviation, and number of fund transfers within 5 days.
2. Process the isolation forest with the isolation forest algorithm to obtain the anomaly score of the transaction to be predicted.
3. Determine transactions whose anomaly score is close to a threshold to be risk transactions.
Specifically, this embodiment first uses the isolation forest algorithm to analyze the full/local features of the user's transaction to be predicted and constructs the corresponding isolation forest; it then assigns an anomaly score to the transaction to be predicted and defines transactions whose score is close to a certain threshold as risky.
The isolation forest (Isolation Forest) algorithm is a fast, ensemble-based anomaly detection method (an ensemble improves accuracy by aggregating the predictions of multiple models: new models are trained on repeated samples, and their results are finally averaged). It has linear time complexity and offers high accuracy and high speed when processing large data. "Isolation" in isolation forest refers to "isolating outliers from all samples". Most model-based anomaly detection algorithms first "specify" a range or pattern for normal points; if a point does not fit this pattern or lies outside the normal range, the model judges it to be an anomaly. The isolation forest algorithm rests on two premises: 1) abnormal data make up a very small proportion of the total sample; 2) the feature values of abnormal points differ greatly from those of normal points. Based on these two premises, the data to be detected are modeled mathematically and a score is calculated to obtain the final detection result. In this embodiment, the degree of isolation of the user's transaction to be predicted is analyzed with the isolation forest algorithm, the anomaly score of the transaction is calculated, and the transaction risk is predicted from the anomaly score, so that the risk probability of the user's transaction can be predicted quickly by an objective, scientific method and fund losses for the user can be avoided.
Referring to FIG. 5, the isolation forest algorithm builds n decision trees from a training set of m samples, and each decision tree uses the same sample set. Assuming each sample has K features in total, k of the K features are selected, one feature value is chosen at random as the split point, the data are divided into 2 child nodes according to whether the feature is less than or equal to that value, and the child nodes are then split recursively in the same way. Given a large data sample, abnormal data are separated quickly and lie close to the root node of a decision tree, so an anomaly score can be obtained quickly by comparing the mean number of splits for a sample with the mean number of splits over all nodes of the isolation forest, and used to judge whether the data are abnormal. When a decision tree is constructed, the split points are chosen completely at random, and the result is made to converge with an ensemble method: splitting is repeated from the start many times, and the results of the individual runs are averaged. Specifically, as shown in FIG. 6, the overall process includes three aspects: feature screening and sample preparation, isolation forest construction, and risk prediction.
1. Feature screening and sample preparation (preparing transaction data from historical user fund loss cases)
In the isolation forest module of the transaction risk prediction system, the transaction amount, user age, investment amount, equity investment amount, financing amount, production and operation, historical average transaction amount, historical fund transfer standard deviation and number of fund transfers within 5 days in the transaction data of historical user fund loss cases form a sample set n.
2. Construction of isolation trees
1. Randomly select N points from the data sample as a subsample and place them at the root node of an isolation tree.
2. Randomly designate a dimension and randomly generate a cut point p within the range of the current node's data, that is, between the minimum and maximum values of the designated dimension in the current node's data.
3. The cut point defines a hyperplane that divides the current node's data space into 2 subspaces: points whose value in the selected dimension is smaller than p are placed in the left branch of the current node, and points greater than or equal to p are placed in the right branch.
4. Repeat steps 2 and 3 recursively on the left and right branch nodes, continually constructing new leaf nodes, until a leaf node holds only one data point (and cannot be cut further) or the tree has grown to the set height.
The results of all isolation trees are then integrated, where x denotes the sample whose landing node is being located.
Since the cutting process is completely random, an ensemble method is needed to make the result converge: cutting is repeated from the start many times, and the results of the individual cuts are averaged.
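The four construction steps and the ensemble averaging described above can be sketched in plain Python as follows. This is an illustrative sketch, not the implementation of the present application: the function names, the nested-dictionary tree layout, the default height limit of ceil(log2(subsample size)), and stopping as soon as the chosen dimension has no spread are all assumptions, and the usual correction for leaves holding more than one point is omitted for brevity.

```python
import math
import random


def build_tree(points, height, max_height, rng):
    """Steps 2-4: recursively cut the data with random axis-aligned splits."""
    if len(points) <= 1 or height >= max_height:
        return {"size": len(points)}          # leaf: cannot or must not cut
    dim = rng.randrange(len(points[0]))       # step 2: random dimension
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:                              # no spread in this dimension
        return {"size": len(points)}          # simplification: stop here
    cut = rng.uniform(lo, hi)                 # step 2: random cut point p
    left = [p for p in points if p[dim] < cut]     # step 3: smaller than p
    right = [p for p in points if p[dim] >= cut]   # step 3: at least p
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, height + 1, max_height, rng),
            "right": build_tree(right, height + 1, max_height, rng)}


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Step 1 repeated: draw a random subsample for each isolation tree."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    max_height = max(1, math.ceil(math.log2(size)))
    return [build_tree(rng.sample(data, size), 0, max_height, rng)
            for _ in range(n_trees)]


def path_length(tree, x):
    """Number of cuts needed before sample x lands in a leaf of one tree."""
    depth = 0
    while "dim" in tree:
        branch = "left" if x[tree["dim"]] < tree["cut"] else "right"
        tree = tree[branch]
        depth += 1
    return depth


def mean_depth(forest, x):
    """Ensemble step: average the path length of x over all trees."""
    return sum(path_length(tree, x) for tree in forest) / len(forest)
```

With data consisting of a tight cluster plus one far-away point, `mean_depth` is expected to be noticeably smaller for the far-away point, which is the property the anomaly score builds on.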
For n data samples, the path length is denoted h(n), and the average path length c(n) is:
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n}$$
where H(i) is the harmonic number, approximately equal to ln(i) plus Euler's constant (about 0.5772).
Normalizing the path length in the isolation binary tree yields a number between 0 and 1, which is the anomaly score of the sample under test. Denoting the anomaly score by s(x, n), it is calculated from the generated isolation trees as:
$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where h(x) is the height of x in each tree, E(h(x)) is the mean of h(x) over all trees, and c(n) is the average path length for a given number of samples n (the subsample size used to build each tree), which is used to normalize the path length h(x) of a sample x.
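The normalization can be illustrated with a short Python sketch that uses the standard isolation forest definitions of c(n) and s(x, n) together with the ln(i) plus Euler's constant approximation of the harmonic number. The function names are hypothetical, and the special cases for n of 1 and 2 (where the approximation is poor or undefined) are assumptions added so the sketch is defined for all positive n.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant


def harmonic(i: int) -> float:
    """Approximate harmonic number H(i) as ln(i) + Euler's constant."""
    return math.log(i) + EULER_GAMMA


def c_factor(n: int) -> float:
    """Average path length c(n) used to normalize h(x)."""
    if n <= 1:
        return 0.0           # assumption: a single point needs no cut
    if n == 2:
        return 1.0           # assumption: exact value, since H(1) = 1
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """Anomaly score s(x, n) = 2 ** (-E(h(x)) / c(n)), in the range (0, 1]."""
    return 2.0 ** (-mean_path_length / c_factor(n))
```

For a subsample size of 256, `c_factor(256)` is about 10.24, so a sample isolated after an average of 4 cuts scores about 0.76, while a sample whose average path length equals c(n) scores exactly 0.5.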
5. Once t isolation trees have been obtained, training of the isolation forest is finished. Because the formation of each isolation binary tree involves a degree of randomness, the result of a single tree is unreliable. A data sample under test is therefore passed through every tree of the isolation forest, the layer of each isolation binary tree on which the sample lands is computed, and finally the average depth h(x) of sample x over the trees is obtained. The anomaly score depends on the depth of the sample in the isolation binary trees: the smaller the depth, the higher the anomaly score, that is, the higher the probability that the sample is abnormal. If the anomaly score is close to 1, the sample is very likely an anomaly; if the anomaly score is much less than 0.5, the sample is almost certainly not an anomaly; and if the anomaly scores of all points are around 0.5, the sample set most likely contains no anomalies.
3. Setting a threshold to predict risk
After the previous step, each transaction to be predicted has an anomaly score. A risk threshold can be set; for example, if the anomaly score is greater than 0.9, the transaction to be predicted is judged to be a risk transaction.
Referring to FIG. 7, in an embodiment of the present application, analyzing the payment information with a random forest algorithm according to the transaction category to obtain a risk predicted value includes:
S701, when the transaction category is a consumption type, analyzing the full/local features of historical abnormal consumption data with the random forest algorithm to obtain strong anomaly features;
S702, generating a random forest with the random forest algorithm from the historical abnormal consumption data, the payment information and the strong anomaly features;
S703, analyzing the similarity of the payment information with the random forest to obtain the risk predicted value.
The payment information comprises transaction amount, the age and occupation of a payer, the matching result of the payment card number and the attribution of a receiving card number, a transaction record between the payer and a receiver, historical highest transaction amount of the payment card number, historical average transaction amount of the payment card number, historical transaction amount standard deviation of the payment card number, a income sum of the receiving card number in a preset period and transaction times in the preset period.
Specifically, the overall process of risk prediction for consumption-type transaction expenditure in the present application includes: analyzing the full/local characteristics of the past user fund loss case data according to a random forest algorithm to obtain the user information loss case data strong characteristics; processing the electric fraud case data, the transaction to be predicted and the strong feature according to the random forest algorithm to obtain a similarity score of the transaction to be predicted; and determining the transaction with the similarity score smaller than the threshold value as the normal consumption transaction. Wherein, the processing the electric fraud scheme case data, the transaction to be predicted and the strong feature according to the random forest algorithm to obtain the similarity score of the transaction to be predicted can include: establishing decision trees of the transaction to be predicted according to a plurality of the strong features, and generating a random forest by all the decision trees, wherein all the features in the strong features are in all the decision trees; and calculating the similarity score of the transaction to be predicted according to the random forest. The electrical fraud case data information includes: transaction amount, age of the transfer party, occupation (whether retirement, student or non-business), whether the transfer card number matches the payee card number attribution, whether there is a transaction record between the transfer card number and the payee card number, historical maximum transaction amount of the transfer card number, historical average transaction amount, historical transaction amount standard deviation, income total value of the payee party in the last three days, expense of the payee party in the last three days, and transaction times in the last 5 days. 
According to the present application, the strong features of user risk transactions are obtained through random forest analysis, the similarity score of the transaction to be predicted is calculated, and the transaction risk is predicted from the similarity score. This can effectively prevent a user with poor judgment, whose information has been lost, from transferring funds directly to an account designated by a fraudster; by predicting the risk probability of transfer transactions, it reduces the success rate of fraud based on lost user information.
Referring to FIG. 8, consumption-type data are predicted with the random forest on the following principle:
1. The full/local features of the telecom fraud case transactions are analyzed with a random forest algorithm to obtain the strong features of the telecom fraud case transactions.
2. The historical telecom fraud case transactions, the transaction to be predicted, and the strong features are processed with the random forest algorithm to obtain a similarity score for the transaction to be predicted.
3. A transaction whose similarity score is smaller than the threshold is determined to be a normal consumption-type transaction.
Specifically, the method first analyzes the full/local features of the historical telecom fraud case transactions with a random forest algorithm and automatically finds their strong features; it then feeds the historical telecom fraud case transactions, the real-time transaction to be predicted, and the strong feature fields into the random forest algorithm, scores the similarity of the transaction to be predicted, and defines a transaction whose score is above a certain threshold as a risky transaction. The random forest algorithm is a relatively new machine learning model and a representative algorithm of the Bagging ensemble method. It first selects n samples from the sample set, then randomly selects k attributes from all the attributes and chooses the best splitting attribute as a node to build a decision tree; these two steps are repeated m times to build m decision trees, the m trees form a random forest, and the class of the input data is obtained from the voting result. A neural network is a classical machine learning model; it can predict accurately, but its computational cost is high. A random forest classifies data sets with multi-dimensional features efficiently, can also be used to rank feature importance, and improves prediction accuracy without significantly increasing the amount of computation. Put simply, a random forest is a randomly generated forest made up of many decision trees that are independent of one another. Once the forest has been obtained, each decision tree judges which class a new input sample belongs to, and the class of the sample is finally predicted by voting.
In this embodiment, the strong features of user fund loss case transactions are obtained through random forest analysis, the similarity score of the transaction to be predicted is calculated, and the risk of the transaction is finally predicted from the similarity score, so that the risk probability of a transfer transaction can be predicted quickly by an objective and scientific method and the success rate of fraud based on lost user information is reduced. The random forest algorithm constructs m decision trees from m training sample sets; for each decision tree, n samples are drawn at random from the original training sample set to generate a new training sample set. Assuming that each sample has K features in total, k of the K features are selected for the n samples, the best split point is obtained by building a decision tree, the classification of new data is determined by the number of votes cast by the decision trees, and the set of best features is screened out according to the quality of the classification. Random forest is an improvement on the decision tree algorithm that combines multiple decision trees: each tree is built from an independently drawn sample, all trees in the forest follow the same distribution, and the classification error depends on the classification ability of each tree and the correlation between trees. For feature selection, each node is split using a random method, and the errors produced under the different conditions are then compared. The measured inherent estimation error, classification ability, and correlation determine which valuable features are selected.
The classification ability of a single tree may be weak, but after a large number of decision trees have been randomly generated, the most likely classification of a test sample, and the most valuable features, can be selected statistically from the classification results of the individual trees.
The key to constructing a decision tree is the choice of split points. A greedy algorithm ranks the candidate split points from largest to smallest by the purity difference they produce. Purity is quantified with the ID3 algorithm: information gain is used as the attribute measure, and the attribute with the largest information gain after splitting is chosen for the split.
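Equation 1: $\mathrm{info}(D) = -\sum_{i} p_i \log_2 p_i$;
Equation 2: $\mathrm{info}_A(D) = \sum_{j} \frac{|D_j|}{|D|}\,\mathrm{info}(D_j)$, where $D_j$ is the j-th subset of D produced by partitioning on attribute A;
Equation 3: $\mathrm{gain}(A) = \mathrm{info}(D) - \mathrm{info}_A(D)$;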
Equation 1 above gives the information entropy of set D, where p_i is the probability that the i-th class appears in D. In Equation 2, the training set D is assumed to be partitioned by a feature attribute A, and info_A(D) denotes the expected information entropy of D after partitioning by A. The information gain gain(A) obtained by partitioning on A is then given by Equation 3. All features are ranked recursively by information gain to construct the whole decision tree. Decision trees built within a random forest are not pruned, so each tree represents its training data accurately but may overfit and perform less accurately on other data; the overfitting of a single decision tree is avoided through the joint decision of the multiple decision trees in the ensemble.
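By way of non-limiting illustration, the entropy and information gain calculations of Equations 1 to 3 can be sketched as follows. The attribute names `location_match` and `prior_record` and the four toy records are hypothetical and are not taken from the application; categorical attributes are assumed.

```python
import math
from collections import Counter


def info(labels):
    """Equation 1: information entropy of a collection of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def info_after_split(rows, labels, attribute):
    """Equation 2: expected entropy after partitioning on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    total = len(labels)
    return sum(len(part) / total * info(part) for part in partitions.values())


def gain(rows, labels, attribute):
    """Equation 3: information gain obtained by splitting on the attribute."""
    return info(labels) - info_after_split(rows, labels, attribute)


# Hypothetical toy data; label 1 marks a fraud case, 0 a normal transaction.
rows = [
    {"location_match": "no", "prior_record": "no"},
    {"location_match": "no", "prior_record": "yes"},
    {"location_match": "yes", "prior_record": "no"},
    {"location_match": "yes", "prior_record": "yes"},
]
labels = [1, 1, 0, 0]

# ID3 splits on the attribute with the largest information gain.
best_attribute = max(rows[0], key=lambda a: gain(rows, labels, a))
```

In this toy data the labels are fully determined by `location_match`, so that attribute is the one chosen for the split.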
Further, in practical use the random forest algorithm is implemented as follows:
1. Strong feature screening-sample preparation (preparation of historical telecom fraud case transaction data)
In the random forest module of the transaction risk prediction system, a sample set N is formed from the historical user risk transaction data, including the transaction amount, the age of the transferring party, occupation (whether retired, a student, or unemployed), whether the attributions of the transfer card number and the payee card number match, whether a transaction record exists between the transfer card number and the payee card number, the historical highest transaction amount of the transfer card number, the historical average transaction amount, the standard deviation of the historical transaction amounts, the total income of the payee within three days, the expenditure of the payee within three days, the time (in days) since the last transaction, the number of transactions within 5 days, and so on.
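By way of non-limiting illustration, one record of the sample set N might be represented as below. The class name, the field names, and the numeric encoding of the yes/no fields are assumptions made for this sketch; the application does not specify a data layout.

```python
from dataclasses import dataclass, astuple


@dataclass
class TransferSample:
    """One historical risk transaction (hypothetical field names)."""
    amount: float              # transaction amount
    payer_age: int             # age of the transferring party
    vulnerable_occupation: int # 1 if retired, student, or unemployed, else 0
    location_match: int        # 1 if card attributions match, else 0
    prior_record: int          # 1 if the two cards have transacted before
    hist_max_amount: float     # historical highest transaction amount
    hist_avg_amount: float     # historical average transaction amount
    hist_std_amount: float     # standard deviation of historical amounts
    payee_income_3d: float     # payee income within three days
    payee_expense_3d: float    # payee expenditure within three days
    days_since_last: int       # days since the last transaction
    tx_count_5d: int           # number of transactions within 5 days

    def features(self):
        """Return the record as a flat numeric feature vector."""
        return [float(v) for v in astuple(self)]


# Hypothetical example record.
sample = TransferSample(50000.0, 67, 1, 0, 0, 8000.0, 1200.0, 900.0,
                        150000.0, 149000.0, 30, 1)
vector = sample.features()
```

Each record yields a vector of twelve values, which corresponds to the K features per sample referred to in the next step.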
2. Strong feature screening-feature field preparation
All fields are fed into a random forest module (i.e., feature field preparation), with K features per sample.
3. Strong feature screening-random forest generation
n samples are drawn from the sample set N.
k features are randomly selected from the K features, and a decision tree is built on the selected samples using these features.
The two steps above are repeated m times to generate m decision trees, which form a random forest.
In the 1st decision tree, there are n samples with features (K_1, K_2, ..., K_k).
In the 2nd decision tree, there are n samples with features (K_2, K_3, ..., K_{k+1}).
......
In the m-th decision tree, there are n samples with features (K_m, K_{m+1}, ..., K_{k+m-1}).
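By way of non-limiting illustration, the sampling that precedes the construction of each tree can be sketched as follows. Drawing the n samples with replacement is an assumption based on the usual Bagging procedure, and the function names and the fixed seed are hypothetical.

```python
import random


def draw_tree_inputs(samples, feature_names, n, k, rng):
    """Pick n samples (with replacement) and k distinct features for one tree."""
    rows = [rng.choice(samples) for _ in range(n)]
    features = rng.sample(feature_names, k)
    return rows, features


def plan_forest(samples, feature_names, n, k, m, seed=0):
    """Repeat the two sampling steps m times, once per decision tree."""
    rng = random.Random(seed)
    return [draw_tree_inputs(samples, feature_names, n, k, rng)
            for _ in range(m)]


# Hypothetical usage: 10 sample identifiers, K = 4 features, m = 3 trees.
plan = plan_forest(list(range(10)), ["K1", "K2", "K3", "K4"], n=6, k=2, m=3)
```

Each entry of `plan` holds the rows and the feature subset from which one decision tree would then be built.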
4. Strong feature screening-voting
The sample data are fed into each of the m decision trees, where:
1st decision tree vote:
Figure BDA0003879678040000111
2nd decision tree vote:
Figure BDA0003879678040000112
...
m-th decision tree vote:
Figure BDA0003879678040000121
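By way of non-limiting illustration, the voting step can be sketched with each decision tree modeled as a function that returns a class label. The three single-rule trees and their cut-off values are hypothetical stand-ins for trained decision trees.

```python
from collections import Counter


def forest_vote(trees, sample):
    """Collect one vote per tree and return the majority label and all votes."""
    votes = [tree(sample) for tree in trees]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes


# Hypothetical single-rule trees; 1 = risky, 0 = normal.
trees = [
    lambda s: 1 if s["amount"] > 50000 else 0,
    lambda s: 1 if s["prior_record"] == 0 else 0,
    lambda s: 1 if s["tx_count_5d"] > 3 else 0,
]

sample_tx = {"amount": 80000, "prior_record": 0, "tx_count_5d": 1}
majority, votes = forest_vote(trees, sample_tx)
```

Here two of the three trees vote 1, so the majority label for the sample is 1.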
5. Strong feature screening-strong feature output
The classification result of each decision tree is evaluated, and the best set of features is screened out: the random forest algorithm ranks the feature fields by the proportion in which they appear in the well-performing decision trees, and the higher a feature ranks, the more strongly related it is.
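By way of non-limiting illustration, the ranking described above can be sketched as follows. How a tree is judged to perform well is not stated in the application; the accuracy figures and the `min_accuracy` cut-off of 0.8 are assumptions made for this sketch.

```python
from collections import Counter


def rank_strong_features(tree_reports, min_accuracy=0.8):
    """Rank features by their share among the well-performing trees.

    tree_reports is a list of (features_used, accuracy) pairs, one per tree.
    """
    good = [feats for feats, acc in tree_reports if acc >= min_accuracy]
    counts = Counter(f for feats in good for f in feats)
    total = len(good) or 1  # avoid division by zero when no tree qualifies
    return sorted(((f, c / total) for f, c in counts.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical reports for three trees.
reports = [
    (("amount", "prior_record"), 0.92),
    (("amount", "payer_age"), 0.85),
    (("payer_age", "tx_count_5d"), 0.55),
]
ranking = rank_strong_features(reports)
```

In this example `amount` appears in both qualifying trees and therefore ranks first; the third tree falls below the cut-off and does not contribute.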
6. Preparation of transaction data to be predicted
The transaction risk prediction module assembles, in real time, the data of the transfer transaction, including the transaction amount, the age of the transferring party, occupation (whether retired, a student, or unemployed), whether the attributions of the transfer card number and the payee card number match, whether a transaction record exists between the transfer card number and the payee, the historical highest transaction amount of the transfer card number, the historical average transaction amount, the standard deviation of the historical transaction amounts, the total income of the payee within three days, the expenditure of the payee within three days, the time (in days) since the last transaction, the number of transactions within 5 days, and so on.
7. Historical risk transaction screening-random forest generation
A new random forest is generated from the historical risk transactions, the strong features, and the transaction to be predicted, following the procedure of step 3.
8. Historical risk transaction screening-voting
The transaction to be predicted is voted on as in step 4.
9. Historical risk transaction screening-similarity score output for the transaction to be predicted
The algorithm result is output: the transaction to be predicted receives a similarity score in the range [0, 1], and the closer the score is to 1, the more likely it is that the transfer transaction is risky.
10. Risk prediction by threshold
A risk threshold can be set for the similarity score obtained in step 9; for example, if the similarity score is greater than 0.9, the transaction is determined to be risky.
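By way of non-limiting illustration, steps 9 and 10 can be sketched as follows. Taking the similarity score to be the fraction of trees that vote "risky" is an assumption; the application states only that the score lies in [0, 1]. The 0.9 threshold is the example value given above.

```python
def similarity_score(votes):
    """Fraction of trees voting 1 (risky); lies in [0, 1] by construction."""
    return sum(votes) / len(votes)


def is_risky(score, threshold=0.9):
    """A transaction is flagged when its score exceeds the threshold."""
    return score > threshold


# Hypothetical vote lists from forests of 20 and 10 trees.
high = similarity_score([1] * 19 + [0])   # 19 of 20 trees vote risky
border = similarity_score([1] * 9 + [0])  # 9 of 10 trees vote risky
```

With these votes the first transaction is flagged, while the second sits exactly on the threshold and is not, because the comparison is strict.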
In an embodiment of the present application, generating the warning information according to the first comparison result and the second comparison result includes:
when the abnormal predicted value is greater than the preset abnormal threshold and the risk predicted value is greater than the preset risk threshold, generating early warning information and issuing a risk early warning through a first alarm policy;
when the abnormal predicted value is smaller than the preset abnormal threshold and the risk predicted value is greater than the preset risk threshold, generating early warning information and issuing a risk early warning through a second alarm policy; and
when the abnormal predicted value is greater than or equal to the preset abnormal threshold and the risk predicted value is smaller than the preset risk threshold, generating early warning information and issuing a risk early warning through a third alarm policy.
In the above embodiment, the first alarm policy, the second alarm policy, and the third alarm policy may be set in advance by a worker according to technical requirements, which is not further limited in this application.
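By way of non-limiting illustration, the selection among the three alarm policies can be sketched as follows. The function name and the returned labels are hypothetical; combinations that the above conditions do not cover (for example, both values below their thresholds) return `None` in this sketch, which is an assumption.

```python
def choose_alarm_policy(abnormal, risk, abnormal_threshold, risk_threshold):
    """Map the two comparison results to an alarm policy label, or None."""
    if abnormal > abnormal_threshold and risk > risk_threshold:
        return "first"
    if abnormal < abnormal_threshold and risk > risk_threshold:
        return "second"
    if abnormal >= abnormal_threshold and risk < risk_threshold:
        return "third"
    return None  # combination not covered by the three stated conditions


# Hypothetical thresholds and predicted values.
policy_a = choose_alarm_policy(0.8, 0.95, abnormal_threshold=0.6, risk_threshold=0.9)
policy_b = choose_alarm_policy(0.3, 0.95, abnormal_threshold=0.6, risk_threshold=0.9)
policy_c = choose_alarm_policy(0.8, 0.50, abnormal_threshold=0.6, risk_threshold=0.9)
policy_d = choose_alarm_policy(0.3, 0.50, abnormal_threshold=0.6, risk_threshold=0.9)
```

The conditions are checked in the order in which they are stated, so at most one policy is selected for any pair of values.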
Referring to fig. 9, the present application further provides a user transaction risk early warning device, which includes an acquisition module, an analysis module, a comparison module, and an early warning module; the acquisition module is used for acquiring payment information of a user and analyzing and acquiring corresponding transaction types according to the payment information; the analysis module is used for analyzing the payment information through an isolated forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal prediction value and a risk prediction value; the comparison module is used for comparing the abnormal predicted value with a preset abnormal threshold value to obtain a first comparison result, and comparing the risk predicted value with the preset risk threshold value to obtain a second comparison result; the early warning module is used for generating early warning information according to the first comparison result and the second comparison result. The specific implementation logic of each component has been described in detail in the foregoing embodiments, and is not described in detail herein.
The beneficial technical effects of the present application are as follows: abnormal values for non-consumption transaction behavior and risk values for consumption behavior are obtained with the isolated forest algorithm and the random forest algorithm, and the user fund risk early warning system judges whether the user's behavior is high-risk and whether an early warning is needed by comprehensively analyzing the two scores; the class imbalance caused by the concentration of transaction samples involving lost user information is effectively addressed, the accuracy of transfer risk judgment is improved through the analysis by two algorithms, and the limitation of a single algorithm in predicting different kinds of risk is overcome.
The present application further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
The present application also provides a computer-readable storage medium storing a computer program for executing the above method.
The present application also provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above-described method.
As shown in fig. 10, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in FIG. 10; in addition, the electronic device 600 may further include components not shown in fig. 10, which may be referred to in the prior art.
As shown in fig. 10, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information relating to the failure may be stored, and a program for executing the information may be stored. And the central processing unit 100 may execute the program stored in the memory 140 to realize information storage or processing, etc.
The input unit 120 provides input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.
The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that holds information even when power is off, can be selectively erased, and can be provided with additional data, an example of which is sometimes called an EPROM or the like. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, which is used to store application programs and function programs or a flow for executing the operation of the electronic device 600 by the central processor 100.
The memory 140 may also include a data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a Bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100, so that sound can be recorded locally through the microphone 132 and locally stored sound can be played through the speaker 131.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are provided to further explain the objects, technical solutions and advantages of the present application in detail, and it should be understood that the above-mentioned embodiments are only examples of the present application and are not intended to limit the scope of the present application, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present application should be included in the scope of the present application.

Claims (11)

1. A user transaction risk early warning method is characterized by comprising the following steps:
acquiring payment information of a user, and analyzing and acquiring a corresponding transaction type according to the payment information;
analyzing the payment information through an isolated forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal prediction value and a risk prediction value;
comparing the abnormal predicted value with a preset abnormal threshold value to obtain a first comparison result, and comparing the risk predicted value with the preset risk threshold value to obtain a second comparison result;
and generating early warning information according to the first comparison result and the second comparison result.
2. The method of claim 1, wherein obtaining payment information of the user comprises:
monitoring the payment transaction of a user, and analyzing and obtaining payment amount and corresponding account information according to the payment transaction;
obtaining historical payment amount and historical income amount of a user in a preset period according to the account information;
and when the sum of the payment amount and the historical payment amount is higher than the historical income amount and/or the payment amount is higher than the historical income amount, generating payment information according to the payment transaction.
3. The user transaction risk early warning method according to claim 1, wherein analyzing the payment information by an isolated forest algorithm according to the transaction category to obtain an abnormal prediction value comprises:
when the transaction type is a non-consumption type, analyzing and constructing the full/local characteristics of the payment information through an isolated forest algorithm to obtain a corresponding decision tree;
and generating a corresponding isolated forest according to the decision tree, and analyzing the isolated forest to obtain an abnormal predicted value corresponding to the payment information.
4. The method of claim 3, wherein the payment information includes a transaction amount, a payer age, an investment amount, an equity investment amount, a financing amount, a production operation, a historical average transaction amount, a historical fund transfer standard deviation, and a fund transfer number within a preset period.
5. The user transaction risk early warning method according to claim 1, wherein analyzing the payment information by a random forest algorithm according to the transaction category to obtain a risk prediction value comprises:
when the transaction type is a consumption type, analyzing the total/local characteristics of historical abnormal consumption data through a random forest algorithm to obtain abnormal strong characteristics;
generating a random forest by a random forest algorithm according to the historical abnormal consumption data, the payment information and the abnormal strong characteristics;
and analyzing the similarity of the payment information through the random forest to obtain a risk prediction value.
6. The method of claim 5, wherein the payment information includes a transaction amount, a payer age, an occupation, a matching result of the payment card number and a receiving card number attribution, a transaction record between the payer and the receiving party, a historical highest transaction amount of the payment card number, a historical average transaction amount of the payment card number, a historical transaction amount standard deviation of the payment card number, a total income value of the receiving card number in a preset period, and transaction times in the preset period.
7. The method of claim 1, wherein generating early warning information according to the first comparison result and the second comparison result comprises:
when the abnormal predicted value is larger than a preset abnormal threshold value and the risk predicted value is larger than a preset risk threshold value, generating early warning information and carrying out risk early warning through a first warning strategy;
when the abnormal predicted value is smaller than a preset abnormal threshold value and the risk predicted value is larger than a preset risk threshold value, generating early warning information and carrying out risk early warning through a second warning strategy;
and when the abnormal predicted value is greater than or equal to the preset abnormal threshold value and the risk predicted value is less than the preset risk threshold value, generating early warning information and carrying out risk early warning through a third warning strategy.
8. A user transaction risk early warning device is characterized by comprising an acquisition module, an analysis module, a comparison module and an early warning module;
the acquisition module is used for acquiring payment information of a user and analyzing and acquiring corresponding transaction types according to the payment information;
the analysis module is used for analyzing the payment information through an isolated forest algorithm and a random forest algorithm according to the transaction category to obtain an abnormal prediction value and a risk prediction value;
the comparison module is used for comparing the abnormal predicted value with a preset abnormal threshold value to obtain a first comparison result, and comparing the risk predicted value with the preset risk threshold value to obtain a second comparison result;
the early warning module is used for generating early warning information according to the first comparison result and the second comparison result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that it stores a computer program for executing the method of any one of claims 1 to 7 by a computer.
11. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method of any of claims 1 to 7.
CN202211225975.5A 2022-10-09 2022-10-09 User transaction risk early warning method and device Pending CN115409518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211225975.5A CN115409518A (en) 2022-10-09 2022-10-09 User transaction risk early warning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211225975.5A CN115409518A (en) 2022-10-09 2022-10-09 User transaction risk early warning method and device

Publications (1)

Publication Number Publication Date
CN115409518A true CN115409518A (en) 2022-11-29

Family

ID=84167379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211225975.5A Pending CN115409518A (en) 2022-10-09 2022-10-09 User transaction risk early warning method and device

Country Status (1)

Country Link
CN (1) CN115409518A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994763A (en) * 2023-03-23 2023-04-21 深圳市德卡科技股份有限公司 Trusted intelligent payment method and system
CN116645097A (en) * 2023-03-30 2023-08-25 广东盛迪嘉电子商务股份有限公司 Payment clearing platform monitoring and early warning system
CN117273749A (en) * 2023-11-21 2023-12-22 青岛巨商汇网络科技有限公司 Transaction management method and system based on intelligent interaction

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994763A (en) * 2023-03-23 2023-04-21 深圳市德卡科技股份有限公司 Trusted intelligent payment method and system
CN115994763B (en) * 2023-03-23 2023-09-01 深圳市德卡科技股份有限公司 Trusted intelligent payment method and system
CN116645097A (en) * 2023-03-30 2023-08-25 广东盛迪嘉电子商务股份有限公司 Payment clearing platform monitoring and early warning system
CN117273749A (en) * 2023-11-21 2023-12-22 青岛巨商汇网络科技有限公司 Transaction management method and system based on intelligent interaction

Similar Documents

Publication Publication Date Title
CN115409518A (en) User transaction risk early warning method and device
CN111275546B (en) Financial customer fraud risk identification method and device
CN110738564A (en) Post-loan risk assessment method and device and storage medium
CN111932269B (en) Equipment information processing method and device
CN112785086A (en) Credit overdue risk prediction method and device
CN112215702A (en) Credit risk assessment method, mobile terminal and computer storage medium
CN111401906A (en) Transfer risk detection method and system
CN111582341B (en) User abnormal operation prediction method and device
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN111767319A (en) Customer mining method and device based on fund flow direction
CN112801775A (en) Client credit evaluation method and device
CN110717509A (en) Data sample analysis method and device based on tree splitting algorithm
CN114154672A (en) Data mining method for customer churn prediction
CN112734565B (en) Fluidity coverage prediction method and device
CN113392920A (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
EP3879418B1 (en) Identity verification method and device
CN111026991B (en) Data display method and device and computer equipment
CN114998001A (en) Service class identification method, device, equipment, storage medium and program product
CN113706258A (en) Product recommendation method, device, equipment and storage medium based on combined model
CN113515577A (en) Data preprocessing method and device
CN111429144A (en) Abnormal remittance transaction identification method and device
CN111768306A (en) Risk identification method and system based on intelligent data analysis
CN111951099A (en) Credit card issuing model and application method thereof
CN111932018B (en) Bank business performance contribution information prediction method and device
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination