CN117649236A - Risk prediction method, apparatus and storage medium for transaction - Google Patents


Info

Publication number
CN117649236A
Authority
CN
China
Legal status
Pending
Application number
CN202311632611.3A
Other languages
Chinese (zh)
Inventor
徐德华
Current Assignee
Tianyi Electronic Commerce Co Ltd
Original Assignee
Tianyi Electronic Commerce Co Ltd
Application filed by Tianyi Electronic Commerce Co Ltd
Priority to CN202311632611.3A
Publication of CN117649236A
Legal status: Pending (current)


Abstract

The application discloses a risk prediction method, a risk prediction apparatus and a storage medium for transactions. The method comprises the following steps: determining N transaction data of a target transaction and inputting them into an integrated tree model to obtain a first preset number of decision trees; extracting M rule chains from the first preset number of decision trees, and screening P candidate rule chains out of the M rule chains according to the accuracy rate and recall rate of the rule chains; screening the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determining the target number of target rule chains that match the transaction data of the target transaction; and, when the target number is greater than or equal to a rule chain number threshold, determining the target transaction to be a risk transaction. The method and apparatus solve the problem in the related art that the prediction of whether a transaction is at risk has low accuracy because the transaction data used for the prediction are not selected accurately enough.

Description

Risk prediction method, apparatus and storage medium for transaction
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a risk prediction method, apparatus and storage medium for transactions.
Background
With the continuous development and application of internet technology, the convenience and efficiency of internet-based services have attracted more and more enterprises, and these services are now part of many aspects of daily life. However, the wide use of the internet has been accompanied by various forms of online black-market activity ("black industry"), such as risk transactions. It is therefore necessary to control risk events on the network, improve risk control measures, and detect and crack down on online black-market activity in advance.
In the related art, risk-event monitoring follows one of two approaches. The first is based on strategy rules: business personnel analyze large amounts of business data and summarize the patterns and characteristics of black-market activity in order to formulate interception rules. This approach is time-consuming, depends on the expertise of business personnel and is difficult to turn into a unified rule-generation method; moreover, because the data used to generate the rules are screened manually, the selection is highly subjective and not accurate enough. The second is a black-box model based on machine learning or neural networks. Such models are complex and poorly interpretable: the risk data they detect are difficult to explain, and the models are difficult to adjust manually.
No effective solution has yet been proposed for the problem in the related art that, because transaction data are selected inaccurately, the prediction of whether a transaction is at risk has low accuracy.
Disclosure of Invention
The main objective of the present application is to provide a risk prediction method, apparatus and storage medium for transaction, so as to solve the problem in the related art that the accuracy of the prediction result of whether the transaction has risk is low due to the insufficient accuracy of transaction data selection.
To achieve the above object, according to one aspect of the present application, a risk prediction method for a transaction is provided. The method comprises the following steps: determining N transaction data of a target transaction and inputting the N transaction data into an integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, the integrated tree model is trained with a plurality of groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees; extracting M rule chains from the first preset number of decision trees, and screening P candidate rule chains out of the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes on the path from the root node to a leaf node of a decision tree, and each node represents one kind of transaction data; screening the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determining the target number of target rule chains that match the transaction data of the target transaction, where the feature screening method screens on the basis of the importance evaluation values of the transaction data in the candidate rule chains, and Q is a positive integer smaller than P; and, when the target number is greater than or equal to a rule chain number threshold, determining the target transaction to be a risk transaction.
Optionally, the integrated tree model is derived by: obtaining transaction data of each historical transaction to obtain a plurality of groups of sample transaction data, and inputting each group of sample transaction data into an initial integrated tree model to obtain a first preset number of sample decision trees corresponding to each group of sample transaction data; determining each group of sample transaction data and a sample decision tree corresponding to the sample transaction data as a group of training samples to obtain a plurality of groups of training samples; and performing repeated iterative training on the initial integrated tree model based on a plurality of groups of training samples to obtain an integrated tree model with a preset performance evaluation value greater than or equal to a performance evaluation value threshold, wherein the model parameters of the initial integrated tree model are updated in each iterative training.
Optionally, the screening the P candidate rule chains from the M rule chains by the accuracy rate and recall rate of the rule chains includes: acquiring a test sample set, wherein the test sample set comprises a plurality of test samples, and each test sample comprises test transaction data and risk test results; for each rule chain in the M rule chains, predicting the test sample set through the current rule chain to obtain a risk prediction result of each test sample in the test sample set; calculating the accuracy and recall rate of each rule chain based on the risk prediction result and the risk test result; and under the condition that the accuracy rate is greater than or equal to an accuracy rate threshold value and the recall rate is greater than or equal to a recall rate threshold value, determining the rule chain corresponding to the accuracy rate and the recall rate as a candidate rule chain.
Optionally, predicting the test sample set through the current rule chain to obtain the risk prediction result of each test sample in the test sample set includes: determining the F kinds of first transaction data contained in the current rule chain and the target constraint condition of each kind, where F is a positive integer smaller than N and a target constraint condition is the feature value range to which the feature value of that first transaction data must belong; for each test sample, extracting the feature value of each kind of first transaction data from the test sample as a target feature value; when the target feature values of all F kinds of first transaction data satisfy their target constraint conditions, determining that the risk prediction result of the test sample is a risk transaction; and when the target feature value of at least one kind of first transaction data does not satisfy its target constraint condition, determining that the risk prediction result of the test sample is a non-risk transaction.
Optionally, calculating the accuracy rate and recall rate of each rule chain based on the risk prediction results and the risk test results includes: determining the number of first test samples in the test sample set to obtain a first number, where a first test sample is one whose risk test result and risk prediction result are both risk transaction; determining the number of second test samples in the test sample set to obtain a second number, where a second test sample is one whose risk test result is risk transaction and whose risk prediction result is non-risk transaction; determining the number of third test samples in the test sample set to obtain a third number, where a third test sample is one whose risk test result is non-risk transaction and whose risk prediction result is risk transaction; calculating the sum of the first number and the third number to obtain a first sum value, and taking the ratio of the first number to the first sum value as the accuracy rate (i.e., the precision); and calculating the sum of the first number and the second number to obtain a second sum value, and taking the ratio of the first number to the second sum value as the recall rate.
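The arithmetic can be checked with a short Python sketch. It assumes the conventional definitions, treating the first number as true positives (TP), the second as false negatives (FN) and the third as false positives (FP), so that the accuracy rate (precision) is TP / (TP + FP) and the recall rate is TP / (TP + FN). The function names, counts and thresholds are hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fn: int, fp: int) -> Tuple[float, float]:
    """Return (accuracy rate as precision, recall rate) for one rule chain.

    tp: risk samples the chain predicts as risk        (first number)
    fn: risk samples the chain predicts as non-risk    (second number)
    fp: non-risk samples the chain predicts as risk    (third number)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def is_candidate(tp: int, fn: int, fp: int,
                 precision_min: float, recall_min: float) -> bool:
    """A rule chain is kept only if both rates reach their thresholds."""
    precision, recall = precision_recall(tp, fn, fp)
    return precision >= precision_min and recall >= recall_min


# Hypothetical counts for one rule chain on the test set.
p, r = precision_recall(tp=40, fn=60, fp=10)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.40
print(is_candidate(40, 60, 10, precision_min=0.7, recall_min=0.3))  # True
```

A chain like this one rarely raises false alarms but covers only part of the risk cases, which is why both thresholds are applied together.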
Optionally, screening the P candidate rule chains by a feature screening method to obtain the Q target rule chains includes: determining a second preset number of feature screening methods, and calculating the ratio of the second preset number to a preset value to obtain a method number threshold; for each of the P candidate rule chains, determining the screening result of each feature screening method for the current candidate rule chain to obtain a screening result set, where each screening result is either approval or disapproval: approval indicates that the importance evaluation value of the current candidate rule chain is greater than or equal to an importance evaluation value threshold, and disapproval indicates that it is smaller than that threshold; counting the approval results in the screening result set to obtain a target screening result number; when the target screening result number is greater than or equal to the method number threshold, determining the current candidate rule chain to be a pending rule chain; and optimizing all pending rule chains with a genetic algorithm to obtain the Q target rule chains.
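A minimal sketch of the voting step, assuming each feature screening method has already produced an importance score per candidate rule chain on a common scale, and that the preset value is 2 (so the method number threshold is a simple majority). The method names, scores and the shared importance threshold are hypothetical placeholders; the document does not name specific feature screening methods.

```python
from typing import Dict, List, Sequence


def vote_screen(chain_ids: Sequence[str],
                importance_by_method: Dict[str, Dict[str, float]],
                importance_threshold: float,
                preset_value: float = 2.0) -> List[str]:
    """Return the candidate rule chains that become pending rule chains.

    importance_by_method maps a screening method to {chain_id: importance}.
    A method approves a chain when its importance >= importance_threshold.
    """
    # Method number threshold = (number of methods) / (preset value).
    method_threshold = len(importance_by_method) / preset_value
    pending = []
    for cid in chain_ids:
        approvals = sum(
            1 for scores in importance_by_method.values()
            if scores.get(cid, 0.0) >= importance_threshold
        )
        if approvals >= method_threshold:
            pending.append(cid)
    return pending


# Hypothetical normalized importance scores from three screening methods.
scores = {
    "method_a": {"r1": 0.9, "r2": 0.2, "r3": 0.6},
    "method_b": {"r1": 0.7, "r2": 0.1, "r3": 0.55},
    "method_c": {"r1": 0.8, "r2": 0.6, "r3": 0.3},
}
print(vote_screen(["r1", "r2", "r3"], scores, importance_threshold=0.5))
# ['r1', 'r3']
```

Here r2 is approved by only one of three methods and falls below the majority threshold of 1.5, while r3 collects two approvals and is kept.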
Optionally, optimizing all pending rule chains with a genetic algorithm to obtain the Q target rule chains includes: treating each pending rule chain as an individual and all pending rule chains as the initial population; performing genetic operations on the initial population with the genetic algorithm to obtain multiple generations of populations, where the genetic operations include at least one of a selection operation, a crossover operation and a mutation operation; calculating the fitness function value of each generation, and, when a generation's fitness function value is smaller than a fitness function threshold, determining that generation to be the target population; and determining the Q target rule chains from the rule chains contained in the target population.
Optionally, the fitness function value is calculated as follows:
where min obj is the fitness function value; α is a preset constant with 0 ≤ α ≤ 1; num_con_sub is the number of kinds of transaction data contained in the target population, and num_con_s is the number of kinds of transaction data contained in the initial population; num_rule_sub is the number of constraint conditions in the target population, and num_rule_s is the number of constraint conditions in the initial population; auc_sub is the performance evaluation value (AUC) of the target population; and a constraint condition is the feature value range that the feature value of a kind of transaction data is required to satisfy.
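Because the fitness formula is not spelled out above, the sketch below uses a hypothetical objective built from the same quantities and minimized: α·(1 - auc_sub) plus (1 - α) times the average of the constraint ratio num_rule_sub / num_rule_s and the feature-kind ratio num_con_sub / num_con_s. It also swaps in a common genetic-algorithm formulation in which an individual is a bit mask selecting a subset of the pending rule chains (rather than one rule chain per individual), with tournament selection, one-point crossover, bit-flip mutation and elitism. A transaction's score is the number of selected chains it matches, and AUC is computed pairwise. All data, names and parameter values are illustrative assumptions, not the method of the application.

```python
import random
from typing import List, Sequence, Set


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness(mask: Sequence[int], hits: Sequence[Sequence[int]], n_constraints: Sequence[int],
            features: Sequence[Set[str]], labels: Sequence[int], alpha: float = 0.8) -> float:
    """HYPOTHETICAL objective to minimize; not the formula from the application."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return float("inf")
    # Score of a sample = number of selected rule chains it matches.
    scores = [sum(hits[i][j] for i in chosen) for j in range(len(labels))]
    rule_ratio = sum(n_constraints[i] for i in chosen) / sum(n_constraints)  # num_rule_sub / num_rule_s
    con_ratio = (len(set().union(*(features[i] for i in chosen)))
                 / len(set().union(*features)))                              # num_con_sub / num_con_s
    return alpha * (1.0 - auc(scores, labels)) + (1.0 - alpha) * 0.5 * (rule_ratio + con_ratio)


def evolve(hits, n_constraints, features, labels, pop_size=20, generations=50,
           p_mut=0.05, fit_threshold=0.0, seed=0) -> List[int]:
    """Return indices of the pending rule chains kept by the best individual found."""
    rng = random.Random(seed)
    n = len(hits)

    def f(mask):
        return fitness(mask, hits, n_constraints, features, labels)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for _ in range(generations):
        if f(best) < fit_threshold:  # stop once the fitness threshold is met
            break
        nxt = [best[:]]  # elitism: carry the best individual over
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=f)  # tournament selection
            b = min(rng.sample(pop, 2), key=f)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = min(pop, key=f)
    return [i for i, bit in enumerate(best) if bit]


# Toy data: 4 pending rule chains evaluated on 8 transactions (3 risky).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
hits = [
    [1, 1, 0, 0, 0, 0, 0, 0],  # chain 0
    [0, 1, 1, 0, 0, 0, 0, 0],  # chain 1
    [1, 0, 0, 1, 1, 0, 0, 0],  # chain 2
    [0, 0, 1, 0, 0, 1, 1, 1],  # chain 3
]
n_constraints = [2, 3, 4, 5]
features = [{"amount", "hour"}, {"amount", "count_24h"}, {"device", "hour", "ip"}, {"ip", "region"}]
print(evolve(hits, n_constraints, features, labels, fit_threshold=0.1))
```

On this toy data, an exhaustive check over all 15 non-empty subsets shows chains 0 and 1 together give the lowest value of this hypothetical objective (AUC of 1.0 with few constraints); on realistic numbers of pending chains, exhaustive search is infeasible, which is what motivates the genetic search.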
To achieve the above object, according to another aspect of the present application, a risk prediction apparatus for a transaction is provided. The apparatus comprises: a first determining unit, configured to determine N transaction data of a target transaction and input the N transaction data into an integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, the integrated tree model is trained with a plurality of groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees; an extraction unit, configured to extract M rule chains from the first preset number of decision trees and screen P candidate rule chains out of the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes on the path from the root node to a leaf node of a decision tree, and each node represents one kind of transaction data; a screening unit, configured to screen the P candidate rule chains by a feature screening method to obtain Q target rule chains and determine the target number of target rule chains that match the transaction data of the target transaction, where the feature screening method screens on the basis of the importance evaluation values of the transaction data in the candidate rule chains, and Q is a positive integer smaller than P; and a second determining unit, configured to determine that the target transaction is a risk transaction when the target number is greater than or equal to the rule chain number threshold.
By adopting the above steps, namely determining N transaction data of the target transaction and inputting them into the integrated tree model to obtain the first preset number of decision trees, extracting M rule chains and screening P candidate rule chains out of them by accuracy rate and recall rate, screening the P candidate rule chains by the feature screening method to obtain Q target rule chains and counting the target rule chains that match the transaction data of the target transaction, and determining the target transaction to be a risk transaction when that target number is greater than or equal to the rule chain number threshold, the present application solves the problem in the related art that the prediction of whether a transaction is at risk has low accuracy because transaction data are not selected accurately enough.
Because the N transaction data are input into the integrated tree model to obtain decision trees, rule chains are extracted from those trees and screened by accuracy rate, recall rate and a feature screening method to obtain target rule chains, and the risk of the target transaction is evaluated with the target rule chains, the accuracy of predicting whether a transaction is at risk is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a risk prediction method for transactions provided according to an embodiment of the present application;
FIG. 2 is a schematic diagram of extracting a rule chain from a decision tree provided in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of screening candidate rule chains by a feature screening method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an integrated tree-based rule derivation method provided in accordance with an embodiment of the present application;
FIG. 5 is a schematic diagram of a risk prediction apparatus for transactions provided according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an electronic device provided according to an embodiment of the present application.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
The invention is described below with reference to preferred implementation steps. FIG. 1 is a flowchart of a risk prediction method for a transaction according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, N transaction data of target transaction are determined, the N transaction data are input into an integrated tree model, a first preset number of decision trees are obtained, N is a positive integer, the integrated tree model is trained by a plurality of groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees.
Specifically, the target transaction may be an online transaction conducted over a network, and the integrated tree model may be, for example, LightGBM (lgb, a machine learning model based on gradient-boosted trees), XGBoost (xgb, eXtreme Gradient Boosting, also a gradient-boosted tree model) or a random forest. The transaction data are data related to the target transaction, such as the number of transactions, the transaction time and the identity information of the transaction parties. The N transaction data are obtained by acquiring all transaction data of the target transaction and preprocessing them. The first preset number may be a manually set number of decision trees in the integrated tree model; for example, in this embodiment a random forest is used as the integrated tree model and the first preset number is set to 100, so inputting the N transaction data into the integrated tree model yields 100 decision trees.
For example, for the feature set {f_1, f_2, f_3, …, f_N} of the N kinds of transaction data, the integrated tree model classifies the target variable, namely whether the target transaction is at risk. To train the integrated tree model, the acquired data set is preprocessed and divided into a training data set (i.e., the plurality of groups of training samples) and a test data set. In the payment scenario, data that hit the card-theft rules established by the business, together with data confirmed by investigation as card-theft cases, are taken as black samples, and the remaining normal data are white samples. Data from May 20, 2022 to June 6, 2022 were extracted, 15046 records in total, of which 1292 are black samples. The training set and the test set are divided according to sample acquisition time, at approximately 80% and 20% respectively. The training set is used to train the integrated tree model, and the test set is used to screen the rule chains that meet the requirements.
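The time-ordered 80/20 split can be sketched as follows. This is a minimal illustration, assuming each record carries its acquisition date and a 0/1 label; the toy records and their labels are invented, and only the 1292 / 15046 counts come from the text.

```python
from datetime import date, timedelta
from typing import List, Tuple

Record = Tuple[date, int]  # (acquisition date, label: 1 = black sample, 0 = white sample)


def time_split(records: List[Record], train_frac: float = 0.8) -> Tuple[List[Record], List[Record]]:
    """Chronological split: the earliest train_frac of records go to training."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


# Toy records spread over 2022-05-20 .. 2022-06-06 (hypothetical labels).
start = date(2022, 5, 20)
records = [(start + timedelta(days=i % 18), int(i % 12 == 0)) for i in range(100)]
train, test = time_split(records)
print(len(train), len(test))  # 80 20
print(f"black-sample share in the described data set: {1292 / 15046:.2%}")  # 8.59%
```

Splitting by time rather than at random keeps later transactions out of training, which is closer to how the rules would be used on future traffic. Applied to the 15046 records above, the same cut would put 12036 records in training and 3010 in testing.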
Step S102, M rule chains are extracted from a first preset number of decision trees, P candidate rule chains are screened out of the M rule chains through the accuracy rate and recall rate of the rule chains, wherein M and P are positive integers, M is larger than P, the rule chains are chains formed by all nodes from root nodes to leaf nodes in the decision trees, and each node represents transaction data.
Specifically, rule extraction is performed on the first preset number of decision trees of the integrated tree model to obtain rule chains. FIG. 2 is a schematic diagram of extracting rule chains from decision trees according to an embodiment of the present application. As shown in FIG. 2, Tree1, Tree2, …, Tree k are decision trees, each node represents a kind of transaction data, and the rule chain from the root node to each leaf node of every decision tree in the integrated tree model is extracted. The M extracted rule chains are then preliminarily screened according to indicators such as the accuracy rate and recall rate of each rule chain, yielding P candidate rule chains.
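A minimal, self-contained sketch of the extraction shown in FIG. 2, assuming binary splits of the form `feature <= threshold` (left branch) and `feature > threshold` (right branch). The `Node` class and the example tree are hypothetical; a fitted scikit-learn tree exposes the same structure through its `tree_.children_left`, `tree_.children_right`, `tree_.feature` and `tree_.threshold` arrays, which can be walked the same way.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)


@dataclass
class Node:
    feature: Optional[str] = None      # split feature; None marks a leaf
    threshold: float = 0.0             # samples with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None        # leaf prediction: 1 = risk, 0 = non-risk


def extract_chains(node: Node,
                   path: Optional[List[Condition]] = None) -> List[Tuple[List[Condition], int]]:
    """Return one (conditions, leaf label) pair per root-to-leaf path."""
    path = path or []
    if node.feature is None:
        return [(path, node.label)]
    return (extract_chains(node.left, path + [(node.feature, "<=", node.threshold)])
            + extract_chains(node.right, path + [(node.feature, ">", node.threshold)]))


# Hypothetical tree: amount <= 500 -> non-risk; otherwise split on count_24h.
tree = Node("amount", 500.0,
            left=Node(label=0),
            right=Node("count_24h", 3.0, left=Node(label=0), right=Node(label=1)))
chains = extract_chains(tree)
risk_chains = [conds for conds, label in chains if label == 1]
print(len(chains), risk_chains)
# 3 [[('amount', '>', 500.0), ('count_24h', '>', 3.0)]]
```

Each tree with L leaves yields L rule chains, so M is the total leaf count over all trees (or only the risk leaves, if non-risk paths are discarded as above).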
Step S103, the P candidate rule chains are screened by a feature screening method to obtain Q target rule chains, and the target number of target rule chains that match the transaction data of the target transaction is determined, where the feature screening method screens on the basis of the importance evaluation values of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer.
Specifically, the P candidate rule chains retained by the preliminary screening are treated as features, and several feature screening methods vote on them as an ensemble to screen the rule chains further. Finally, the rule chains that survive both rounds of screening are optimized with a genetic algorithm, so that the resulting Q target rule chains achieve the largest AUC (Area Under the Curve, i.e., the area under the ROC, Receiver Operating Characteristic, curve) while containing as few rule chains as possible. The transaction data of the target transaction are then checked against the finally screened Q target rule chains: if the feature values of the target transaction all fall within the feature value ranges of the corresponding transaction data in a target rule chain, the target transaction matches that target rule chain. The number of target rule chains matched by the target transaction is counted as the target number.
Step S104, determining the target transaction as a risk transaction under the condition that the target number is greater than or equal to the rule chain number threshold.
Specifically, the rule chain number threshold is a manually set number of rule chains used to evaluate whether the target transaction is at risk. If the target number is greater than or equal to the rule chain number threshold, enough target rule chains flag the target transaction, and it is predicted to be a risk transaction.
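Steps S103 and S104 at prediction time reduce to counting matched chains and comparing the count with the threshold. A minimal sketch, assuming target rule chains stored as lists of `(feature, "<=" or ">", threshold)` tuples; the chains, transaction fields and threshold value are hypothetical.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]  # (feature, "<=" or ">", threshold)


def matches(tx: Dict[str, float], chain: List[Condition]) -> bool:
    """True if the transaction satisfies every condition of the rule chain."""
    for feature, op, thr in chain:
        if feature not in tx:
            return False
        ok = tx[feature] <= thr if op == "<=" else tx[feature] > thr
        if not ok:
            return False
    return True


def is_risk_transaction(tx: Dict[str, float], target_chains: List[List[Condition]],
                        count_threshold: int) -> bool:
    """Risk transaction if at least count_threshold target rule chains match."""
    target_number = sum(matches(tx, chain) for chain in target_chains)
    return target_number >= count_threshold


# Hypothetical target rule chains and transaction.
target_chains = [
    [("amount", ">", 500.0), ("count_24h", ">", 3.0)],
    [("hour", "<=", 5.0)],
    [("amount", "<=", 100.0)],
]
tx = {"amount": 800.0, "count_24h": 5.0, "hour": 3.0}
print(is_risk_transaction(tx, target_chains, count_threshold=2))  # True (2 of 3 chains match)
```

Raising the threshold trades recall for fewer false alarms: with `count_threshold=3` the same transaction would not be flagged.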
With the risk prediction method for a transaction provided by this embodiment, the N transaction data of the target transaction are input into the trained integrated tree model to obtain the first preset number of decision trees; M rule chains are extracted from those trees and screened into P candidate rule chains by accuracy rate and recall rate; the candidates are further screened into Q target rule chains by the feature screening method; and the target transaction is determined to be a risk transaction when the number of target rule chains it matches is greater than or equal to the rule chain number threshold. This solves the problem in the related art that the prediction of whether a transaction is at risk has low accuracy because transaction data are not selected accurately enough, and improves the accuracy of that prediction.
In order to quickly screen transaction data related to risk transactions, the transaction data is initially screened by training an integrated tree model. Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, the integrated tree model is obtained by: obtaining transaction data of each historical transaction to obtain a plurality of groups of sample transaction data, and inputting each group of sample transaction data into an initial integrated tree model to obtain a first preset number of sample decision trees corresponding to each group of sample transaction data; determining each group of sample transaction data and a sample decision tree corresponding to the sample transaction data as a group of training samples to obtain a plurality of groups of training samples; and performing repeated iterative training on the initial integrated tree model based on a plurality of groups of training samples to obtain an integrated tree model with a preset performance evaluation value greater than or equal to a performance evaluation value threshold, wherein the model parameters of the initial integrated tree model are updated in each iterative training.
Specifically, the preset performance evaluation value may be the AUC of the integrated tree model. The integrated tree model is trained on the training data set obtained during data preprocessing, that is, the multiple groups of sample transaction data: the model is built on the training data set and trained repeatedly while its parameters are adjusted, with the classification objective of predicting whether a sample transaction is a risk transaction, and among the decision trees integrated in the model, the branches whose prediction result is risk transaction are retained. For example, a random forest is used as the initial integrated tree model. Because black samples (i.e., samples whose transaction is a risk transaction) account for only about 8.5% of the data, sample weights are set during training to address the class imbalance, the maximum tree depth is limited to 6, and the number of trees is set to 100. In this embodiment, training the integrated tree model thus performs an initial screening of rule chains based on the N transaction data.
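The text does not say how the sample weights are chosen. One common choice, assumed here, is the "balanced" heuristic weight(c) = n_samples / (n_classes × count(c)), which is what scikit-learn's `RandomForestClassifier(n_estimators=100, max_depth=6, class_weight="balanced")` would apply. The sketch computes it for the class counts given in the data description (1292 black samples out of 15046).

```python
from collections import Counter
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """weight(c) = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


labels = [1] * 1292 + [0] * (15046 - 1292)  # 1 = black sample, 0 = white sample
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in sorted(weights.items())})  # {0: 0.547, 1: 5.823}
```

With these weights each class contributes the same total weight (15046 / 2 = 7523), so the minority black samples are not swamped by white samples when the trees choose their splits.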
To obtain rule chains that predict risk transactions more accurately, the M extracted rule chains are preliminarily screened according to their accuracy rate and recall rate. Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, screening P candidate rule chains out of the M rule chains according to the accuracy rate and recall rate of the rule chains includes: acquiring a test sample set, where the test sample set comprises a plurality of test samples and each test sample comprises test transaction data and a risk test result; for each of the M rule chains, predicting the test sample set through the current rule chain to obtain a risk prediction result for each test sample in the test sample set; calculating the accuracy rate and recall rate of each rule chain based on the risk prediction results and the risk test results; and, when the accuracy rate is greater than or equal to an accuracy rate threshold and the recall rate is greater than or equal to a recall rate threshold, determining the corresponding rule chain to be a candidate rule chain.
Specifically, the accuracy rate and recall rate of each of the M rule chains extracted from the integrated tree model are calculated on the test sample set; the higher both values are, the more accurate the predictions of the rule chain. Both values are calculated from the risk prediction results and the risk test results of the test samples. If the accuracy rate of the current rule chain is greater than or equal to the accuracy rate threshold and its recall rate is greater than or equal to the recall rate threshold, the current rule chain is determined to be a candidate rule chain. If the accuracy rate is below the accuracy rate threshold or the recall rate is below the recall rate threshold, the predictions of the current rule chain are not accurate enough and the chain is discarded. This further screening by accuracy rate and recall rate ensures that the retained rule chains predict risk transactions accurately.
Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, predicting the test sample set with the current rule chain to obtain a risk prediction result for each test sample includes: determining the F kinds of first transaction data contained in the current rule chain and the target constraint condition of each kind of first transaction data, where F is a positive integer smaller than N and a target constraint condition is the range of values to which the feature value of the first transaction data must belong; for each test sample, extracting the feature value of each kind of first transaction data from the test sample to obtain target feature values; determining that the risk prediction result of the test sample is a risk transaction when the target feature values of all F kinds of first transaction data satisfy their target constraint conditions; and determining that the risk prediction result of the test sample is a non-risk transaction when the target feature value of at least one kind of first transaction data does not satisfy its target constraint condition.
Specifically, each rule chain includes a plurality of nodes, and each node represents one kind of transaction data. A node can be split on several constraint conditions, and the edge from a parent node to its child in a rule chain indicates which constraint condition the transaction data of the parent node satisfies. For example, suppose the current rule chain A includes F nodes, that is, F kinds of first transaction data, and node a represents the number of transactions (one kind of first transaction data). Node a may carry several constraint conditions, such as a transaction count greater than or equal to 10, less than 5, or greater than 20, and the target constraint condition of node a in rule chain A is a transaction count greater than or equal to 10. If the transaction count extracted from a test sample is 15, the target feature value for node a satisfies the target constraint condition. When the target feature values for all nodes of the current rule chain satisfy their target constraint conditions, the risk prediction result of the test sample is a risk transaction; when the target feature value of at least one kind of first transaction data does not satisfy its target constraint condition, the risk prediction result of the test sample is a non-risk transaction. This embodiment determines the risk prediction result of every test sample in this way and uses these results to calculate the accuracy rate and recall rate of the current rule chain.
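The per-sample prediction described above amounts to a conjunction test. The sketch below assumes a rule chain is a list of `(feature_name, operator, threshold)` constraints and a test sample is a dict of feature values; the names and the treatment of missing features are hypothetical choices, not taken from the text.

```python
from typing import Dict, List, Tuple

# A constraint is (feature_name, operator, threshold); names below are hypothetical.
Constraint = Tuple[str, str, float]

_OPS = {
    "<=": lambda v, t: v <= t,
    "<": lambda v, t: v < t,
    ">=": lambda v, t: v >= t,
    ">": lambda v, t: v > t,
}


def predict_with_chain(chain: List[Constraint], sample: Dict[str, float]) -> bool:
    """Return True (risk transaction) only if the sample satisfies every constraint."""
    for name, op, t in chain:
        value = sample.get(name)
        # Assumption: a feature missing from the sample counts as a failed constraint.
        if value is None or not _OPS[op](value, t):
            return False  # one unmet constraint -> non-risk transaction
    return True


chain_a = [("txn_count", ">=", 10.0)]
print(predict_with_chain(chain_a, {"txn_count": 15.0}))  # True
print(predict_with_chain(chain_a + [("age", ">", 60.0)], {"txn_count": 15.0, "age": 40.0}))  # False
```

The first call mirrors the node-a example (a count of 15 satisfies "at least 10"); the second shows a single failed constraint turning the prediction into a non-risk transaction.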
Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, calculating the accuracy rate and recall rate of each rule chain based on the risk prediction results and the risk test results includes: determining the number of first test samples in the test sample set to obtain a first number, where a first test sample is a sample whose risk test result and risk prediction result are both risk transaction; determining the number of second test samples in the test sample set to obtain a second number, where a second test sample is a sample whose risk test result is a non-risk transaction and whose risk prediction result is a risk transaction; determining the number of third test samples in the test sample set to obtain a third number, where a third test sample is a sample whose risk test result is a risk transaction and whose risk prediction result is a non-risk transaction; calculating the sum of the first number and the second number to obtain a first sum value, and taking the ratio of the first number to the first sum value as the accuracy rate; and calculating the sum of the first number and the third number to obtain a second sum value, and taking the ratio of the first number to the second sum value as the recall rate.
Specifically, the accuracy rate (i.e., precision) is calculated as accuracy = TP/(TP+FP), and the recall rate as recall = TP/(TP+FN), where TP is the first number, FP is the second number, and FN is the third number. In this embodiment, the M rule chains are further screened by accuracy rate and recall rate, yielding candidate rule chains that predict risk transactions with higher accuracy.
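The screening by accuracy rate and recall rate can be sketched as follows, treating "risk transaction" as the positive class (`True`). The thresholds and the toy predictions are hypothetical; each rule chain is represented only by its list of per-sample risk predictions.

```python
from typing import List, Sequence, Tuple


def precision_recall(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Accuracy rate (precision) and recall, treating "risk transaction" (True) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # risk, predicted risk
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # non-risk, predicted risk
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # risk, predicted non-risk
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def screen_candidates(chain_predictions: List[Sequence[bool]], y_true: Sequence[bool],
                      precision_min: float, recall_min: float) -> List[int]:
    """Indices of rule chains meeting both thresholds (the candidate rule chains)."""
    keep = []
    for i, y_pred in enumerate(chain_predictions):
        p, r = precision_recall(y_true, y_pred)
        if p >= precision_min and r >= recall_min:
            keep.append(i)
    return keep


y_true = [True, True, False, False, True]   # risk test results
pred_a = [True, False, False, True, True]   # chain 0: precision 2/3, recall 2/3
pred_b = [True, True, False, False, False]  # chain 1: precision 1.0, recall 2/3
print(screen_candidates([pred_a, pred_b], y_true, precision_min=0.8, recall_min=0.6))  # [1]
```

Chain 0 is rejected because its accuracy rate (2/3) is below 0.8, even though its recall passes.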
To improve how well the rule chains predict risk transactions, the candidate rule chains are further screened by feature screening methods. Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, screening the P candidate rule chains by the feature screening method to obtain Q target rule chains includes: determining a second preset number of feature screening methods, and calculating the ratio of the second preset number to a preset value to obtain a method number threshold; for each of the P candidate rule chains, determining the screening result of each feature screening method on the current candidate rule chain to obtain a screening result set, where each screening result is either approval or disapproval, approval indicating that the importance evaluation value of the current candidate rule chain is greater than or equal to an importance evaluation value threshold and disapproval indicating that it is smaller than that threshold; counting the approvals in the screening result set to obtain a target screening result number; determining the current candidate rule chain as a pending rule chain when the target screening result number is greater than or equal to the method number threshold; and optimizing all pending rule chains with a genetic algorithm to obtain the Q target rule chains.
Specifically, the second preset number may be 3, and the feature screening methods may be, for example, Lasso (Least Absolute Shrinkage and Selection Operator, a machine learning algorithm for feature selection and regression analysis), Ridge (a regularization method for linear regression models), and the chi-square test. Fig. 3 is a schematic diagram of screening candidate rule chains by the feature screening methods according to an embodiment of the present application. As shown in Fig. 3, $r_1, r_2, r_3, \ldots, r_m$ denote the candidate rule chains, $y$ denotes the screening result of each feature screening method, and $S = \{r_{i_1}, r_{i_2}, \ldots, r_{i_k}\}$ denotes the pending rule chains that pass screening. The screening results of the feature screening methods are integrated by voting, and rule chains that receive fewer votes than half the number of methods, that is, whose target screening result number is below the method number threshold, are removed, yielding the re-screened set of pending rule chains. The pending rule chains are then optimized by a genetic algorithm to obtain the final Q target rule chains. In this embodiment, screening the candidate rule chains again by feature screening methods yields target rule chains with better prediction performance.
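The voting step can be sketched as below, assuming each feature screening method has already produced the set of candidate-chain indices it approves. How Lasso, Ridge, or the chi-square test assign an importance evaluation value to a rule chain (for example, by encoding each chain as a 0/1 "sample matches this chain" feature) is an assumption not shown here, and the approval sets are hypothetical. The preset value 2 corresponds to the "half the number of methods" rule in the text.

```python
from typing import Dict, List, Set


def vote_rule_chains(approvals: Dict[str, Set[int]], num_chains: int,
                     preset_value: float = 2.0) -> List[int]:
    """Return indices of candidate chains kept as pending rule chains.

    approvals maps each feature screening method to the indices of candidate chains
    it approved (importance evaluation value >= that method's threshold).
    """
    method_threshold = len(approvals) / preset_value  # 3 methods / 2 = 1.5
    pending = []
    for i in range(num_chains):
        votes = sum(1 for approved in approvals.values() if i in approved)
        if votes >= method_threshold:
            pending.append(i)
    return pending


approvals = {"lasso": {0, 1, 3}, "ridge": {1, 3}, "chi2": {2, 3}}
print(vote_rule_chains(approvals, num_chains=4))  # [1, 3]
```

With three methods the threshold is 1.5, so a chain needs at least two approvals; chains 0 and 2 receive only one vote each and are removed.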
Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, optimizing all pending rule chains by a genetic algorithm to obtain the Q target rule chains includes: treating each pending rule chain as an individual and all pending rule chains together as the initial population; performing genetic operations on the initial population with the genetic algorithm to obtain multiple generations of populations, where the genetic operations include at least one of a selection operation, a crossover operation, and a mutation operation; calculating the fitness function value of each generation, and determining a population as the target population when its fitness function value is smaller than a fitness function threshold; and determining the Q target rule chains from the rule chains contained in the target population.
Specifically, following Occam's razor, the Q target rule chains obtained by genetic-algorithm optimization should predict well while being as few as possible, with each target rule chain containing as few constraint conditions as possible; a target rule chain with more than 20 constraint conditions, for example, is likely to be impractical. The genetic operations adjust both the number of rule chains in each generation and the constraint conditions contained in the pending rule chains, and the fitness function is constructed with the goal of minimizing the number of rule chains and constraint conditions. When the fitness function value of a population is smaller than the fitness function threshold, that population is determined as the target population, and the Q target rule chains are determined from the rule chains it contains. In this embodiment, optimizing the pending rule chains with a genetic algorithm improves the performance of the rule chains.
For example, the initial population may contain rule chain 1: age greater than 60 and more than 20 transactions in the last 3 days; rule chain 2: more than 5 night-time transactions in the last 3 days, age between 30 and 50, and a password change in the last 7 days; …. After optimization, the target population may contain rule chain 1: more than 20 transactions in the last 3 days; rule chain 2: more than 5 night-time transactions in the last 3 days and a password change in the last 7 days; …. During optimization, the AUC value of each generation is evaluated on the rule set formed by the different condition chains (i.e., kinds of transaction data), while the number of condition chains and the number of constraint conditions contained in the Q target rule chains are minimized.
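A genetic algorithm of this kind might be sketched as follows. This sketch uses a common encoding that differs from the wording above: each individual is a bit mask over all constraints of all pending rule chains, so one individual represents a whole candidate rule set, and crossover and mutation can drop single constraints (as the age constraints disappear in the example) or entire chains. The fitness function here is a hypothetical stand-in that rewards a caller-supplied `score` (e.g. AUC of the rule set on test data) and penalizes the fraction of chains and constraints kept; it is not the patent's formula. All parameter values and feature names are assumptions.

```python
import random
from typing import Callable, List, Tuple

Constraint = Tuple[str, str, float]
RuleSet = List[List[Constraint]]


def decode(genome: List[int], chains: RuleSet) -> RuleSet:
    """One bit per constraint: keep constraints whose bit is 1, drop emptied chains."""
    rules, k = [], 0
    for chain in chains:
        kept = [c for j, c in enumerate(chain) if genome[k + j]]
        k += len(chain)
        if kept:
            rules.append(kept)
    return rules


def optimize_rule_set(chains: RuleSet, score: Callable[[RuleSet], float],
                      alpha: float = 0.7, pop_size: int = 30, generations: int = 50,
                      fitness_threshold: float = 0.0, mutation_rate: float = 0.05,
                      seed: int = 0) -> RuleSet:
    rng = random.Random(seed)
    n_bits = sum(len(c) for c in chains)

    def fitness(genome: List[int]) -> float:
        # Hypothetical objective (lower is better), NOT the patent's formula:
        # weigh (1 - score) against the fraction of chains and constraints kept.
        rules = decode(genome, chains)
        if not rules:
            return float("inf")
        size = (sum(map(len, rules)) / n_bits + len(rules) / len(chains)) / 2
        return alpha * (1 - score(rules)) + (1 - alpha) * size

    def select(population: List[List[int]]) -> List[int]:
        a, b = rng.sample(population, 2)  # binary tournament selection
        return a if fitness(a) <= fitness(b) else b

    # Initial population: the full set of pending rule chains plus random variants.
    population = [[1] * n_bits] + [[rng.randint(0, 1) for _ in range(n_bits)]
                                   for _ in range(pop_size - 1)]
    best = min(population, key=fitness)
    for _ in range(generations):
        if fitness(best) < fitness_threshold:  # stop once a good-enough population exists
            break
        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = select(population), select(population)
            cut = rng.randint(1, n_bits - 1) if n_bits > 1 else 0
            child = p1[:cut] + p2[cut:]  # single-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = min(population, key=fitness)
    return decode(best, chains)


def toy_score(rules: RuleSet) -> float:
    """Toy stand-in for an AUC-style score: only the 3-day transaction count matters."""
    return 1.0 if any(("txn_count_3d", ">", 20.0) in r for r in rules) else 0.0


initial = [[("age", ">", 60.0), ("txn_count_3d", ">", 20.0)],
           [("night_txn_3d", ">", 5.0), ("age", ">=", 30.0), ("age", "<=", 50.0),
            ("pwd_changed_7d", ">=", 1.0)]]
print(optimize_rule_set(initial, toy_score))
```

Elitism guarantees that the best fitness never gets worse from one generation to the next, so the returned rule set is never worse under this objective than the full set of pending chains.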
Optionally, in the risk prediction method for a transaction provided in the embodiment of the present application, a calculation formula of the fitness function value is as follows:
where min obj is the fitness function value; $\alpha$ is a preset constant with $0 < \alpha \le 1$; $num\_con_{sub}$ is the number of kinds of transaction data contained in the target population; $num\_con_{s}$ is the number of kinds of transaction data contained in the initial population; $num\_rule_{sub}$ is the number of constraint conditions in the target population; $num\_rule_{s}$ is the number of constraint conditions in the initial population; $auc_{sub}$ is the performance evaluation value of the target population; and a constraint condition is the range of values that the feature value of a kind of transaction data must satisfy.
Specifically, this fitness function is used to select the target population that has the best performance evaluation value while containing the fewest condition chains and constraint conditions.
According to another embodiment of the present application, an automatic rule derivation method based on an integrated tree is further provided. Fig. 4 is a schematic diagram of the rule derivation method based on an integrated tree according to an embodiment of the present application. As shown in Fig. 4, the method includes: preprocessing the input raw data and dividing it into training data and test data; training an integrated tree model on the training data; extracting condition chains from the trained integrated tree model; performing an initial screening of the condition chains by accuracy rate and recall rate; screening the condition chains again by feature screening methods; solving for the optimal rule set with a genetic algorithm; evaluating the classification effect of the rule set on the test data; and determining the final rule set according to that classification effect.
In this embodiment, the integrated-tree-based automatic rule derivation method uses the strong learning ability of the integrated tree model to generate a rule set automatically and performs an initial screening of the rules by screening indexes. The generated rules are then used as feature inputs, feature screening is performed with multiple methods, and the screening results are integrated by voting, so that interactions among the rules are fully learned and similar rules with the same effect are removed. Finally, the rule set is solved with a genetic algorithm, which optimizes the effect of each individual rule while also considering the interactions among rules, so that the screened rule set has a better classification effect. Introducing a genetic algorithm into automatic rule derivation thus keeps the classification effect of the optimized rule set good while minimizing the number of rules and constraint conditions.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a risk prediction device for the transaction, and the risk prediction device for the transaction can be used for executing the risk prediction method for the transaction provided by the embodiment of the application. The following describes a risk prediction device for transactions provided in the embodiments of the present application.
Fig. 5 is a schematic diagram of a risk prediction apparatus for transactions provided according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
the first determining unit 10 is configured to determine N kinds of transaction data of a target transaction, and input the N kinds of transaction data into the integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, and the integrated tree model is trained by multiple sets of training samples, where each set of training samples includes sample transaction data and the first preset number of sample decision trees;
the extracting unit 20 is configured to extract M rule chains from the first preset number of decision trees, and screen P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node in a decision tree, and each node represents one kind of transaction data;
a screening unit 30, configured to screen the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determine the target number of target rule chains that the transaction data of the target transaction conforms to, where the feature screening method screens based on the importance evaluation value of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer;
The second determining unit 40 is configured to determine that the target transaction is a risk transaction if the target number is equal to or greater than the rule chain number threshold.
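The final decision made by the second determining unit 40 reduces to counting the target rule chains a transaction conforms to. The sketch below assumes the same hypothetical `(feature_name, operator, threshold)` encoding of constraints; the feature names echo the example rule chains given earlier, and the threshold values are illustrative because the text does not fix a rule chain number threshold.

```python
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, float]

_OPS = {"<=": lambda v, t: v <= t, "<": lambda v, t: v < t,
        ">=": lambda v, t: v >= t, ">": lambda v, t: v > t}


def conforms(chain: List[Constraint], txn: Dict[str, float]) -> bool:
    """A transaction conforms to a rule chain when it satisfies every constraint."""
    return all(name in txn and _OPS[op](txn[name], t) for name, op, t in chain)


def is_risk_transaction(target_chains: List[List[Constraint]], txn: Dict[str, float],
                        chain_count_threshold: int) -> bool:
    """Risk transaction if the target number of conforming chains reaches the threshold."""
    target_number = sum(1 for chain in target_chains if conforms(chain, txn))
    return target_number >= chain_count_threshold


target_chains = [[("txn_count_3d", ">", 20.0)],
                 [("night_txn_3d", ">", 5.0), ("pwd_changed_7d", ">=", 1.0)]]
txn = {"txn_count_3d": 25.0, "night_txn_3d": 2.0, "pwd_changed_7d": 1.0}
print(is_risk_transaction(target_chains, txn, chain_count_threshold=1))  # True
print(is_risk_transaction(target_chains, txn, chain_count_threshold=2))  # False
```

Here the transaction conforms only to the first chain, so its target number is 1: it is flagged with a threshold of 1 but not with a threshold of 2.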
In the risk prediction apparatus for a transaction provided by the embodiment of the present application, the first determining unit 10 determines N kinds of transaction data of the target transaction and inputs them into the integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, the integrated tree model is trained with multiple groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees. The extracting unit 20 extracts M rule chains from the first preset number of decision trees and screens P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node in a decision tree, and each node represents one kind of transaction data. The screening unit 30 screens the P candidate rule chains by a feature screening method to obtain Q target rule chains and determines the target number of target rule chains that the transaction data of the target transaction conforms to, where the feature screening method screens based on the importance evaluation value of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer. The second determining unit 40 determines that the target transaction is a risk transaction when the target number is greater than or equal to the rule chain number threshold. This solves the problem in the related art that predictions of whether a transaction is at risk have low accuracy because transaction data are not selected accurately enough: N kinds of transaction data are input into the integrated tree model to obtain decision trees, rule chains are extracted from the decision trees, the extracted rule chains are screened by accuracy rate, recall rate, and a feature screening method to obtain target rule chains, and whether the target transaction is at risk is evaluated according to the target rule chains, thereby improving the accuracy of the prediction of whether a transaction is at risk.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the integrated tree model is obtained by: obtaining transaction data of each historical transaction to obtain a plurality of groups of sample transaction data, and inputting each group of sample transaction data into an initial integrated tree model to obtain a first preset number of sample decision trees corresponding to each group of sample transaction data; determining each group of sample transaction data and a sample decision tree corresponding to the sample transaction data as a group of training samples to obtain a plurality of groups of training samples; and performing repeated iterative training on the initial integrated tree model based on a plurality of groups of training samples to obtain an integrated tree model with a preset performance evaluation value greater than or equal to a performance evaluation value threshold, wherein the model parameters of the initial integrated tree model are updated in each iterative training.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the extracting unit 20 includes: an acquisition module, configured to acquire a test sample set, where the test sample set comprises a plurality of test samples and each test sample comprises test transaction data and a risk test result; a prediction module, configured to, for each of the M rule chains, predict the test sample set with the current rule chain to obtain a risk prediction result for each test sample in the test sample set; a calculation module, configured to calculate the accuracy rate and recall rate of each rule chain based on the risk prediction results and the risk test results; and a first determining module, configured to determine the corresponding rule chain as a candidate rule chain when the accuracy rate is greater than or equal to the accuracy rate threshold and the recall rate is greater than or equal to the recall rate threshold.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the prediction module includes: the first determining submodule is used for determining F types of first transaction data contained in the current rule chain and determining target constraint conditions of each type of first transaction data, wherein F is smaller than N, F is a positive integer, and the target constraint conditions are characteristic value ranges to which characteristic values of the first transaction data belong; the extraction submodule is used for extracting the characteristic value of the first transaction data from the test sample for each test sample to obtain a target characteristic value; the second determining submodule is used for determining that the risk prediction result of the test sample is risk transaction under the condition that the target characteristic values of the F types of first transaction data all meet the target constraint conditions; and the third determination submodule is used for determining that the risk prediction result of the test sample is a non-risk transaction under the condition that at least one target characteristic value of the first transaction data does not accord with the target constraint condition.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the calculation module includes: a fourth determining submodule, configured to determine the number of first test samples in the test sample set to obtain a first number, where a first test sample is a sample whose risk test result and risk prediction result are both risk transaction; a fifth determining submodule, configured to determine the number of second test samples in the test sample set to obtain a second number, where a second test sample is a sample whose risk test result is a non-risk transaction and whose risk prediction result is a risk transaction; a sixth determining submodule, configured to determine the number of third test samples in the test sample set to obtain a third number, where a third test sample is a sample whose risk test result is a risk transaction and whose risk prediction result is a non-risk transaction; a first calculation submodule, configured to calculate the sum of the first number and the second number to obtain a first sum value and take the ratio of the first number to the first sum value as the accuracy rate; and a second calculation submodule, configured to calculate the sum of the first number and the third number to obtain a second sum value and take the ratio of the first number to the second sum value as the recall rate.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the screening unit 30 includes: a second determining module, configured to determine a second preset number of feature screening methods and calculate the ratio of the second preset number to a preset value to obtain a method number threshold; a third determining module, configured to, for each of the P candidate rule chains, determine the screening result of each feature screening method on the current candidate rule chain to obtain a screening result set, where each screening result is either approval or disapproval, approval indicating that the importance evaluation value of the candidate rule chain is greater than or equal to an importance evaluation value threshold and disapproval indicating that it is smaller than that threshold; a fourth determining module, configured to count the approvals in the screening result set to obtain a target screening result number; a fifth determining module, configured to determine the current candidate rule chain as a pending rule chain when the target screening result number is greater than or equal to the method number threshold; and an optimization module, configured to optimize all pending rule chains with a genetic algorithm to obtain the Q target rule chains.
Optionally, in the risk prediction apparatus for a transaction provided in the embodiment of the present application, the optimization module includes: a seventh determining submodule, configured to treat each pending rule chain as an individual and all pending rule chains together as the initial population; an execution submodule, configured to perform genetic operations on the initial population with a genetic algorithm to obtain multiple generations of populations, where the genetic operations include at least one of a selection operation, a crossover operation, and a mutation operation; a third calculation submodule, configured to calculate the fitness function value of each generation and determine a population as the target population when its fitness function value is smaller than the fitness function threshold; and an eighth determining submodule, configured to determine the Q target rule chains from the rule chains contained in the target population.
Optionally, in the risk prediction device for a transaction provided in the embodiment of the present application, a calculation formula of the fitness function value is as follows:
where min obj is the fitness function value; $\alpha$ is a preset constant with $0 < \alpha \le 1$; $num\_con_{sub}$ is the number of kinds of transaction data contained in the target population; $num\_con_{s}$ is the number of kinds of transaction data contained in the initial population; $num\_rule_{sub}$ is the number of constraint conditions in the target population; $num\_rule_{s}$ is the number of constraint conditions in the initial population; $auc_{sub}$ is the performance evaluation value of the target population; and a constraint condition is the range of values that the feature value of a kind of transaction data must satisfy.
The risk prediction device for transaction includes a processor and a memory, wherein the first determining unit 10, the extracting unit 20, the screening unit 30, the second determining unit 40, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of the prediction of whether a transaction is at risk is improved by adjusting kernel parameters.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a risk prediction method for a transaction.
An embodiment of the present invention provides a processor configured to run a program, where the program, when run, executes the risk prediction method for a transaction.
Fig. 6 is a schematic diagram of an electronic device provided according to an embodiment of the present application. As shown in Fig. 6, the electronic device 601 includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program: determining N kinds of transaction data of a target transaction and inputting the N kinds of transaction data into an integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, the integrated tree model is trained with multiple groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees; extracting M rule chains from the first preset number of decision trees, and screening P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node in a decision tree, and each node represents one kind of transaction data; screening the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determining the target number of target rule chains that the transaction data of the target transaction conforms to, where the feature screening method screens based on the importance evaluation value of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer; and determining the target transaction as a risk transaction when the target number is greater than or equal to the rule chain number threshold. The device herein may be a server, a PC, a PAD, a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: determining N kinds of transaction data of a target transaction and inputting the N kinds of transaction data into an integrated tree model to obtain a first preset number of decision trees, where N is a positive integer, the integrated tree model is trained with multiple groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees; extracting M rule chains from the first preset number of decision trees, and screening P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, where M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node in a decision tree, and each node represents one kind of transaction data; screening the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determining the target number of target rule chains that the transaction data of the target transaction conforms to, where the feature screening method screens based on the importance evaluation value of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer; and determining the target transaction as a risk transaction when the target number is greater than or equal to the rule chain number threshold.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method of risk prediction for a transaction, comprising:
determining N kinds of transaction data of a target transaction, and inputting the N kinds of transaction data into an integrated tree model to obtain a first preset number of decision trees, wherein N is a positive integer, the integrated tree model is trained with a plurality of groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees;
extracting M rule chains from the first preset number of decision trees, and screening P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, wherein M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node of a decision tree, and each node represents one kind of transaction data;
screening the P candidate rule chains by a feature screening method to obtain Q target rule chains, and determining the target number of target rule chains that are consistent with the transaction data of the target transaction, wherein the feature screening method is a screening method based on the importance evaluation values of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer;
and under the condition that the target number is greater than or equal to a rule chain number threshold, determining that the target transaction is a risk transaction.
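Claim 1 depends on two mechanical steps. The first turns each root-to-leaf path of a decision tree into a rule chain. The second counts how many target rule chains a transaction satisfies and compares that count with the rule chain number threshold. The Python sketch below illustrates both steps. Several details are hypothetical and do not come from the application: the nested-dict tree format, the half-open range `(low, high]` stored per feature, and the convention that only chains ending in a leaf labelled 1 (risk) are kept. The accuracy/recall screening and feature screening steps that sit between extraction and the final decision are omitted.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical tree format (not from the application): an internal node is
# {"feature": name, "threshold": t, "left": node, "right": node}, where the left
# branch means feature <= t and the right branch means feature > t; a leaf is
# {"leaf": label}, with label 1 assumed to mean "risk".
Node = Dict[str, object]
# A rule chain maps each feature on a root-to-leaf path to a range (low, high].
RuleChain = Dict[str, Tuple[float, float]]


def extract_rule_chains(node: Node, keep_label: int = 1,
                        chain: Optional[RuleChain] = None) -> List[RuleChain]:
    """Return every root-to-leaf path whose leaf carries `keep_label`."""
    chain = dict(chain or {})
    if "leaf" in node:
        return [chain] if node["leaf"] == keep_label else []
    feature, threshold = node["feature"], node["threshold"]
    # A feature split twice on one path narrows its existing range.
    low, high = chain.get(feature, (-math.inf, math.inf))
    left = {**chain, feature: (low, min(high, threshold))}    # feature <= t
    right = {**chain, feature: (max(low, threshold), high)}   # feature > t
    return (extract_rule_chains(node["left"], keep_label, left)
            + extract_rule_chains(node["right"], keep_label, right))


def matches(chain: RuleChain, transaction: Dict[str, float]) -> bool:
    """A transaction is consistent with a chain if every feature is in range."""
    return all(feature in transaction and low < transaction[feature] <= high
               for feature, (low, high) in chain.items())


def is_risk_transaction(target_chains: List[RuleChain],
                        transaction: Dict[str, float],
                        chain_number_threshold: int) -> bool:
    """Count the consistent target rule chains and compare with the threshold."""
    target_number = sum(matches(c, transaction) for c in target_chains)
    return target_number >= chain_number_threshold
```

With `chain_number_threshold=2`, for example, a transaction is flagged only when at least two target rule chains fit it. Requiring several independent rules to agree, rather than trusting any single one, is what the threshold in claim 1 expresses.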
2. The method of claim 1, wherein the integrated tree model is derived by:
obtaining transaction data of each historical transaction to obtain a plurality of groups of sample transaction data, and inputting each group of sample transaction data into an initial integrated tree model to obtain a first preset number of sample decision trees corresponding to each group of sample transaction data;
determining each group of sample transaction data and the sample decision trees corresponding to it as a group of training samples, to obtain a plurality of groups of training samples;
and performing iterative training on the initial integrated tree model a plurality of times based on the plurality of groups of training samples, until a preset performance evaluation value of the integrated tree model is greater than or equal to a performance evaluation value threshold, wherein the model parameters of the initial integrated tree model are updated in each training iteration.
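The stopping rule in claim 2 is to keep updating model parameters until the preset performance evaluation value reaches its threshold. That rule can be sketched as a generic loop. In this minimal sketch, `update` (one training iteration) and `evaluate` (for example, AUC on held-out transactions) are hypothetical hooks rather than the API of any particular tree-ensemble library. The `max_iters` cap is an added safeguard that the claim does not mention.

```python
from typing import Callable, Tuple, TypeVar

Params = TypeVar("Params")


def train_until_threshold(params: Params,
                          update: Callable[[Params], Params],
                          evaluate: Callable[[Params], float],
                          threshold: float,
                          max_iters: int = 100) -> Tuple[Params, float, int]:
    """Iterate training steps until the performance value reaches the threshold.

    `update` returns updated model parameters after one training iteration and
    `evaluate` returns the preset performance evaluation value; both are
    hypothetical hooks supplied by the caller.
    """
    for iteration in range(1, max_iters + 1):
        params = update(params)            # parameters change every iteration
        score = evaluate(params)
        if score >= threshold:             # stop once performance is good enough
            return params, score, iteration
    raise RuntimeError(
        f"threshold {threshold} not reached within {max_iters} iterations")
```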
3. The method of claim 1, wherein screening P candidate rule chains from the M rule chains by accuracy and recall of rule chains comprises:
obtaining a test sample set, wherein the test sample set comprises a plurality of test samples, and each test sample comprises test transaction data and risk test results;
for each rule chain in the M rule chains, predicting the test sample set through the current rule chain to obtain a risk prediction result of each test sample in the test sample set;
calculating the accuracy and recall rate of each rule chain based on the risk prediction result and the risk test result;
and when the accuracy rate is greater than or equal to an accuracy rate threshold and the recall rate is greater than or equal to a recall rate threshold, determining the corresponding rule chain as a candidate rule chain.
4. The method according to claim 3, wherein predicting the test sample set through the current rule chain to obtain a risk prediction result for each test sample in the test sample set comprises:
determining F kinds of first transaction data contained in the current rule chain, and determining a target constraint condition for each kind of first transaction data, wherein F is smaller than N, F is a positive integer, and the target constraint condition is the characteristic value range within which the characteristic value of the first transaction data must fall;
for each test sample, extracting the characteristic value of the first transaction data from the test sample to obtain a target characteristic value;
determining that the risk prediction result of the test sample is a risk transaction when the target characteristic values of all F kinds of first transaction data meet their target constraint conditions;
and determining that the risk prediction result of the test sample is a non-risk transaction when the target characteristic value of at least one kind of first transaction data does not meet its target constraint condition.
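The per-sample prediction in claim 4 reduces to an AND over F range checks. Here is a minimal Python sketch of that check. It makes two assumptions that the claim does not state: a rule chain is stored as a dict from feature name to a half-open range `(low, high]`, and a sample that lacks a constrained feature counts as not meeting that constraint.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: each of the F kinds of first transaction data in
# the chain maps to its target constraint condition, a range (low, high].
RuleChain = Dict[str, Tuple[float, float]]
Sample = Dict[str, float]


def predict_sample(chain: RuleChain, sample: Sample) -> bool:
    """True (risk) only if every target feature value meets its constraint."""
    for feature, (low, high) in chain.items():
        value = sample.get(feature)        # the target characteristic value
        if value is None or not (low < value <= high):
            return False                   # one violated constraint -> non-risk
    return True


def predict_test_set(chain: RuleChain, samples: List[Sample]) -> List[bool]:
    """Risk prediction result for every test sample in the set."""
    return [predict_sample(chain, s) for s in samples]


example_chain: RuleChain = {"amount": (5000.0, math.inf), "night": (0.5, math.inf)}
```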
5. The method of claim 3, wherein calculating the accuracy and recall of each rule chain based on the risk prediction results and the risk test results comprises:
determining the number of first test samples in the test sample set to obtain a first number, wherein the first test samples are samples whose risk test result and risk prediction result are both risk transactions;
determining the number of second test samples in the test sample set to obtain a second number, wherein the second test samples are samples whose risk test result is a risk transaction and whose risk prediction result is a non-risk transaction;
determining the number of third test samples in the test sample set to obtain a third number, wherein the third test samples are samples whose risk test result is a non-risk transaction and whose risk prediction result is a risk transaction;
calculating the sum of the first number and the second number to obtain a first sum value, and calculating the ratio of the first number to the first sum value to obtain the accuracy rate;
and calculating the sum of the first number and the third number to obtain a second sum value, and calculating the ratio of the first number to the second sum value to obtain the recall rate.
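Claims 3 and 5 combine into a per-chain computation: count the first, second, and third test samples, derive the two rates, and keep a chain only if both rates reach their thresholds. As claim 5 defines them, the "accuracy rate" (first / (first + second)) equals what is conventionally called recall, and the "recall rate" (first / (first + third)) equals what is conventionally called precision. The sketch below follows the claim's definitions. Returning 0.0 when a denominator is zero is an assumption, and the chain identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def chain_metrics(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (accuracy rate, recall rate) exactly as defined in claim 5."""
    pairs = list(zip(predicted, actual))
    first = sum(p and a for p, a in pairs)        # risk test, risk prediction
    second = sum(a and not p for p, a in pairs)   # risk test, non-risk prediction
    third = sum(p and not a for p, a in pairs)    # non-risk test, risk prediction
    accuracy = first / (first + second) if first + second else 0.0
    recall = first / (first + third) if first + third else 0.0
    return accuracy, recall


def screen_candidates(predictions: Dict[str, List[bool]], actual: List[bool],
                      accuracy_threshold: float,
                      recall_threshold: float) -> List[str]:
    """Keep chains whose two rates both reach their thresholds (claim 3)."""
    kept = []
    for chain_id, predicted in predictions.items():
        accuracy, recall = chain_metrics(predicted, actual)
        if accuracy >= accuracy_threshold and recall >= recall_threshold:
            kept.append(chain_id)
    return kept
```

Here is a worked case. The risk test results are `[True, True, False, False]`, and a chain predicts `[True, True, True, False]`. That gives first = 2, second = 0, and third = 1, so the accuracy rate is 2/2 = 1.0 and the recall rate is 2/3.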
6. The method of claim 1, wherein screening the P candidate rule chains by a feature screening method to obtain the Q target rule chains comprises:
determining a second preset number of feature screening methods, and calculating the ratio of the second preset number to a preset value to obtain a method number threshold;
for each candidate rule chain in the P candidate rule chains, determining the screening result of each feature screening method for the current candidate rule chain to obtain a screening result set, wherein each screening result is either approval or disapproval, approval indicates that the importance evaluation value of the current candidate rule chain is greater than or equal to an importance evaluation value threshold, and disapproval indicates that the importance evaluation value of the current candidate rule chain is less than the importance evaluation value threshold;
determining the number of approval results in the screening result set to obtain a target screening result number;
determining the current candidate rule chain as a pending rule chain when the target screening result number is greater than or equal to the method number threshold;
and optimizing all pending rule chains through a genetic algorithm to obtain the Q target rule chains.
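The voting step of claim 6 can be sketched directly. Each feature screening method scores a candidate rule chain and approves it when the score reaches the importance threshold. A chain becomes pending when its approval count reaches the method number threshold, which is the number of methods divided by a preset value. In this minimal sketch, the scoring callables, the use of a separate importance threshold for each method, and the default preset value of 2.0 are all hypothetical choices.

```python
from typing import Any, Callable, List, Sequence

# Hypothetical: a feature screening method maps a candidate rule chain to an
# importance evaluation value (for instance the mean split importance of its
# features); each method is given its own importance threshold.
ScreeningMethod = Callable[[Any], float]


def vote_pending_chains(candidates: Sequence[Any],
                        methods: Sequence[ScreeningMethod],
                        importance_thresholds: Sequence[float],
                        preset_value: float = 2.0) -> List[Any]:
    """Keep chains approved by at least len(methods) / preset_value methods."""
    if len(methods) != len(importance_thresholds):
        raise ValueError("one importance threshold per screening method")
    method_number_threshold = len(methods) / preset_value
    pending = []
    for chain in candidates:
        # A method "approves" when its importance value reaches its threshold.
        approvals = sum(method(chain) >= threshold
                        for method, threshold in zip(methods, importance_thresholds))
        if approvals >= method_number_threshold:
            pending.append(chain)
    return pending
```

A preset value of 2 turns the rule into roughly a majority vote. A preset value of 1 requires every method to approve.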
7. The method of claim 6, wherein optimizing all the pending rule chains through a genetic algorithm to obtain the Q target rule chains comprises:
determining each pending rule chain as an individual, and determining all the pending rule chains as an initial population;
performing genetic operations on the initial population through the genetic algorithm to obtain multiple generations of populations, wherein the genetic operations comprise at least one of the following: a selection operation, a crossover operation, and a mutation operation;
calculating a fitness function value for each generation of the population, and determining a generation whose fitness function value is smaller than a fitness function threshold as a target population;
and determining the Q target rule chains according to the rule chains contained in the target population.
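Claim 7 leaves the genetic encoding open, and the fitness formula of claim 8 is not legible in this text. The sketch below is therefore one possible reading, built on explicit assumptions:
- each individual is a rule chain stored as a dict of feature ranges;
- crossover builds a child from a random subset of the two parents' constraints;
- mutation drops one constraint;
- selection keeps the better-scoring half according to a caller-supplied `individual_score` (hypothetical, e.g. a chain's AUC on validation data);
- the population-level fitness is passed in as a callable rather than guessed.

```python
import random
from typing import Callable, Dict, List, Tuple

RuleChain = Dict[str, Tuple[float, float]]   # feature -> (low, high] range
Population = List[RuleChain]


def crossover(a: RuleChain, b: RuleChain, rng: random.Random) -> RuleChain:
    """Child keeps each parental feature with probability 0.5 (never empty)."""
    child = {}
    for feature in sorted(set(a) | set(b)):
        if rng.random() < 0.5:
            owners = [p for p in (a, b) if feature in p]
            child[feature] = rng.choice(owners)[feature]
    return child or dict(a)


def mutate(chain: RuleChain, rng: random.Random, rate: float = 0.3) -> RuleChain:
    """With probability `rate`, drop one constraint (keeping at least one)."""
    chain = dict(chain)
    if len(chain) > 1 and rng.random() < rate:
        del chain[rng.choice(sorted(chain))]
    return chain


def evolve(initial: Population,
           individual_score: Callable[[RuleChain], float],
           population_fitness: Callable[[Population, Population], float],
           fitness_threshold: float,
           max_generations: int = 100,
           seed: int = 0) -> Population:
    """Evolve the pending chains; return the de-duplicated target population."""
    if len(initial) < 3:
        raise ValueError("need at least three pending rule chains")
    rng = random.Random(seed)
    population = [dict(c) for c in initial]
    for _ in range(max_generations):
        # Selection: the better-scoring half (at least two) become parents.
        ranked = sorted(population, key=individual_score, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]
        # Crossover + mutation refill the population to its original size.
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(len(population) - len(parents))]
        population = parents + children
        if population_fitness(population, initial) < fitness_threshold:
            unique: Population = []
            for chain in population:
                if chain not in unique:
                    unique.append(chain)
            return unique                  # the Q target rule chains
    raise RuntimeError("no generation met the fitness threshold")
```

One illustrative fitness is `lambda pop, init: sum(map(len, pop)) / sum(map(len, init))`. With it, the loop stops once a generation holds fewer than `fitness_threshold` times the initial number of constraints, which loosely echoes the constraint-count ratio that claim 8 mentions. It is not the claimed formula.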
8. The method of claim 7, wherein the fitness function value is calculated as:
where $minobj$ is the fitness function value; $\alpha$ is a preset constant with $0 \le \alpha \le 1$; $num\_con_{sub}$ is the number of kinds of transaction data contained in the target population, and $num\_con_{s}$ is the number of kinds of transaction data contained in the initial population; $num\_rule_{sub}$ is the number of constraint conditions in the target population, and $num\_rule_{s}$ is the number of constraint conditions in the initial population; $auc_{sub}$ is the performance evaluation value of the target population; and a constraint condition is the characteristic value range that the characteristic value of the transaction data is required to satisfy.
9. A risk prediction apparatus for a transaction, comprising:
a first determining unit, configured to determine N kinds of transaction data of a target transaction and input the N kinds of transaction data into an integrated tree model to obtain a first preset number of decision trees, wherein N is a positive integer, the integrated tree model is trained with a plurality of groups of training samples, and each group of training samples comprises sample transaction data and the first preset number of sample decision trees;
an extraction unit, configured to extract M rule chains from the first preset number of decision trees and screen P candidate rule chains from the M rule chains according to the accuracy rate and recall rate of the rule chains, wherein M and P are positive integers, M is greater than P, a rule chain is the chain formed by all nodes from the root node to a leaf node of a decision tree, and each node represents one kind of transaction data;
a screening unit, configured to screen the P candidate rule chains by a feature screening method to obtain Q target rule chains and determine the target number of target rule chains that are consistent with the transaction data of the target transaction, wherein the feature screening method is a screening method based on the importance evaluation values of the transaction data in the candidate rule chains, Q is smaller than P, and Q is a positive integer;
and a second determining unit, configured to determine that the target transaction is a risk transaction when the target number is greater than or equal to a rule chain number threshold.
10. A non-volatile storage medium, characterized in that the non-volatile storage medium comprises a stored program, wherein the program, when run, controls a device in which the non-volatile storage medium is located to perform the risk prediction method of a transaction according to any one of claims 1 to 8.
CN202311632611.3A 2023-11-30 2023-11-30 Risk prediction method, apparatus and storage medium for transaction Pending CN117649236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311632611.3A CN117649236A (en) 2023-11-30 2023-11-30 Risk prediction method, apparatus and storage medium for transaction

Publications (1)

Publication Number Publication Date
CN117649236A 2024-03-05

Family

ID=90046023

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination