CN112907254A - Fraud transaction identification and model training method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112907254A
Authority
CN
China
Prior art keywords
data
transaction
current
decision tree
tree model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110320251.8A
Other languages
Chinese (zh)
Inventor
黄文琳
张志辉
胡平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud

Abstract

Embodiments of the invention provide a fraudulent transaction identification method, a model training method, and a corresponding device, equipment and storage medium, and relate to the technical field of artificial intelligence. The fraudulent transaction identification method comprises: obtaining historical dispute transaction data and user return visit information for the card in the current dispute transaction data, and extracting feature data from the historical dispute transaction data and the user return visit information; selecting target feature data from the feature data; inputting the current dispute transaction data and the target feature data into a decision tree model to obtain the category of the current dispute transaction; and, if the category of the current dispute transaction is suspected fraud transaction, reporting the current dispute transaction for anti-fraud processing. The technical scheme provided by the embodiments broadens the basis for classification and improves identification efficiency and accuracy.

Description

Fraud transaction identification and model training method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of artificial intelligence, in particular to a method, a device, equipment and a storage medium for identifying fraudulent transactions and training models.
Background
With the development of society and technology, information has become widely available, and resource sharing has become a requirement of the times. Banks play a vital role in social development. As banking services become interconnected, the number of card types and transaction channels keeps growing, and transaction scenarios become increasingly complex. As banks and other financial institutions widen their range of services during digital transformation, fraud becomes possible at every stage.
Fraudulent transactions are often hidden among dispute transactions, and the dispute transaction data set becomes the place where most fraudulent transactions that slipped through earlier checks end up. Identifying them is an important link in bank fraud prevention and an important safeguard for fair and orderly transactions in the banking industry.
To catch fraudulent transactions, dispute transactions are usually analyzed manually, which is inefficient, highly subjective, and yields low identification accuracy.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for fraudulent transaction identification and model training, which can broaden the basis for classification and improve identification efficiency and accuracy.
In a first aspect, an embodiment of the present invention provides a fraudulent transaction identification method, including:
obtaining historical dispute transaction data and user return visit information for the card in the current dispute transaction data, and extracting feature data from the historical dispute transaction data and the user return visit information;
selecting target feature data from the feature data;
inputting the current dispute transaction data and the target feature data into a decision tree model to obtain the category of the current dispute transaction;
and if the category of the current dispute transaction is suspected fraud transaction, reporting the current dispute transaction for anti-fraud processing.
In a second aspect, an embodiment of the present invention further provides a model training method, including:
acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of the card in the current dispute transaction data, result data indicating whether each historical dispute transaction was a fraudulent transaction, user return visit information for the current dispute transaction, and label data indicating whether the current dispute transaction is a fraudulent transaction, parsing the data into a database, and dividing the data in the database into a training data set and a test data set;
determining target feature data in the training data set, inputting the current dispute transaction data in the training data set, the target feature data determined from the training data set, and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
determining target feature data in the test data set, testing the initial decision tree model with the current dispute transaction data in the test data set and the target feature data determined from the test data set, and optimizing the initial decision tree model based on the test result to obtain a final decision tree model.
In a third aspect, an embodiment of the present invention further provides a fraudulent transaction identification device, including:
an acquisition module, configured to obtain historical dispute transaction data and user return visit information for the card in the current dispute transaction data, and to extract feature data from the historical dispute transaction data and the user return visit information;
a selection module, configured to select target feature data from the feature data;
a classification module, configured to input the current dispute transaction data and the target feature data into a decision tree model to obtain the category of the current dispute transaction;
and a reporting module, configured to report the current dispute transaction for anti-fraud processing if the category of the current dispute transaction is suspected fraud transaction.
In a fourth aspect, an embodiment of the present invention provides a model training apparatus, including:
a dividing module, configured to acquire a plurality of groups of current dispute transaction data, historical dispute transaction data of the card in the current dispute transaction data, result data indicating whether each historical dispute transaction was a fraudulent transaction, user return visit information for the current dispute transaction, and label data indicating whether the current dispute transaction is a fraudulent transaction, to parse the data into a database, and to divide the data in the database into a training data set and a test data set;
a training module, configured to determine target feature data in the training data set, input the current dispute transaction data in the training data set, the target feature data determined from the training data set, and the label data into a decision tree model, and train the decision tree model to obtain an initial decision tree model;
and a testing and optimizing module, configured to determine target feature data in the test data set, test the initial decision tree model with the current dispute transaction data in the test data set and the target feature data determined from the test data set, and optimize the initial decision tree model based on the test result to obtain a final decision tree model.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods provided by the embodiments of the present invention.
In a sixth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method provided by the present invention.
According to the technical scheme provided by the embodiment of the invention, feature data is extracted from the historical dispute transaction data and the user return visit information, target feature data is selected, and the current dispute transaction data and the target feature data are input into the decision tree model to obtain the category of the current dispute transaction; if the category is suspected fraud transaction, the current dispute transaction is reported for anti-fraud processing. This broadens the basis for classification and improves identification efficiency and accuracy.
Drawings
FIG. 1 is a flow chart of a method for identifying fraudulent transactions according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for identifying fraudulent transactions according to an embodiment of the present invention;
FIG. 3 is a flow chart of a model training method according to an embodiment of the present invention;
FIG. 4a is a flow chart of a fraudulent transaction identification method according to an embodiment of the present invention;
FIG. 4b is a flow chart of a fraudulent transaction identification method according to an embodiment of the present invention;
FIG. 4c is a flow chart of a method for identifying fraudulent transactions according to an embodiment of the present invention;
FIG. 5a is a block diagram of a fraudulent transaction identification device according to an embodiment of the present invention;
FIG. 5b is a block diagram of a fraudulent transaction identification device according to an embodiment of the present invention;
FIG. 6 is a block diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a flowchart of a method for identifying fraudulent transactions according to an embodiment of the present invention, where the method may be performed by a fraudulent transaction identification apparatus, where the apparatus may be implemented by software and/or hardware, the apparatus may be configured in an electronic device such as a server, a terminal, etc., and the method may be applied in a scenario of identifying a disputed transaction of a bank.
As shown in fig. 1, the technical solution provided by the embodiment of the present invention includes:
S110: historical dispute transaction data and user return visit information for the card in the current dispute transaction data are obtained, and feature data are extracted from the historical dispute transaction data and the user return visit information.
In embodiments of the present invention, the dispute transaction data includes the card number of the card, consumption information, and the like. The consumption information includes the consumption amount, identification information of the merchant where the consumption took place, fraud records of that merchant, and the like. The user return visit information is questionnaire result data for the current dispute transaction.
The feature data extracted from the historical dispute transaction data includes whether the card has consumed at the target merchant in the current dispute transaction data and whether the target merchant has a fraud record. The feature data extracted from the user return visit information includes whether the user denies the current dispute transaction and whether the card in the current dispute transaction data is lost. The feature data extracted from the historical dispute transaction data may also include information about the doubtful transaction (transaction currency, transaction amount, etc.), whether the transaction was successful but deducted, whether the transaction was cancelled but not refunded, and so on. The feature data extracted from the user return visit information may also include whether the card was lent to another person.
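The extraction step described above can be sketched roughly as follows. This is a minimal illustration only: the record schema (`HistoricalDispute`), the questionnaire keys, and the feature names are hypothetical and are not defined in the source text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HistoricalDispute:
    """One past dispute transaction on the same card (hypothetical schema)."""
    merchant_id: str
    merchant_has_fraud_record: bool


def extract_features(
    target_merchant: str,
    history: List[HistoricalDispute],
    return_visit: Dict[str, bool],
) -> Dict[str, int]:
    """Turn raw history and questionnaire answers into 0/1 features."""
    at_target = [h for h in history if h.merchant_id == target_merchant]
    return {
        # Features derived from the historical dispute transaction data.
        "consumed_at_target_merchant": int(bool(at_target)),
        "merchant_has_fraud_record": int(
            any(h.merchant_has_fraud_record for h in at_target)
        ),
        # Features derived from the user return visit questionnaire;
        # unanswered questions default to 0 in this sketch.
        "user_denies_transaction": int(return_visit.get("user_denies_transaction", False)),
        "card_lost": int(return_visit.get("card_lost", False)),
        "card_lent": int(return_visit.get("card_lent", False)),
    }
```

Encoding every feature as 0 or 1 is a simplification chosen for the sketch; amount or currency features would need their own encoding.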
S120: target feature data is selected among the feature data.
In this embodiment of the present invention, optionally, selecting the target feature data from the feature data includes: selecting the target feature data from the feature data based on the Gini coefficient. The Gini coefficient is an internationally used index for measuring the income gap among residents of a country or region. It lies between 0 and 1, and a larger value indicates greater inequality. Applied to feature data, a larger Gini coefficient indicates greater uncertainty, so the feature data can be screened by computing the Gini coefficient to obtain the target feature data.
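One way to read this screening step is the Gini impurity used by CART-style decision trees: a feature is scored by the weighted impurity of the label groups it produces, and features with lower scores (less remaining uncertainty) are kept. The sketch below makes that assumption; the function names, the top-k rule, and the sample column names are illustrative and are not taken from the source.

```python
from collections import Counter
from typing import Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values: Sequence[int], labels: Sequence[int]) -> float:
    """Weighted Gini impurity of the labels after grouping by feature value."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        total += len(subset) / n * gini(subset)
    return total


def select_features(columns: Dict[str, List[int]], labels: List[int], k: int) -> List[str]:
    """Keep the k features whose split leaves the least uncertainty (assumed rule)."""
    ranked = sorted(columns, key=lambda name: split_gini(columns[name], labels))
    return ranked[:k]
```

For example, with labels `[1, 1, 0, 0]`, a column `[1, 1, 0, 0]` separates the classes perfectly and scores 0, while a column `[1, 0, 1, 0]` leaves the classes mixed and scores 0.5, so the first column is preferred.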
S130: and inputting the current dispute transaction data and the target characteristic data into a decision tree model to obtain the category of the current dispute transaction.
In the embodiment of the present invention, the decision tree model includes a classification tree model. A classification tree is a very common classification method and a form of supervised learning: given a large number of samples, each with a group of attributes and a predetermined class label, a classifier is obtained through learning, and the classifier can then classify newly appearing objects.
In this embodiment of the present invention, before inputting the current dispute transaction data and the target feature data into the decision tree model, optionally, the method may include: training the decision tree model through a training data set in a database to obtain an initial decision tree model, testing the initial decision tree model through a testing data set in the database, and optimizing the decision tree model based on a testing result to obtain a final decision tree model.
Specifically, an external interface can be called to obtain a plurality of groups of current dispute transaction data, historical dispute transaction data of the card in the current dispute transaction data, result data indicating whether each historical dispute transaction was a fraudulent transaction, user return visit information for the current dispute transaction, and label data indicating whether the current dispute transaction is a fraudulent transaction. The data is parsed into a database and processed, for example by preprocessing, data cleaning, standardization, and completion of missing data. The data is then randomly divided into a training data set and a test data set at a ratio of 8:2.
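The random 8:2 division can be sketched as below. The function name, the fixed seed, and the rounding rule for the cut point are assumptions made for the illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    records: Sequence[T], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into training and test parts."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Calling `split_dataset(rows)` on 10 rows yields 8 training rows and 2 test rows, with every row appearing in exactly one of the two sets.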
Optionally, determining target feature data in the training data set, inputting current dispute transaction data, target feature data and tag data in the training data set into the decision tree model, and training the decision tree model to obtain an initial decision tree model; and determining target characteristic data in the test data set, and testing and optimizing the initial decision tree model through the current dispute transaction data and the target characteristic data in the test data set to obtain a final decision tree model.
In this embodiment of the present invention, optionally, the data set formed by the historical dispute transaction data and the result data indicating whether each historical dispute transaction was a fraudulent transaction may be referred to as the data set of historical dispute transactions reported as fraud. Optionally, the user information corresponding to the current dispute transaction may also be parsed into the database and divided into the training data set and the test data set.
It should be noted that the data sets of all dispute transactions arising during card (bank card) consumption, such as transactions with inconsistent multi-party accounts or inconsistent processing flows, may be referred to as dispute transaction data sets. These data sets include dispute transaction data (card number, amount, etc.), the data set of historical dispute transactions reported as fraud, and user-related data (customer information, user return visit information for the current dispute transaction, etc.).
In the embodiment of the present invention, optionally, before the decision tree model is trained, the method may further include constructing the decision tree model from the target feature data, where the decision tree model is a tree structure and each node of the tree structure may correspond to one item of target feature data.
In the embodiment of the present invention, optionally, the current dispute transaction data and the selected target feature data are input into the final decision tree model, so that the category of the current dispute transaction can be obtained. The category of the dispute transaction may include a general dispute transaction and a suspected fraud transaction.
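To make the tree structure concrete, the sketch below represents each internal node as one target feature and each leaf as a category, then walks the tree for a sample. The tree shown is hand-written for illustration rather than learned, and the class names, feature names, and category strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

ORDINARY = "ordinary dispute transaction"
SUSPECTED = "suspected fraud transaction"


@dataclass
class Leaf:
    """Terminal node holding the predicted category."""
    category: str


@dataclass
class Node:
    """Internal node that tests one 0/1 target feature."""
    feature: str
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], sample: Dict[str, int]) -> str:
    """Follow the branches selected by the sample until a leaf is reached."""
    while isinstance(tree, Node):
        # Missing features are treated as 0 in this sketch.
        tree = tree.if_true if sample.get(tree.feature, 0) else tree.if_false
    return tree.category


# Illustrative tree: flag a dispute when the user denies the transaction
# and the merchant already has a fraud record.
EXAMPLE_TREE = Node(
    feature="user_denies_transaction",
    if_true=Node("merchant_has_fraud_record", Leaf(SUSPECTED), Leaf(ORDINARY)),
    if_false=Leaf(ORDINARY),
)
```

A trained model would supply its own structure; only the traversal logic is meant to carry over.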
S140: if the category of the current dispute transaction is suspected fraud transaction, the current dispute transaction is reported for anti-fraud processing.
In the embodiment of the invention, anti-fraud processing is a service for identifying fraudulent behavior, including transaction fraud, application fraud, card and account-number theft, phishing, money laundering, and the like. Identifying whether a dispute transaction is a fraudulent transaction is therefore an essential part of banking business.
In the embodiment of the invention, if the category of the current dispute transaction is suspected fraud transaction, the current dispute transaction is reported to an upper-level node, which performs anti-fraud judgment on it to determine whether it is a fraudulent transaction.
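The reporting rule in S140 amounts to a conditional hand-off. In the sketch below, `report` stands in for whatever interface the upper-level node exposes; that callback, the function name, and the category string are assumptions, not part of the source.

```python
from typing import Callable

SUSPECTED_FRAUD = "suspected fraud transaction"


def route_dispute(txn_id: str, category: str, report: Callable[[str], None]) -> bool:
    """Report the transaction upstream only if it was classified as suspected fraud.

    Returns True when the transaction was reported, False otherwise.
    """
    if category == SUSPECTED_FRAUD:
        report(txn_id)
        return True
    return False
```

Passing a list's `append` method as `report` is a convenient way to observe which transactions would be escalated during testing.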
In the related art, dispute service processing refers to handling transactions in which the parties' accounts, the processing flow, or the like are in dispute. Fraudulent transactions often exploit the inconsistent multi-party accounts and processing flows that characterize dispute transaction data as cover, so fraudulent transactions are often hidden among dispute transactions, and dispute transaction data becomes the place where missed fraudulent transactions ultimately concentrate.
In the related art, identifying a dispute transaction and deciding whether it should be reported for anti-fraud processing relies mainly on manual judgment based on personal experience. The basis for judgment is narrow (for example, whether the customer acknowledges the transaction) and mixed with human subjectivity, so the identification result is not rigorous and its accuracy is low.
In the related art, whether a doubtful dispute transaction is reported to the anti-fraud service is decided mainly by manual analysis: a person must first understand the specific scenario of the transaction and then make a judgment, which involves a heavy workload, low efficiency, and strong subjectivity. If the identification is wrong, the subsequent routing of the doubtful dispute transaction is also wrong, the processing cycle becomes long, resources are misallocated, and the user's problem cannot be resolved promptly, which hinders helping the user recover losses and harms the user experience.
According to the technical scheme provided by the embodiment of the invention, feature data is extracted from the historical dispute transaction data and the user return visit information, target feature data is selected, and the current dispute transaction data and the target feature data are input into the decision tree model to obtain the category of the current dispute transaction; if the category is suspected fraud transaction, the current dispute transaction is reported for anti-fraud processing. This broadens the basis for classification, improves identification efficiency and accuracy, avoids manual identification, allows the user's problem to be resolved quickly, and improves the user experience.
Fig. 2 is a flowchart of a fraudulent transaction identification method according to an embodiment of the present invention. In this embodiment, optionally, the method may include:
training the decision tree model with the training data set in the database to obtain an initial decision tree model;
testing the initial decision tree model with the test data set in the database, and optimizing the decision tree model based on the test result to obtain a final decision tree model.
Optionally, the method may further include:
feeding back the anti-fraud processing result of the current dispute transaction to the database, and updating the database.
Optionally, the method may further include:
retraining the decision tree model with the updated training data set in the database;
testing and optimizing the retrained decision tree model with the updated test data set in the database.
As shown in fig. 2, the technical solution provided by the embodiment of the present invention includes:
S210: the decision tree model is trained with the training data set in the database to obtain an initial decision tree model.
S220: the initial decision tree model is tested with the test data set in the database, and the decision tree model is optimized based on the test result to obtain a final decision tree model.
S230: historical dispute transaction data and user return visit information for the card in the current dispute transaction data are obtained, and feature data are extracted from the historical dispute transaction data and the user return visit information.
S240: target feature data is selected from the feature data.
S250: the current dispute transaction data and the target feature data are input into the decision tree model to obtain the category of the current dispute transaction.
S260: if the category of the current dispute transaction is suspected fraud transaction, the current dispute transaction is reported for anti-fraud processing.
In the embodiment of the present invention, reference may be made to the description of the above embodiment for the description of S210-S260.
S270: the anti-fraud processing result of the current dispute transaction is fed back to the database, and the database is updated.
In embodiments of the present invention, the anti-fraud processing result of the current dispute transaction indicates whether the current dispute transaction is a fraudulent transaction or a normal dispute transaction. Feeding this result back to the database updates the data in the database, so that the current dispute transaction is recorded together with the result of whether it was a fraudulent transaction.
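The feedback step can be sketched with an in-memory SQLite database. The table name, column names, and helper functions are hypothetical; the source does not specify a schema or a database engine.

```python
import sqlite3
from typing import List, Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per dispute transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dispute ("
        "txn_id TEXT PRIMARY KEY, category TEXT, is_fraud INTEGER)"
    )
    return conn


def record_prediction(conn: sqlite3.Connection, txn_id: str, category: str) -> None:
    """Store the model's category; the confirmed label is not yet known."""
    conn.execute(
        "INSERT INTO dispute (txn_id, category, is_fraud) VALUES (?, ?, NULL)",
        (txn_id, category),
    )
    conn.commit()


def feed_back_result(conn: sqlite3.Connection, txn_id: str, is_fraud: bool) -> None:
    """Write the anti-fraud processing result back as the confirmed label."""
    conn.execute(
        "UPDATE dispute SET is_fraud = ? WHERE txn_id = ?", (int(is_fraud), txn_id)
    )
    conn.commit()


def labelled_rows(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Rows with a confirmed label, i.e. candidates for later retraining."""
    cursor = conn.execute(
        "SELECT txn_id, is_fraud FROM dispute "
        "WHERE is_fraud IS NOT NULL ORDER BY txn_id"
    )
    return cursor.fetchall()
```

Only rows whose label has been confirmed are returned by `labelled_rows`, which keeps unverified predictions out of any later training data.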
S280: the decision tree model is retrained with the updated training data set in the database.
In the embodiment of the invention, fraud techniques are continuously updated as technology develops, so after the decision tree model has been in use for a period of time, it needs to be retrained periodically to preserve its accuracy. Optionally, the data in the updated database is divided into a training data set and a test data set, and the decision tree model is retrained with the training data set. The retraining process is the same as the training process described in the above steps.
S290: the retrained decision tree model is tested and optimized with the updated test data set in the database.
In the embodiment of the present invention, after the database is updated, the test data set is also updated, and the retrained decision tree model is tested and optimized with the updated test data set; for the testing and optimizing process, refer to the description of the above embodiment.
Thus, by updating the database and retraining the decision tree model, the model is updated dynamically, stays current, and produces more timely identification results.
Fig. 3 is a flowchart of a model training method according to an embodiment of the present invention, where the method may be performed by a model training apparatus, where the apparatus may be implemented by software and/or hardware, the apparatus may be configured in an electronic device such as a terminal, a server, and the like, and the method may be applied in a scenario of training a decision tree model.
As shown in fig. 3, the technical solution provided by the embodiment of the present invention includes:
S310: a plurality of groups of current dispute transaction data, historical dispute transaction data of the card in the current dispute transaction data, result data indicating whether each historical dispute transaction was a fraudulent transaction, user return visit information for the current dispute transaction, and label data indicating whether the current dispute transaction is a fraudulent transaction are acquired and parsed into a database, and the data in the database is divided into a training data set and a test data set.
In the embodiment of the invention, the acquired data is parsed into the database and processed, for example by preprocessing, data cleaning, standardization, and completion of missing data. The data is then randomly divided into a training data set and a test data set at a ratio of 8:2.
In an embodiment of the present invention, the current dispute transaction data includes the card number of the card and the consumption amount. Optionally, the feature data extracted from the historical dispute transaction data includes: whether the card has consumed at the target merchant in the current dispute transaction data and whether the target merchant has a fraud record. The feature data extracted from the user return visit information includes: whether the user denies the current dispute transaction and whether the card in the current dispute transaction data is lost.
In the embodiment of the present invention, the data set formed by the historical dispute transaction data and the result data indicating whether each historical dispute transaction was a fraudulent transaction may be referred to as the data set of historical dispute transactions reported as fraud. Optionally, the user information corresponding to the current dispute transaction may also be parsed into the database and divided into the training data set and the test data set.
S320: and determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model.
In this embodiment of the present invention, optionally, determining the target feature data in a data set includes: extracting feature data from the historical dispute transaction data and the user return visit information in the data set; and selecting the target feature data from the feature data based on the Gini coefficient. Determining the target feature data in the training data set follows the above embodiment: feature data is extracted from the historical dispute transaction data and the user return visit information in the training data set, and the target feature data is selected based on the Gini coefficient.
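Selection of target feature data based on the Gini coefficient can be sketched as follows. The sketch assumes that the Gini coefficient refers to the Gini impurity used in CART-style decision trees and that every candidate feature is a boolean; the feature names, the `top_k` parameter, and the sample rows are hypothetical and are not taken from the embodiment.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def weighted_gini(rows, labels, feature):
    """Weighted Gini impurity of the two groups produced by splitting on a boolean feature."""
    yes = [y for r, y in zip(rows, labels) if r[feature]]
    no = [y for r, y in zip(rows, labels) if not r[feature]]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def select_features(rows, labels, candidates, top_k=2):
    """Rank candidate features by weighted Gini impurity (lower is better) and keep top_k."""
    ranked = sorted(candidates, key=lambda f: weighted_gini(rows, labels, f))
    return ranked[:top_k]


# Hypothetical feature rows; label 1 = fraudulent, 0 = not fraudulent.
rows = [
    {"user_denies": True, "merchant_has_fraud_record": True, "card_lost": False},
    {"user_denies": True, "merchant_has_fraud_record": False, "card_lost": True},
    {"user_denies": False, "merchant_has_fraud_record": True, "card_lost": False},
    {"user_denies": False, "merchant_has_fraud_record": False, "card_lost": True},
]
labels = [1, 1, 0, 0]
target_features = select_features(rows, labels, list(rows[0]))
```

In this toy data, `user_denies` separates the two classes completely, so its weighted impurity is zero and it is ranked first.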
In an implementation of the embodiment of the present invention, optionally, after the target feature data is determined, it is judged whether the target feature data is complete, and if not, the target feature data is supplemented. Specifically, the process may return to S310 to supplement the data in the database.
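The completeness check can be illustrated with the short sketch below. It assumes that a feature counts as missing when its value is `None` and that supplementary values come from a re-collection step; the record layout and the function names are hypothetical.

```python
def find_incomplete(records, target_features):
    """Return {record id: [missing feature names]} for records lacking target features."""
    missing = {}
    for rec in records:
        absent = [f for f in target_features if rec.get(f) is None]
        if absent:
            missing[rec["id"]] = absent
    return missing


def supplement(records, missing, recollected):
    """Fill missing features in place from re-collected data, where it is available."""
    by_id = {rec["id"]: rec for rec in records}
    for rec_id, features in missing.items():
        for f in features:
            if f in recollected.get(rec_id, {}):
                by_id[rec_id][f] = recollected[rec_id][f]
    return records


# Hypothetical records: record 2 has no answer yet for "user_denies".
records = [
    {"id": 1, "user_denies": True, "card_lost": False},
    {"id": 2, "user_denies": None, "card_lost": True},
]
targets = ["user_denies", "card_lost"]
gaps = find_incomplete(records, targets)
records = supplement(records, gaps, {2: {"user_denies": False}})
```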
S330: determining target characteristic data in the test data set, testing the initial decision tree model through current dispute transaction data in the test data set and the target characteristic data determined by the test data set, and optimizing the initial decision tree model based on a test result to obtain a final decision tree model.
In this embodiment of the present invention, optionally, determining the target feature data in the test data set may include: extracting feature data from the historical dispute transaction data and the user return visit information in the test data set; and selecting the target feature data from the feature data based on the Gini coefficient. For details of selecting the target feature data, refer to the description of the above embodiment.
According to the technical solution provided by the embodiment of the invention, a plurality of groups of current dispute transaction data, historical dispute transaction data of the card in the current dispute transaction data, data indicating whether each historical dispute transaction turned out to be a fraudulent transaction, user return visit information for the current dispute transaction, and label data indicating whether the current dispute transaction is a fraudulent transaction are acquired and parsed into a database; the data in the database is divided into a training data set and a test data set; and the decision tree model is trained, tested, and optimized with these data sets. This increases the bases on which the model makes its classification judgment and can improve the accuracy of model identification.
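Training, testing, and optimizing a decision tree can be sketched in plain Python as follows. This is a simplified illustration, not the model of the embodiment: it assumes boolean features, uses Gini impurity to choose splits, and stands in for "optimization" by choosing the maximum tree depth that scores best on the test data. All names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(rows, labels, features, max_depth):
    """Grow a binary tree over boolean features; leaves hold the majority label."""
    majority = int(sum(labels) * 2 >= len(labels))
    if max_depth == 0 or not features or gini(labels) == 0.0:
        return majority

    def split_score(f):
        yes = [y for r, y in zip(rows, labels) if r[f]]
        no = [y for r, y in zip(rows, labels) if not r[f]]
        return (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)

    best = min(features, key=split_score)
    yes_idx = [i for i, r in enumerate(rows) if r[best]]
    no_idx = [i for i, r in enumerate(rows) if not r[best]]
    if not yes_idx or not no_idx:  # the split separates nothing
        return majority
    rest = [f for f in features if f != best]
    return {
        "feature": best,
        True: build_tree([rows[i] for i in yes_idx], [labels[i] for i in yes_idx], rest, max_depth - 1),
        False: build_tree([rows[i] for i in no_idx], [labels[i] for i in no_idx], rest, max_depth - 1),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, dict):
        tree = tree[bool(row[tree["feature"]])]
    return tree


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)


def train_and_tune(train_rows, train_labels, test_rows, test_labels, features, depths=(1, 2, 3)):
    """Train one tree per candidate depth and keep the one that scores best on the test data."""
    best_tree, best_acc, best_depth = None, -1.0, None
    for d in depths:
        tree = build_tree(train_rows, train_labels, features, d)
        acc = accuracy(tree, test_rows, test_labels)
        if acc > best_acc:
            best_tree, best_acc, best_depth = tree, acc, d
    return best_tree, best_acc, best_depth


# Toy data: a dispute is fraudulent only when the user denies it AND the merchant has a fraud record.
combos = [(d, m) for d in (True, False) for m in (True, False)]
rows = [{"user_denies": d, "merchant_has_fraud_record": m} for d, m in combos]
labels = [int(d and m) for d, m in combos]
final_tree, test_acc, depth = train_and_tune(
    rows * 2, labels * 2, rows, labels, ["user_denies", "merchant_has_fraud_record"]
)
```

A depth-1 tree can only test one feature and misclassifies one of the four combinations, so the tuning loop settles on depth 2. Pruning an already grown tree would be another way to realize the optimization step.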
Fig. 4a is a flowchart of a fraudulent transaction identification method according to an embodiment of the present invention. As shown in fig. 4a, the technical solution provided by the embodiment of the present invention includes:
S410: acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether the historical dispute transaction is a fraud transaction result, user return visit information aiming at the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parsing the data into a database, and dividing the data in the database into a training data set and a testing data set.
S420: determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
S430: determining target characteristic data in the test data set, testing the initial decision tree model through current dispute transaction data in the test data set and the target characteristic data determined by the test data set, and optimizing the initial decision tree model based on a test result to obtain a final decision tree model.
S440: historical dispute transaction data and user return visit information of a card in current dispute transaction data are obtained, and feature data are extracted from the historical dispute transaction data and the user return visit information.
S450: target feature data is selected among the feature data.
S460: inputting the current dispute transaction data and the target characteristic data into the decision tree model to obtain the category of the current dispute transaction.
S470: if the category of the current dispute transaction is suspected fraud transaction, reporting the current dispute transaction for anti-fraud processing.
For details of each of the above steps, refer to the description of the above embodiments.
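The identification flow of S440 to S470 can be sketched as follows. The sketch is illustrative only: the record fields, the category strings, and the hand-written `rule_model` that stands in for the trained decision tree model are assumptions, and the reporting step is reduced to a callback.

```python
def extract_features(history, return_visit, merchant_id):
    """S440: derive boolean features from the card's history and the return visit."""
    at_merchant = [h for h in history if h["merchant_id"] == merchant_id]
    return {
        "purchased_at_merchant_before": bool(at_merchant),
        "merchant_has_fraud_record": any(h["was_fraud"] for h in at_merchant),
        "user_denies": return_visit["user_denies"],
        "card_lost": return_visit["card_lost"],
    }


def identify(current, history, return_visit, target_features, model, report):
    """Run S440 to S470 for one dispute and return its category."""
    features = extract_features(history, return_visit, current["merchant_id"])
    selected = {name: features[name] for name in target_features}  # S450
    category = model({**current, **selected})  # S460
    if category == "suspected_fraud":  # S470
        report(current)
    return category


def rule_model(sample):
    """Hand-written stand-in for the trained decision tree model."""
    if sample["user_denies"] and sample["merchant_has_fraud_record"]:
        return "suspected_fraud"
    return "ordinary_dispute"


# Hypothetical dispute: the user denies the transaction and the merchant has a fraud record.
reported = []
current = {"card_no": "c1", "amount": 980.0, "merchant_id": "m7"}
history = [{"merchant_id": "m7", "was_fraud": True}]
return_visit = {"user_denies": True, "card_lost": False}
category = identify(
    current, history, return_visit,
    ["user_denies", "merchant_has_fraud_record"], rule_model, reported.append,
)
```

Passing the model and the reporting function as parameters keeps the flow independent of how the decision tree is trained and of how the anti-fraud system receives reports.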
The method provided in the embodiment of the present invention may also refer to fig. 4b and 4c. As shown in fig. 4b and 4c, the technical solution provided by the embodiment of the present invention includes the following steps:
Step one: data collection and warehousing: the data required for modeling includes dispute transaction data (which may be called dispute transaction record data, including the card number, the amount, and the like), the data set of historical dispute transactions reported as fraud, user data (user information and the results of the transaction questionnaire), and the label corresponding to each group of data; these data sets are parsed and stored in the database. The questionnaire result is the user return visit information.
Step two: data preprocessing: data processing such as data cleaning, standardization, and supplementing incomplete data. The data is randomly divided in an 8:2 ratio into a training data set and a test data set (the training data set is used for training the decision tree model, and the test data set is used for testing the accuracy of the decision tree model and tuning it).
Step three: feature data selection: feature data is selected based on the Gini coefficient; during selection, the data is checked again for completeness and any required feature data is supplemented.
Step four: refining feature selection: a preliminary judgment is made on the selected feature data to determine whether it is complete; if it is incomplete, the process returns to the data collection of step one to continue collecting data, and the data set is reviewed and supplemented.
Step five: constructing the decision tree model: a tree model is constructed based on the features.
Step six: model training: the decision tree model is trained on the training data set according to the features.
Step seven: model testing and optimization: the model is tested with the test data set and optimized by means such as pruning and parameter adjustment.
Step eight: model classification: the current dispute transaction is identified using the decision tree model to find fraudulent transactions hidden among ordinary dispute transactions.
Step nine: reporting disputes as fraud: the identified suspected fraudulent transactions are reported for anti-fraud processing.
Step ten: updating the database: the data set of the transaction is updated in the database for later retraining of the decision tree model.
Step eleven: retraining the model: the model is periodically retrained with the new data set to improve its effectiveness.
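The database update and retraining of steps ten and eleven can be sketched as follows. The in-memory `DisputeStore`, the toy `train_majority` trainer that stands in for decision tree training, and the rule of retraining after a fixed number of new samples (rather than on a calendar schedule) are all assumptions made for illustration.

```python
class DisputeStore:
    """In-memory stand-in for the database of labeled dispute transactions."""

    def __init__(self):
        self.samples = []
        self.new_since_training = 0

    def feed_back(self, features, confirmed_fraud):
        """Step ten: store the anti-fraud processing result as a new labeled sample."""
        self.samples.append((features, int(confirmed_fraud)))
        self.new_since_training += 1


def maybe_retrain(store, train_fn, current_model, min_new_samples=3):
    """Step eleven: retrain once enough new labeled samples have accumulated."""
    if store.new_since_training < min_new_samples:
        return current_model
    store.new_since_training = 0
    return train_fn(store.samples)


def train_majority(samples):
    """Toy trainer: the returned model always predicts the majority label."""
    fraud = sum(label for _, label in samples)
    majority = int(fraud * 2 >= len(samples))
    return lambda features: majority


store = DisputeStore()
initial_model = lambda features: 0  # placeholder for the previously trained model
model = initial_model

# Two confirmed results are not yet enough to trigger retraining.
for confirmed in (True, True):
    store.feed_back({"user_denies": True}, confirmed)
model = maybe_retrain(store, train_majority, model)
unchanged = model is initial_model

# A third result reaches the threshold, so the model is retrained.
store.feed_back({"user_denies": True}, True)
model = maybe_retrain(store, train_majority, model)
```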
Fig. 5a is a block diagram of a fraudulent transaction identification device according to an embodiment of the present invention. As shown in fig. 5a, the fraudulent transaction identification device includes: an obtaining module 510, a selection module 520, a classification module 530 and a reporting module 540.
The obtaining module 510 is configured to obtain historical dispute transaction data of a card in current dispute transaction data and user return visit information, and extract feature data from the historical dispute transaction data and the user return visit information;
a selection module 520, configured to select target feature data from the feature data;
a classification module 530, configured to input the current dispute transaction data and the target feature data into a decision tree model to obtain a category of the current dispute transaction;
a reporting module 540, configured to report the current disputed transaction for anti-fraud processing if the type of the current disputed transaction is suspected fraud.
Optionally, the feature data extracted from the historical dispute transaction data includes: whether the card has previously been used for consumption at the target merchant in the current dispute transaction data, and whether a fraud record exists for the target merchant;
the feature data extracted from the user return visit information comprises: whether the user denies the current dispute transaction, whether the card in the current dispute transaction data is lost, and whether the card in the current dispute transaction data is lost.
Optionally, the apparatus further comprises a training module, configured to:
training the decision tree model through a training data set in a database to obtain an initial decision tree model;
and testing the initial decision tree model through the test data set in the database, and optimizing the decision tree model based on the test result to obtain a final decision tree model.
Optionally, selecting the target feature data from the feature data includes:
selecting the target feature data from the feature data based on the Gini coefficient.
Optionally, the apparatus further includes an update module, configured to:
and feeding back the anti-fraud processing result of the current dispute transaction to the database, and updating the database.
Optionally, the apparatus further includes a retraining module, configured to:
retraining the decision tree model by using the updated training data set in the database;
and testing and optimizing the retrained decision tree model by adopting the updated test data set in the database.
The structure of the apparatus provided in the embodiment of the present invention is merely an example, and may also be in other structural forms, for example, the apparatus provided in the embodiment of the present invention may also be in the structural form shown in fig. 5 b.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a block diagram of a model training apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus according to the embodiment of the present invention includes a partitioning module 610, a training module 620, and a testing and optimizing module 630.
The dividing module 610 is configured to obtain multiple sets of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether a historical dispute transaction is a fraud transaction result, user return visit information for the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parse the data into a database, and divide the data in the database into a training data set and a test data set;
a training module 620, configured to determine target feature data in the training data set, input current dispute transaction data in the training data set, the target feature data determined by the training data set, and the tag data into a decision tree model, and train the decision tree model to obtain an initial decision tree model;
a testing and optimizing module 630, configured to determine target feature data in the test data set, test the initial decision tree model according to the current dispute transaction data in the test data set and the target feature data determined by the test data set, and optimize the initial decision tree model based on a test result to obtain a final decision tree model.
Optionally, determining target feature data in the data set includes:
extracting characteristic data from historical dispute transaction data in a data set and the user return visit information;
selecting the target feature data from the feature data based on the Gini coefficient.
Optionally, the apparatus further comprises a supplementary module, configured to:
judging whether the target characteristic data is complete or not;
and if not, supplementing the target characteristic data.
Optionally, the current dispute transaction data includes a card number of the card and a spending amount of the card.
Optionally, the feature data extracted from the historical dispute transaction data includes: whether the card has previously been used for consumption at the target merchant in the current dispute transaction data, and whether a fraud record exists for the target merchant;
the feature data extracted from the user return visit information comprises: whether the user denies the current dispute transaction, whether the card in the current dispute transaction data is lost, and whether the card in the current dispute transaction data is lost.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 7, the electronic device includes:
one or more processors 710, one processor 710 being illustrated in FIG. 7;
a memory 720;
the apparatus may further include: an input device 730 and an output device 740.
The processor 710, the memory 720, the input device 730 and the output device 740 of the apparatus may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 7.
The memory 720, as a non-transitory computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the fraudulent transaction identification method in the embodiment of the present invention (for example, the obtaining module 510, the selection module 520, the classification module 530, and the reporting module 540 shown in fig. 5a), or the program instructions/modules corresponding to the model training method in the embodiment of the present invention (for example, the dividing module 610, the training module 620, and the testing and optimizing module 630 shown in fig. 6). The processor 710 executes the various functional applications and data processing of the computer device by running the software programs, instructions, and modules stored in the memory 720, so as to implement the fraudulent transaction identification method of the above method embodiment, that is:
obtaining historical dispute transaction data and user return visit information of a card in current dispute transaction data, and extracting characteristic data from the historical dispute transaction data and the user return visit information;
selecting target feature data from the feature data;
inputting the current dispute transaction data and the target characteristic data into a decision tree model to obtain the category of the current dispute transaction;
and if the type of the current disputed transaction is suspected to be a fraud transaction, reporting the current disputed transaction for anti-fraud processing.
or to implement the model training method of the above method embodiment, namely:
acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether the historical dispute transaction is a fraud transaction result, user return visit information aiming at the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parsing the data into a database, and dividing the data in the database into a training data set and a test data set;
determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
determining target characteristic data in the test data set, testing the initial decision tree model through current dispute transaction data in the test data set and the target characteristic data determined by the test data set, and optimizing the initial decision tree model based on a test result to obtain a final decision tree model.
The memory 720 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 720 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 720 may optionally include memory located remotely from processor 710, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 740 may include a display device such as a display screen, or an output interface.
Embodiments of the present invention provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a fraudulent transaction identification method as provided by embodiments of the present invention:
obtaining historical dispute transaction data and user return visit information of a card in current dispute transaction data, and extracting characteristic data from the historical dispute transaction data and the user return visit information;
selecting target feature data from the feature data;
inputting the current dispute transaction data and the target characteristic data into a decision tree model to obtain the category of the current dispute transaction;
and if the type of the current disputed transaction is suspected to be a fraud transaction, reporting the current disputed transaction for anti-fraud processing.
or implements the model training method of the above method embodiment, namely:
acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether the historical dispute transaction is a fraud transaction result, user return visit information aiming at the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parsing the data into a database, and dividing the data in the database into a training data set and a test data set;
determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
determining target characteristic data in the test data set, testing the initial decision tree model through current dispute transaction data in the test data set and the target characteristic data determined by the test data set, and optimizing the initial decision tree model based on a test result to obtain a final decision tree model.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A method of identifying fraudulent transactions, comprising:
obtaining historical dispute transaction data and user return visit information of a card in current dispute transaction data, and extracting characteristic data from the historical dispute transaction data and the user return visit information;
selecting target feature data from the feature data;
inputting the current dispute transaction data and the target characteristic data into a decision tree model to obtain the category of the current dispute transaction;
and if the type of the current disputed transaction is suspected to be a fraud transaction, reporting the current disputed transaction for anti-fraud processing.
2. The method of claim 1, wherein:
the feature data extracted from the historical dispute transaction data comprises: whether the card has previously been used for consumption at the target merchant in the current dispute transaction data, and whether a fraud record exists for the target merchant;
the feature data extracted from the user return visit information comprises: whether the user denies the current dispute transaction, whether the card in the current dispute transaction data is lost, and whether the card in the current dispute transaction data is lost.
3. The method of claim 1, further comprising:
training the decision tree model through a training data set in a database to obtain an initial decision tree model;
and testing the initial decision tree model through the test data set in the database, and optimizing the decision tree model based on the test result to obtain a final decision tree model.
4. The method of claim 1, wherein determining target feature data in the feature data comprises:
selecting the target feature data from the feature data based on the Gini coefficient.
5. The method of claim 3, further comprising:
and feeding back the anti-fraud processing result of the current dispute transaction to the database, and updating the database.
6. The method of claim 5, further comprising:
retraining the decision tree model by using the updated training data set in the database;
and testing and optimizing the retrained decision tree model by adopting the updated test data set in the database.
7. A method of model training, comprising:
acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether the historical dispute transaction is a fraud transaction result, user return visit information aiming at the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parsing the data into a database, and dividing the data in the database into a training data set and a test data set;
determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
determining target characteristic data in the test data set, testing the initial decision tree model through current dispute transaction data in the test data set and the target characteristic data determined by the test data set, and optimizing the initial decision tree model based on a test result to obtain a final decision tree model.
8. The method of claim 7, wherein determining target feature data in a data set comprises:
extracting characteristic data from historical dispute transaction data in a data set and the user return visit information;
selecting the target feature data from the feature data based on the Gini coefficient.
9. The method of claim 7, further comprising:
judging whether the target characteristic data is complete or not;
and if not, supplementing the target characteristic data.
10. The method of claim 7 wherein the current dispute transaction data includes a card number of the card and a spending amount of the card.
11. The method of claim 8, wherein:
the feature data extracted from the historical dispute transaction data includes: whether the card has previously been used for consumption at the target merchant in the current dispute transaction data, and whether a fraud record exists for the target merchant;
the feature data extracted from the user return visit information comprises: whether the user denies the current dispute transaction, whether the card in the current dispute transaction data is lost, and whether the card in the current dispute transaction data is lost.
12. A fraudulent transaction identification arrangement comprising:
the acquisition module is used for acquiring historical dispute transaction data and user return visit information of a card in current dispute transaction data and extracting characteristic data from the historical dispute transaction data and the user return visit information;
a selection module for selecting target feature data from the feature data;
the classification module is used for inputting the current dispute transaction data and the target characteristic data into a decision tree model to obtain the category of the current dispute transaction;
and the reporting module is used for reporting the current dispute transaction for anti-fraud processing if the type of the current dispute transaction is suspected fraud transaction.
13. A model training apparatus, comprising:
the dividing module is used for acquiring a plurality of groups of current dispute transaction data, historical dispute transaction data of a card in the current dispute transaction data, data of whether the historical dispute transaction is a fraud transaction result, user return visit information aiming at the current dispute transaction, and label data of whether the current dispute transaction is a fraud transaction, parsing the data into a database, and dividing the data in the database into a training data set and a test data set;
the training module is used for determining target characteristic data in the training data set, inputting current dispute transaction data in the training data set, the target characteristic data determined by the training data set and the label data into a decision tree model, and training the decision tree model to obtain an initial decision tree model;
and the testing and optimizing module is used for determining target characteristic data in the testing data set, testing the initial decision tree model through the current dispute transaction data in the testing data set and the target characteristic data determined by the testing data set, and optimizing the initial decision tree model based on a testing result to obtain a final decision tree model.
14. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-11.
CN202110320251.8A 2021-03-25 2021-03-25 Fraud transaction identification and model training method, device, equipment and storage medium Pending CN112907254A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110320251.8A CN112907254A (en) 2021-03-25 2021-03-25 Fraud transaction identification and model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110320251.8A CN112907254A (en) 2021-03-25 2021-03-25 Fraud transaction identification and model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112907254A true CN112907254A (en) 2021-06-04

Family

ID=76106446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110320251.8A Pending CN112907254A (en) 2021-03-25 2021-03-25 Fraud transaction identification and model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112907254A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114745289A (en) * 2022-04-19 2022-07-12 中国联合网络通信集团有限公司 Method, device, storage medium and equipment for predicting network performance data

Similar Documents

Publication Publication Date Title
CN109816397B (en) Fraud discrimination method, device and storage medium
CN110223168B (en) Label propagation anti-fraud detection method and system based on enterprise relationship map
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
CN111371767B (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN111461216A (en) Case risk identification method based on machine learning
CN111666346A (en) Information merging method, transaction query method, device, computer and storage medium
CN110827036A (en) Method, device, equipment and storage medium for detecting fraudulent transactions
CN112783513B (en) Code risk checking method, device and equipment
CN112581271B (en) Merchant transaction risk monitoring method, device, equipment and storage medium
CN112907254A (en) Fraud transaction identification and model training method, device, equipment and storage medium
CN114493619A (en) Enterprise credit investigation label construction method based on electric power data
CN110910241B (en) Cash flow evaluation method, apparatus, server device and storage medium
CN115204322B (en) Behavior link abnormity identification method and device
CN110570301B (en) Risk identification method, device, equipment and medium
CN113051911B (en) Method, apparatus, device, medium and program product for extracting sensitive words
CN114493853A (en) Credit rating evaluation method, credit rating evaluation device, electronic device and storage medium
CN112416800A (en) Intelligent contract testing method, device, equipment and storage medium
CN114185807A (en) Test data management method and device, computer equipment and storage medium
CN113537960A (en) Method, device and equipment for determining abnormal resource transfer link
Wang et al. Temporal transaction information-aware Ponzi scheme detection for ethereum smart contracts
CN111984798A (en) Atlas data preprocessing method and device
CN111429257A (en) Transaction monitoring method and device
Sapozhnikova et al. Distributed infrastructure for big data processing in the transaction monitoring systems
CN116777606B (en) Intelligent anti-fraud system and method for bank credit based on relation graph
Sapozhnikova et al. Processing of big data in the transaction monitoring systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination