CN108629413B - Neural network model training and transaction behavior risk identification method and device - Google Patents

Neural network model training and transaction behavior risk identification method and device

Info

Publication number
CN108629413B
Authority
CN
China
Prior art keywords
sample data
sample
gbdt
path information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710153115.8A
Other languages
Chinese (zh)
Other versions
CN108629413A (en)
Inventor
李龙飞
周俊
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201710153115.8A priority Critical patent/CN108629413B/en
Priority to TW106140070A priority patent/TWI689874B/en
Priority to PCT/CN2018/078906 priority patent/WO2018166457A1/en
Publication of CN108629413A publication Critical patent/CN108629413A/en
Application granted granted Critical
Publication of CN108629413B publication Critical patent/CN108629413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The application relates to the technical field of computers, and in particular to a neural network model training and transaction behavior risk identification method and device. A plurality of pre-collected sample data are input into a gradient boosting decision tree (GBDT) to determine the path information corresponding to each sample data in the GBDT, and the neural network model is then trained according to that path information and the sample label of each sample data. That is, in the present application, path information is first determined by the GBDT, and the neural network model is then trained on the path information and the sample labels. Because of the characteristics of the GBDT itself, one piece of path information usually combines information from multiple dimensions of the sample data, so the efficiency of training the neural network model can be improved.

Description

Neural network model training and transaction behavior risk identification method and device
Technical Field
The application relates to the technical field of computers, in particular to a neural network model training and transaction behavior risk identification method and device.
Background
In the conventional technology, after sample data is collected, a neural network model is trained directly on the sample data and the sample labels of the sample data. However, sample data collected in this way typically includes information in many dimensions, which may make training of the neural network model inefficient.
Disclosure of Invention
The application describes a neural network model training and transaction behavior risk identification method and device, which can improve the efficiency of neural network model training.
In a first aspect, a neural network model training method is provided, including:
inputting a plurality of pre-collected sample data into a gradient boosting decision tree (GBDT) to determine path information corresponding to each sample data in the GBDT; each sample data has a corresponding sample label;
and training a neural network model according to the path information and the sample label corresponding to each sample data in the GBDT.
In a second aspect, a transaction behavior risk identification method is provided, including:
acquiring transaction behavior data of a user;
inputting the transaction behavior data into a gradient boosting decision tree (GBDT) to determine corresponding path information of the transaction behavior data in the GBDT;
inputting the path information into a neural network model;
and outputting a transaction behavior risk identification result.
In a third aspect, a neural network model training apparatus is provided, including:
a determining unit, configured to input a plurality of pre-collected sample data into a gradient boosting decision tree (GBDT) to determine the path information corresponding to each sample data in the GBDT; each sample data has a corresponding sample label;
and a training unit, configured to train the neural network model according to the path information and the sample label corresponding to each sample data in the GBDT determined by the determining unit.
In a fourth aspect, a transaction behavior risk identification device is provided, which includes:
an acquiring unit, configured to acquire transaction behavior data of a user;
a determining unit, configured to input the transaction behavior data acquired by the acquiring unit into a gradient boosting decision tree (GBDT) to determine path information corresponding to the transaction behavior data in the GBDT;
an input unit, configured to input the path information determined by the determining unit into a neural network model;
and an output unit, configured to output a transaction behavior risk identification result.
According to the neural network model training and transaction behavior risk identification method and device, a plurality of pre-collected sample data are input into a gradient boosting decision tree (GBDT) to determine the path information corresponding to each sample data in the GBDT, and the neural network model is trained according to that path information and the sample label of each sample data. That is, in the present application, path information is first determined by the GBDT, and the neural network model is then trained on the path information and the sample labels. Because of the characteristics of the GBDT itself, one piece of path information usually combines information from multiple dimensions of the sample data, so the efficiency of training the neural network model can be improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a neural network model training method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a decision tree provided herein;
FIG. 3 is a schematic diagram of a DNN training process provided herein;
FIG. 4 is a schematic diagram of a transaction behavior risk identification method provided in the present application;
FIG. 5 is a schematic diagram of a neural network model training apparatus according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a transaction behavior risk identification device according to another embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
The neural network model training method provided by the embodiments of the application is suitable for scenarios in which neural network models such as Deep Neural Networks (DNNs) or Artificial Neural Networks (ANNs) are trained. The trained neural network model can be used in pattern recognition and classification scenarios, for example to identify risk in transaction behavior.
FIG. 1 is a flowchart of a neural network model training method according to an embodiment of the present application. The execution subject of the method may be a device with processing capabilities. As shown in FIG. 1, the method specifically includes:
step 110, inputting a plurality of pre-collected sample data into a Gradient Boosting Decision Tree (GBDT) to determine path information corresponding to each sample data in the GBDT.
The GBDT model may be trained prior to performing step 110. The specific training process is described subsequently.
In step 110, taking as an example a scenario in which the trained neural network model is used for transaction behavior risk identification, the sample data may be transaction behavior data of the user. Specifically, sample data can be collected from a background database of the payment system. Here, the sample data may belong to the following five categories of user data:
1) Historical behavior information of the user. E.g., a, the number of incoming calls made by the user over several days (e.g., 180 days); b, the city of the last login; c, the time elapsed since the last login; d, the number of logins over several days (e.g., 90 days), etc.
2) Transaction information of the user. E.g., a, the average payment amount over several days (e.g., 90 days); b, the number of days with payments over several days (e.g., 180 days); c, the amount paid over several days (e.g., 180 days); d, the time elapsed since the last payment, etc.
3) Basic information of the user. E.g., a, whether the user is single; b, whether the user is decorated or not; c, whether the user is married or not; d, user age; e, the registration time of the user; f, user education level, etc.
4) Remote Procedure Call (RPC) behavior information of the user. The RPC behavior information here refers to RPC calls between the client and the server while the user uses the client. In one implementation, the operations of each user within the most recent given time window may be collected. For example, a variable recording the number of times the user accessed an RPC interface in the last 2 days may be collected.
5) Uniform Resource Locator (URL) address information of the user.
For the collected sample data, if a piece of sample data does not originate from the current user, or can bring negative influence to the user, it is classified as positive sample data. For example, if a transaction was operated by someone other than the user, or caused a loss to the user's account that was reported, the transaction behavior data is marked as positive sample data. Otherwise, if a piece of sample data is normal transaction behavior data of the user, it is marked as negative sample data.
It should be noted that, in general, negative sample data is easier to collect. For example, data for normal payment behavior can easily be gathered from a background database of the payment system. Therefore, negative sample data accounts for most of the sample data set, e.g., more than 99.999%. However, when the proportion of negative sample data is this high, the trained neural network model is often biased: for example, it can identify only safe transaction behavior and fails to identify risky transaction behavior, which harms the accuracy of transaction behavior risk identification.
In order to improve the accuracy of transaction behavior risk identification, the sample data can be preprocessed. In one implementation, up-sampling may be performed on the positive sample data, and/or down-sampling may be performed on the negative sample data. Up-sampling the positive sample data may include increasing the number of positive samples by copying or similar means. Down-sampling the negative sample data may include reducing the number of negative samples by deletion or similar means. In one example, the ratio of positive to negative sample data may be adjusted to 1:300.
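The preprocessing described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `rebalance`, its parameters, the dictionary record format and the fixed random seed are all assumptions made for the example. Positive samples are up-sampled by copying, and negative samples are down-sampled by random deletion until at most 300 negatives remain per positive.

```python
import random


def rebalance(positives, negatives, neg_per_pos=300, pos_copies=1, seed=0):
    """Up-sample positives by copying, then down-sample negatives by deletion.

    All names and defaults here are hypothetical; only the 1:300 target
    ratio comes from the text.
    """
    rng = random.Random(seed)
    # Up-sampling: repeat every positive sample `pos_copies` times.
    up = [p for p in positives for _ in range(pos_copies)]
    # Down-sampling: keep at most `neg_per_pos` negatives per positive.
    keep = min(len(negatives), neg_per_pos * len(up))
    down = rng.sample(negatives, keep)
    return up, down


# Toy data: 2 risky transactions and 2000 normal ones.
positives = [{"id": i, "label": 1} for i in range(2)]
negatives = [{"id": i, "label": 0} for i in range(2000)]

up, down = rebalance(positives, negatives, neg_per_pos=300, pos_copies=2)
print(len(up), len(down))  # 4 1200
```

With 4 positives after copying, 1200 of the 2000 negatives are kept, which gives the 1:300 ratio mentioned in the example.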
It should be noted that, after the above preprocessing, corresponding sample labels may be added to the positive and negative sample data. Specifically, a positive sample label is added to the positive sample data, and a negative sample label is added to the negative sample data.
In step 110, inputting a plurality of pre-collected sample data into the GBDT may specifically include: for each sample data, determining the feature values corresponding to a plurality of features according to the sample data, and then inputting the feature values into the decision trees of the GBDT.
The features here may belong to multiple categories. In one implementation, some of the features may reuse model variables accumulated online by an existing transaction behavior risk identification model. These model variables belong to the following three categories: 1) historical behavior information of the user. 2) Transaction information of the user. 3) Basic information of the user.
However, the model variables need to be determined from business data, and the business data usually come from different business departments and take a certain time to collect and organize. The latest state of the user therefore cannot be obtained from the model variables alone, and the latest transaction behavior of the user cannot be subjected to risk identification. To solve this problem, the present application adds features belonging to the RPC behavior information of the user and features belonging to the URL address information of the user.
In summary, the features in the present application may belong to the following five categories: 1) historical behavior information of the user. 2) Transaction information of the user. 3) Basic information of the user. 4) RPC behavior information of the user. 5) URL address information of the user. Each category is as described above and is not repeated here.
For the features that have been set, after the corresponding feature value is determined from specific sample data, the feature value can be input into the GBDT. A GBDT here may consist of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a feature. Taking one decision tree as an example, as shown in FIG. 2, node 1, node 2 and node 3 are respectively associated with the features "whether the user is male", "whether the user is older than 20 years" and "whether the transaction amount exceeds 1000 yuan". After the feature values of the features are input into the decision tree, a plurality of pieces of path information can be determined in the decision tree. For example, if the sample data indicates that the user is male, the user is older than 20 years, and the transaction amount exceeds 1000 yuan, the determined path information may be as shown by the bold line in FIG. 2.
As an exemplary illustration, only one piece of path information is shown in fig. 2, and when sample data is actually input into the GBDT, a plurality of pieces of path information may be determined, which is not repeated herein.
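How a sample's path is determined in a decision tree can be sketched as follows. The tree below is only a guess at the shape of FIG. 2 based on the three features named in the text; the `Node` class, the leaf names and the second tree are hypothetical and added purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    name: str                                      # feature tested here, or leaf name
    test: Optional[Callable[[dict], bool]] = None  # None marks a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def trace_path(root, sample):
    """Return the (feature, outcome) pairs visited from the root to a leaf."""
    path, node = [], root
    while node.test is not None:
        outcome = node.test(sample)
        path.append((node.name, outcome))
        node = node.yes if outcome else node.no
    return path


# Hypothetical tree using the three features named for FIG. 2.
tree_1 = Node(
    "is_male", lambda s: s["gender"] == "male",
    yes=Node(
        "age_over_20", lambda s: s["age"] > 20,
        yes=Node("amount_over_1000", lambda s: s["amount"] > 1000,
                 yes=Node("leaf_a"), no=Node("leaf_b")),
        no=Node("leaf_c"),
    ),
    no=Node("leaf_d"),
)
# A second, invented tree, to show that a GBDT yields one path per tree.
tree_2 = Node("amount_over_1000", lambda s: s["amount"] > 1000,
              yes=Node("leaf_e"), no=Node("leaf_f"))

sample = {"gender": "male", "age": 35, "amount": 1500}
path = trace_path(tree_1, sample)
paths = [trace_path(t, sample) for t in (tree_1, tree_2)]
print(path)
```

In this sketch one path combines three feature values (gender, age, amount), which is the sense in which a single piece of path information carries several dimensions of the sample data.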
In the present application, before a feature value is input into the GBDT, the feature value may be expressed as a one-hot feature vector. When the feature vector corresponding to a feature value has been determined, inputting the feature value into the GBDT may be replaced by inputting the corresponding feature vector into the decision tree to determine the corresponding path information. The process of determining the feature vector of a feature value may be as follows:
Taking the feature "gender of the user" as an example, if the gender of the user is male, that is, the feature value of the feature is "male", the corresponding feature vector may be [0 1]. If the gender of the user is female, that is, the feature value of the feature is "female", the corresponding feature vector may be [1 0].
Taking the feature "RPC behavior information of the user" as an example, the feature vector corresponding to the feature value can be determined in the following two ways. In a first implementation, a rule is set: if a preset RPC behavior occurred, its position is marked 1; otherwise it is marked 0. Specifically, assume the preset RPC behavior information is a, b and c, and a certain sample data contains the following RPC behavior information of the user within two days: a, a and b, i.e., the feature values are a, a and b. The corresponding feature vector may be [1 1 0]. In another implementation, the rule may be: count the frequency of each preset RPC behavior, and then normalize. Specifically, assume the preset RPC behavior information is a, b and c, and a certain sample data contains the following RPC behavior information of the user within two days: a, a, b, b and c, i.e., the feature values are a, a, b, b and c. The corresponding counts are 2, 2 and 1. After normalization, the final feature vector is [0.4 0.4 0.2].
It should be noted that, the above description of representing the feature values as feature vectors belongs to the conventional technology, and the description thereof is omitted here.
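The three encodings described above (one-hot gender, occurrence flags, normalized frequency) can be reproduced with a few lines of plain Python. The function names are hypothetical; the inputs and expected vectors are taken from the examples in the text.

```python
def one_hot(value, categories):
    """1 at the position of `value` in `categories`, 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]


def presence(events, vocabulary):
    """1 if a preset behavior occurred at least once, otherwise 0."""
    seen = set(events)
    return [1 if v in seen else 0 for v in vocabulary]


def normalized_frequency(events, vocabulary):
    """Count each preset behavior, then divide by the total count."""
    counts = [events.count(v) for v in vocabulary]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


print(one_hot("male", ["female", "male"]))                    # [0, 1]
print(one_hot("female", ["female", "male"]))                  # [1, 0]
print(presence(["a", "a", "b"], ["a", "b", "c"]))             # [1, 1, 0]
print(normalized_frequency(["a", "a", "b", "b", "c"],
                           ["a", "b", "c"]))                  # [0.4, 0.4, 0.2]
```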
It should be noted that, in order to improve the accuracy of the neural network model, a relatively large number of features are set in the present application, so that a large number of feature values are determined. As the number of feature values grows, processing them becomes time-consuming, and because only a limited number of feature values can be observed at the same time, it is difficult for people to analyze the relationships among the feature values in depth and to generate new feature values manually. In the present application, path information is obtained by inputting the sample data into the GBDT, and each piece of path information contains a plurality of feature values. The number of inputs can thus be greatly reduced, and manual effort can be significantly reduced.
Step 120, training the neural network model according to the path information corresponding to each sample data and the sample label.
The neural network model here may include a DNN, an ANN, or the like. DNNs have developed rapidly in recent years, and compared with traditionally used shallow models (e.g., Logistic Regression (LR), Random Forest (RF)), a DNN has its own advantages: strong model expression capability, and suitability for big data and distributed training. Therefore, in this specification, DNN training is described as an example.
In the present application, the DNN training process may be as shown in FIG. 3. In FIG. 3, the input layer of the DNN receives each piece of path information from the GBDT, and the output layer outputs a first prediction result. It is understood that, for each sample data, after the path information corresponding to the sample data is input into the DNN, the DNN outputs the corresponding first prediction result. For the plurality of sample data in the sample set, if the probability that the first prediction result matches the sample label of the sample data reaches a preset threshold (which may be set according to an empirical value), the optimized DNN may be considered to have been obtained.
It is understood that the number of DNN layers in fig. 3 may be varied according to the number of path information.
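The stopping rule described for DNN training (train until the first prediction results match the sample labels often enough) can be sketched as below. Several things here are assumptions and not taken from the application: the path information is encoded as a one-hot vector over the leaves of one hypothetical tree, the matching probability is interpreted as the fraction of samples predicted correctly, and a single logistic unit stands in for the DNN of FIG. 3 to keep the example short.

```python
import math


def predict(weights, bias, x):
    """Output of a single logistic unit (a stand-in for the DNN)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def match_rate(weights, bias, inputs, labels):
    """Fraction of samples whose predicted class equals the sample label."""
    hits = sum((predict(weights, bias, x) > 0.5) == bool(y)
               for x, y in zip(inputs, labels))
    return hits / len(labels)


def train(inputs, labels, threshold=0.95, lr=0.5, max_epochs=1000):
    """Stochastic gradient descent that stops once the threshold is reached."""
    weights, bias = [0.0] * len(inputs[0]), 0.0
    for _ in range(max_epochs):
        if match_rate(weights, bias, inputs, labels) >= threshold:
            break
        for x, y in zip(inputs, labels):
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical path vectors: one-hot over three leaves of one tree.
inputs = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 0, 0]  # 1 = positive (risky) sample, 0 = negative sample

weights, bias = train(inputs, labels)
rate = match_rate(weights, bias, inputs, labels)
print(rate >= 0.95)
```

A real model would use several hidden layers and a held-out sample set for the threshold check; the loop structure is the part this sketch is meant to show.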
Experiments show that the neural network model trained according to the present application performs better than other models (LR or RF). Meanwhile, the time spent on feature processing is greatly reduced, and the overall modeling process is much faster.
The following describes how to train the GBDT model:
After the feature values corresponding to the plurality of features are determined from each sample data, they may be input into the respective decision trees of the GBDT. The conclusions of the decision trees are then accumulated to determine a second prediction result. It will be appreciated that for each sample data, the GBDT model outputs a corresponding second prediction result. For the plurality of sample data in the sample set, if the probability that the second prediction result matches the sample label of the sample data reaches a preset threshold (which may be set according to an empirical value), the optimized GBDT model may be considered to have been obtained. If the probability that the second prediction result matches the sample label does not reach the preset threshold, the input and output operations can be repeated after adjusting the number of decision trees, the depth of the decision trees and the regularization term (used for representing the characteristics), until the preset threshold is reached.
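The accumulation of tree conclusions into a second prediction result, and the comparison against a preset threshold, can be sketched as follows. The two trees are written as plain functions with invented scores and feature names; mapping the accumulated score through a sigmoid and the threshold value 0.9 are also assumptions made for this illustration.

```python
import math


def gbdt_score(trees, features):
    """Accumulate the conclusions (scores) of all decision trees."""
    return sum(tree(features) for tree in trees)


def gbdt_predict(trees, features):
    """Turn the accumulated score into a 0/1 second prediction result."""
    prob = 1.0 / (1.0 + math.exp(-gbdt_score(trees, features)))
    return 1 if prob > 0.5 else 0


# Hypothetical trees, each reduced to a single split with invented scores.
trees = [
    lambda f: 0.8 if f["amount"] > 1000 else -0.6,
    lambda f: 0.5 if f["logins_90d"] < 2 else -0.4,
]

samples = [
    ({"amount": 1500, "logins_90d": 1}, 1),
    ({"amount": 200, "logins_90d": 30}, 0),
    ({"amount": 1200, "logins_90d": 40}, 1),
]

match = sum(gbdt_predict(trees, f) == y for f, y in samples) / len(samples)
preset_threshold = 0.9  # assumed empirical value
needs_tuning = match < preset_threshold
print(match, needs_tuning)
```

When `needs_tuning` is true, the number of trees, their depth or the regularization term would be adjusted and the check repeated, as the paragraph above describes.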
In summary, the present application has the following advantages:
1) Because the features of the present application include features of the RPC behavior information of the user, the neural network model trained by the present application can meet the timeliness requirement, and risk in the latest transaction behavior of the user can be identified.
2) The accuracy of the neural network model trained by the application is higher than that of the traditional shallow model.
3) By inputting sample data into the GBDT, path information is obtained. One piece of path information is formed by combining a plurality of characteristic values, namely one piece of path information contains information of a plurality of dimensions of sample data, so that the data size input by the DNN input layer can be greatly reduced, and the training efficiency of the neural network model can be improved.
It should be noted that after the neural network model is obtained through the training of the steps shown in fig. 1, the neural network model can be deployed on line, and risk identification is performed on the transaction behavior of the user.
Fig. 4 is a process schematic diagram of the transaction behavior risk identification method provided in the present application. As shown in fig. 4, the method may include:
Step 410, acquiring transaction behavior data of the user.
The transaction behavior data is defined in the same way as the sample data described above, and the description is not repeated here.
Step 420, inputting the transaction behavior data into the gradient boosting decision tree GBDT to determine corresponding path information of the transaction behavior data in the GBDT.
The GBDT is composed of a plurality of decision trees, each decision tree including a plurality of nodes, each node corresponding to a feature. The step of inputting the transaction behavior data into the gradient boost decision tree GBDT in step 420 to determine the path information corresponding to the transaction behavior data in the GBDT may specifically include: determining characteristic values corresponding to the plurality of characteristics according to the transaction behavior data; and determining path information in the decision tree according to the characteristic value. The process of determining the path information may refer to fig. 2, which is not repeated herein.
Step 430, inputting the path information into the neural network model.
I.e. the path information determined in step 420 is input into the input layer of the DNN.
Step 440, outputting a transaction behavior risk identification result.
Specifically, the transaction behavior risk identification result is output by the output layer of the DNN. Here, an alarm may be raised if the identification result indicates risky transaction behavior. In a payment scenario, if the identification result indicates risky payment behavior, the user account may be frozen to prevent property loss.
Corresponding to the neural network model training method, an embodiment of the present invention further provides a neural network model training apparatus. As shown in FIG. 5, the apparatus includes:
the determining unit 501 is configured to input a plurality of pre-collected sample data into the gradient boost decision tree GBDT to determine path information corresponding to each sample data in the GBDT.
Here, each sample data has a corresponding sample label.
The training unit 502 is configured to train the neural network model according to the path information and the sample label corresponding to each sample data determined by the determining unit 501 in the GBDT.
Optionally, the GBDT consists of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a feature.
The determining unit 501 is specifically configured to:
for each sample data in the plurality of sample data, determine the feature values corresponding to the plurality of features according to the sample data.
Here, the features may include: remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
determine path information in the decision tree according to the feature values.
Optionally, the sample label may include: a positive sample label and a negative sample label. The above apparatus may further include:
the processing unit 503, configured to perform up-sampling processing on sample data whose sample label is a positive sample label; and/or
perform down-sampling processing on sample data whose sample label is a negative sample label.
The functions of the functional modules of the device in the embodiment of the present application may be implemented through the steps in the method embodiment described above, and therefore, the specific working process of the device provided in the present application is not repeated herein.
In the neural network model training device provided by the present application, the determining unit 501 inputs a plurality of sample data collected in advance into the gradient boost decision tree GBDT to determine path information corresponding to each sample data in the GBDT. The training unit 502 trains the neural network model according to the path information and the sample label corresponding to each sample data in the GBDT. Therefore, the efficiency of training the neural network model can be improved.
Corresponding to the above transaction behavior risk identification method, an embodiment of the present application further provides a transaction behavior risk identification device, as shown in fig. 6, the device includes:
An acquiring unit 601, configured to acquire transaction behavior data of a user.
A determining unit 602, configured to input the transaction behavior data acquired by the acquiring unit 601 into the gradient boost decision tree GBDT to determine path information corresponding to the transaction behavior data in the GBDT.
An input unit 603, configured to input the path information determined by the determination unit 602 into the neural network model.
And an output unit 604, configured to output a transaction behavior risk identification result.
Optionally, the GBDT consists of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a feature;
the determining unit 602 is specifically configured to:
determine the feature values corresponding to the plurality of features according to the transaction behavior data.
determine path information in the decision tree according to the feature values.
The features may include: remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
The functions of the functional modules of the device in the embodiment of the present application may be implemented through the steps in the method embodiment described above, and therefore, the specific working process of the device provided in the present application is not repeated herein.
The transaction behavior risk identification device can improve efficiency and accuracy of transaction behavior risk identification.
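The four units of the device in FIG. 6 (acquiring, determining, input and output) can be read as a simple pipeline. The sketch below is illustrative only: the class name `RiskIdentifier`, the stub functions standing in for a trained GBDT and neural network, the feature names and the 0.5 decision threshold are all assumptions.

```python
class RiskIdentifier:
    """Chains feature extraction, GBDT path lookup and network scoring."""

    def __init__(self, extract_features, gbdt_paths, network, threshold=0.5):
        self.extract_features = extract_features  # raw data -> feature values
        self.gbdt_paths = gbdt_paths              # feature values -> path info
        self.network = network                    # path info -> risk score
        self.threshold = threshold

    def identify(self, transaction):
        # Determining unit: feature values, then path information in the GBDT.
        features = self.extract_features(transaction)
        paths = self.gbdt_paths(features)
        # Input unit: feed the path information to the neural network.
        score = self.network(paths)
        # Output unit: report the risk identification result.
        return "risky" if score > self.threshold else "normal"


# Hypothetical stand-ins for trained models.
identifier = RiskIdentifier(
    extract_features=lambda t: {"amount": t["amount"], "new_device": t["new_device"]},
    gbdt_paths=lambda f: [int(f["amount"] > 1000), int(f["new_device"])],
    network=lambda p: sum(p) / len(p),
)

# Acquiring unit: in practice the data would come from the payment system.
result_a = identifier.identify({"amount": 5000, "new_device": True})
result_b = identifier.identify({"amount": 20, "new_device": False})
print(result_a, result_b)  # risky normal
```

A caller could raise an alarm or freeze the account when the result is "risky", as described for step 440.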
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that the above are only exemplary embodiments of the present invention and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention shall be included in the scope of the present invention.

Claims (14)

1. A neural network model training method in a transaction risk recognition scene is characterized by comprising the following steps:
inputting a plurality of feature values of each sample data in a plurality of pre-collected sample data, corresponding to each predetermined feature, into a pre-trained gradient boosting decision tree (GBDT) to determine a plurality of pieces of path information of each sample data in the GBDT; each piece of path information comprises several of the plurality of feature values corresponding to the sample data, and the number of pieces of path information corresponding to each sample data is less than the number of feature values corresponding to the sample data; each sample data has a corresponding sample label; the sample data refers to transaction behavior data of the user;
and training a neural network model according to the plurality of pieces of path information and the sample labels of each sample data corresponding to the GBDT.
2. The method according to claim 1, wherein the GBDT consists of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a predetermined characteristic.
3. The method of claim 1 or 2, wherein the sample label comprises: a positive sample label and a negative sample label;
before inputting the plurality of feature values, corresponding to each predetermined feature, of each sample data in the plurality of pre-collected sample data into the gradient boost decision tree GBDT, the method further comprises:
performing up-sampling processing on the sample data whose sample label is the positive sample label; and/or
performing down-sampling processing on the sample data whose sample label is the negative sample label.
4. The method of claim 2, wherein the predetermined characteristic comprises:
remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
5. A transaction behavior risk identification method, comprising:
acquiring transaction behavior data of a user;
inputting a plurality of characteristic values of the transaction behavior data corresponding to each predetermined characteristic into a Gradient Boost Decision Tree (GBDT) to determine a plurality of pieces of path information corresponding to the transaction behavior data in the GBDT; each path information comprises a plurality of characteristic values corresponding to the transaction behavior data; the number of the path information corresponding to the transaction behavior data is less than the number of the characteristic values corresponding to the transaction behavior data;
inputting the plurality of pieces of path information into a neural network model;
and outputting a transaction behavior risk identification result.
6. The method according to claim 5, wherein the GBDT is formed from a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a predetermined characteristic.
7. The method of claim 6, wherein the predetermined characteristic comprises:
remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
8. A neural network model training device in a transaction risk identification scenario, characterized by comprising:
the determining unit is used for inputting a plurality of characteristic values of each sample data in a plurality of pre-collected sample data corresponding to each predetermined characteristic into a pre-trained Gradient Boost Decision Tree (GBDT) so as to determine a plurality of pieces of path information of each sample data corresponding to the GBDT; each piece of path information comprises a plurality of characteristic values in a plurality of characteristic values corresponding to the sample data, and the number of the path information corresponding to each sample data is less than that of the characteristic values corresponding to the sample data; each sample data has a corresponding sample tag; the sample data refers to transaction behavior data of the user;
and the training unit is used for training the neural network model according to the plurality of pieces of path information and the sample labels, corresponding to each sample data in the GBDT, determined by the determining unit.
9. The apparatus of claim 8 wherein the GBDT consists of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a predetermined characteristic.
10. The apparatus of claim 8 or 9, wherein the sample label comprises: a positive sample label and a negative sample label; the device further comprises:
a processing unit, configured to perform up-sampling processing on the sample data whose sample label is the positive sample label; and/or
to perform down-sampling processing on the sample data whose sample label is the negative sample label.
11. The apparatus of claim 9, wherein the predetermined characteristic comprises:
remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
12. A transaction behavior risk identification device, comprising:
the acquisition unit is used for acquiring transaction behavior data of a user;
a determining unit, configured to input a plurality of feature values, corresponding to each predetermined feature, of the transaction behavior data acquired by the acquiring unit into a gradient boost decision tree GBDT, so as to determine a plurality of pieces of path information corresponding to the transaction behavior data in the GBDT; each path information comprises a plurality of characteristic values corresponding to the transaction behavior data; the number of the path information corresponding to the transaction behavior data is less than the number of the characteristic values corresponding to the transaction behavior data;
an input unit configured to input the plurality of pieces of path information determined by the determination unit into a neural network model;
and the output unit is used for outputting the transaction behavior risk identification result.
13. The apparatus of claim 12 wherein the GBDT consists of a plurality of decision trees, each decision tree comprising a plurality of nodes, each node corresponding to a predetermined characteristic.
14. The apparatus of claim 13, wherein the predetermined characteristic comprises:
remote procedure call (RPC) behavior information of the user and/or Uniform Resource Locator (URL) address information of the user.
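The training-side steps recited in claims 1 and 3 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Node` class, the feature names (`amount`, `rpc_count`, `url_depth`, `hour`), the thresholds, and the `rebalance` parameters are all hypothetical, and two hand-built trees stand in for a pre-trained GBDT. The sketch shows only how each tree yields one piece of path information made up of several feature values, so that the number of paths (one per tree) is smaller than the number of feature values, and how positive samples might be up-sampled and negative samples down-sampled beforehand.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Sample = Dict[str, float]          # feature name -> feature value
Path = List[Tuple[str, float]]     # (feature, value) pairs met along one tree


@dataclass
class Node:
    """One decision-tree node; an internal node tests one predetermined feature."""
    feature: Optional[str] = None   # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when value <= threshold
    right: Optional["Node"] = None  # branch taken when value > threshold


def tree_path(root: Node, sample: Sample) -> Path:
    """Walk one tree and record the feature value tested at each visited node."""
    path: Path = []
    node = root
    while node.feature is not None:
        value = sample[node.feature]
        path.append((node.feature, value))
        node = node.left if value <= node.threshold else node.right
    return path


def gbdt_paths(trees: List[Node], sample: Sample) -> List[Path]:
    """Return one piece of path information per tree in the ensemble."""
    return [tree_path(tree, sample) for tree in trees]


def rebalance(samples: List[Sample], labels: List[int], up_factor: int,
              keep_ratio: float, seed: int = 0) -> List[Tuple[Sample, int]]:
    """Repeat each positive sample up_factor times (up-sampling) and keep each
    negative sample with probability keep_ratio (down-sampling)."""
    rng = random.Random(seed)
    balanced: List[Tuple[Sample, int]] = []
    for sample, label in zip(samples, labels):
        if label == 1:
            balanced.extend([(sample, label)] * up_factor)
        elif rng.random() < keep_ratio:
            balanced.append((sample, label))
    return balanced


# Hypothetical two-tree ensemble over four made-up features.
leaf = Node()
trees = [
    Node("amount", 500.0, Node("rpc_count", 3.0, leaf, leaf), leaf),
    Node("url_depth", 2.0, leaf, Node("hour", 22.0, leaf, leaf)),
]
sample = {"amount": 120.0, "rpc_count": 7.0, "url_depth": 4.0, "hour": 23.0}
paths = gbdt_paths(trees, sample)  # two paths for a sample with four feature values
```

In this sketch the paths, paired with the sample labels, would form the training input for the neural network model; how the paths are encoded for the network is left open here because the claims do not specify it.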
CN201710153115.8A 2017-03-15 2017-03-15 Neural network model training and transaction behavior risk identification method and device Active CN108629413B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201710153115.8A CN108629413B (en) 2017-03-15 2017-03-15 Neural network model training and transaction behavior risk identification method and device
TW106140070A TWI689874B (en) 2017-03-15 2017-11-20 Method and device for neural network model training and transaction behavior risk identification
PCT/CN2018/078906 WO2018166457A1 (en) 2017-03-15 2018-03-14 Neural network model training method and device, transaction behavior risk identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710153115.8A CN108629413B (en) 2017-03-15 2017-03-15 Neural network model training and transaction behavior risk identification method and device

Publications (2)

Publication Number Publication Date
CN108629413A CN108629413A (en) 2018-10-09
CN108629413B true CN108629413B (en) 2020-06-16

Family

ID=63522791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710153115.8A Active CN108629413B (en) 2017-03-15 2017-03-15 Neural network model training and transaction behavior risk identification method and device

Country Status (3)

Country Link
CN (1) CN108629413B (en)
TW (1) TWI689874B (en)
WO (1) WO2018166457A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109389494B (en) * 2018-10-25 2021-11-05 北京芯盾时代科技有限公司 Loan fraud detection model training method, loan fraud detection method and device
CN109615454A (en) * 2018-10-30 2019-04-12 阿里巴巴集团控股有限公司 Determine the method and device of user's finance default risk
CN109583475B (en) * 2018-11-02 2023-06-30 创新先进技术有限公司 Abnormal information monitoring method and device
CN110046179B (en) * 2018-12-25 2023-09-08 创新先进技术有限公司 Mining method, device and equipment for alarm dimension
CN109559232A (en) * 2019-01-03 2019-04-02 深圳壹账通智能科技有限公司 Transaction data processing method, device, computer equipment and storage medium
CN109784403B (en) * 2019-01-16 2022-07-05 武汉斗鱼鱼乐网络科技有限公司 Method for identifying risk equipment and related equipment
CN110033092B (en) * 2019-01-31 2020-06-02 阿里巴巴集团控股有限公司 Data label generation method, data label training device, event recognition method and event recognition device
CN110008349B (en) * 2019-02-01 2020-11-10 创新先进技术有限公司 Computer-implemented method and apparatus for event risk assessment
CN111667290A (en) * 2019-03-08 2020-09-15 北京京东尚科信息技术有限公司 Business display method and device and computer readable storage medium
CN110232400A (en) * 2019-04-30 2019-09-13 冶金自动化研究设计院 A kind of gradient promotion decision neural network classification prediction technique
CN110390041B (en) * 2019-07-02 2022-05-20 上海上湖信息技术有限公司 Online learning method and device and computer readable storage medium
CN110942248B (en) * 2019-11-26 2022-05-31 支付宝(杭州)信息技术有限公司 Training method and device for transaction wind control network and transaction risk detection method
CN111290922B (en) * 2020-03-03 2023-08-22 中国工商银行股份有限公司 Service operation health monitoring method and device
CN111291900A (en) * 2020-03-05 2020-06-16 支付宝(杭州)信息技术有限公司 Method and device for training risk recognition model
CN111723083B (en) * 2020-06-23 2024-04-05 北京思特奇信息技术股份有限公司 User identity recognition method and device, electronic equipment and storage medium
CN111667028B (en) * 2020-07-09 2024-03-12 腾讯科技(深圳)有限公司 Reliable negative sample determination method and related device
CN111931690A (en) * 2020-08-28 2020-11-13 Oppo广东移动通信有限公司 Model training method, device, equipment and storage medium
CN112161173B (en) * 2020-09-10 2022-05-13 国网河北省电力有限公司检修分公司 Power grid wiring parameter detection device and detection method
CN112667940B (en) * 2020-10-15 2022-02-18 广东电子工业研究院有限公司 Webpage text extraction method based on deep learning
CN112541076B (en) * 2020-11-09 2024-03-29 北京百度网讯科技有限公司 Method and device for generating expanded corpus in target field and electronic equipment
CN113610354A (en) * 2021-07-15 2021-11-05 北京淇瑀信息科技有限公司 Policy distribution method and device for third-party platform user and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279691A (en) * 2014-07-25 2016-01-27 中国银联股份有限公司 Financial transaction detection method and equipment based on random forest model
CN105844501A (en) * 2016-05-18 2016-08-10 上海亿保健康管理有限公司 Consumption behavior risk control system and method
CN105975992A (en) * 2016-05-18 2016-09-28 天津大学 Unbalanced data classification method based on adaptive upsampling
CN106096727A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 A kind of network model based on machine learning building method and device
CN106447333A (en) * 2016-11-29 2017-02-22 中国银联股份有限公司 Fraudulent trading detection method and server
CN106506454A (en) * 2016-10-10 2017-03-15 江苏通付盾科技有限公司 Fraud business recognition method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890803B (en) * 2011-07-21 2016-01-06 阿里巴巴集团控股有限公司 The defining method of the abnormal process of exchange of electronic goods and device thereof
US20130054417A1 (en) * 2011-08-30 2013-02-28 Qualcomm Incorporated Methods and systems aggregating micropayments in a mobile device
CN106296195A (en) * 2015-05-29 2017-01-04 阿里巴巴集团控股有限公司 A kind of Risk Identification Method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279691A (en) * 2014-07-25 2016-01-27 中国银联股份有限公司 Financial transaction detection method and equipment based on random forest model
CN105844501A (en) * 2016-05-18 2016-08-10 上海亿保健康管理有限公司 Consumption behavior risk control system and method
CN105975992A (en) * 2016-05-18 2016-09-28 天津大学 Unbalanced data classification method based on adaptive upsampling
CN106096727A (en) * 2016-06-02 2016-11-09 腾讯科技(深圳)有限公司 A kind of network model based on machine learning building method and device
CN106506454A (en) * 2016-10-10 2017-03-15 江苏通付盾科技有限公司 Fraud business recognition method and device
CN106447333A (en) * 2016-11-29 2017-02-22 中国银联股份有限公司 Fraudulent trading detection method and server

Also Published As

Publication number Publication date
WO2018166457A1 (en) 2018-09-20
TW201835819A (en) 2018-10-01
CN108629413A (en) 2018-10-09
TWI689874B (en) 2020-04-01

Similar Documents

Publication Publication Date Title
CN108629413B (en) Neural network model training and transaction behavior risk identification method and device
CN113011889B (en) Account anomaly identification method, system, device, equipment and medium
CN110310114B (en) Object classification method, device, server and storage medium
CN110147389B (en) Account processing method and device, storage medium and electronic device
CN109872162A (en) A kind of air control classifying identification method and system handling customer complaint information
CN109903053B (en) Anti-fraud method for behavior recognition based on sensor data
CN112329811A (en) Abnormal account identification method and device, computer equipment and storage medium
CN113706151A (en) Data processing method and device, computer equipment and storage medium
CN111144566A (en) Neural network weight parameter training method, characteristic classification method and corresponding device
CN111970400B (en) Crank call identification method and device
CN114117029B (en) Solution recommendation method and system based on multi-level information enhancement
Rijal et al. Integrating Information Gain methods for Feature Selection in Distance Education Sentiment Analysis during Covid-19.
CN111160959A (en) User click conversion estimation method and device
CN116865994A (en) Network data security prediction method based on big data
CN116633589A (en) Malicious account detection method, device and storage medium in social network
CN115994331A (en) Message sorting method and device based on decision tree
CN113051911B (en) Method, apparatus, device, medium and program product for extracting sensitive words
CN114998002A (en) Risk operation prediction method and device
CN115146292A (en) Tree model construction method and device, electronic equipment and storage medium
CN115099344A (en) Model training method and device, user portrait generation method and device, and equipment
CN115130473A (en) Key information extraction method, model training method, related device and electronic equipment
CN113947195A (en) Model determination method and device, electronic equipment and memory
CN116629388B (en) Differential privacy federal learning training method, device and computer readable storage medium
CN114389834B (en) Method, device, equipment and product for identifying abnormal call of API gateway
CN108520042B (en) System and method for realizing suspect case-involved role calibration and role evaluation in detection work

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191225

Address after: P.O. Box 31119, grand exhibition hall, hibiscus street, 802 West Bay Road, Grand Cayman, Cayman Islands

Applicant after: Innovative advanced technology Co., Ltd

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Co., Ltd.

TA01 Transfer of patent application right
GR01 Patent grant