CN115600112A - Method, device, equipment and medium for obtaining behavior prediction model training set - Google Patents
- Publication number
- CN115600112A (application number CN202211473454.1A)
- Authority
- CN
- China
- Prior art keywords
- label
- user
- correlation
- behavior prediction
- training set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of data processing, in particular to a method, a device, equipment and a medium for obtaining a behavior prediction model training set, and aims to solve the technical problem that a prediction model trained on merchant transaction behavior data in the existing third-party payment industry has low accuracy. To this end, the method for obtaining the training set of the merchant behavior prediction model of the invention comprises the following steps: collecting historical user transaction flow data; obtaining research data, which comprises a first feature value and a first label, from the historical user transaction flow data; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value, and the first label. A training set with high accuracy is thereby obtained.
Description
Technical Field
The invention relates to the technical field of data processing, and particularly provides a method, a device, equipment and a medium for acquiring a behavior prediction model training set.
Background
At present, in the existing machine learning model for user behavior analysis, classification and prediction are mainly performed on frequent and periodic repetitive behaviors, so that fixed time intervals such as natural days, weeks, months, quarters, years and the like are often used as division granularity to perform summary statistics on the time sequence information of users.
However, some repetitive behaviors in reality are periodic over the long term without having a completely fixed period, such as merchant transaction behavior in the third-party payment industry. If such a data set is used to train a user behavior analysis/prediction model, the resulting features usually suffer from data misalignment, a high proportion of null values, or coarse-grained statistics that describe details insufficiently, so that the accuracy of the trained model is low.
Accordingly, there is a need in the art for a new approach to obtain a training set of merchant behavior prediction models to solve the above-mentioned problems.
Disclosure of Invention
The present invention has been made to overcome the above drawbacks, and aims to solve, or at least partially solve, the above technical problems. The invention provides a method, a device, equipment and a medium for acquiring a behavior prediction model training set.
In a first aspect, the present invention provides a method of obtaining a training set of merchant behavior prediction models, the method comprising: collecting historical user transaction flow data; obtaining research data from the historical user transaction flow data, wherein the research data comprises a first feature value x' and a first label y', the first feature value x' comprises first user basic information and first user historical transaction information, and the first label y' comprises rate, transaction failure rate, tool failure and housekeeper service; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'.
In one embodiment, the historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y' includes: determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis; obtaining an initial feature matrix X0 corresponding to the first label y'; determining the training sample according to the initial feature matrix X0 and the correlation between the set X of user history information and the first feature value x'; and determining a second label y corresponding to the training sample based on the initial feature matrix X0.
In one embodiment, determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis includes: selecting any one column from the set X of user history information and calculating a correlation value between that column and each column in the first feature value x', until all columns in the set X of user history information have been traversed; selecting any two columns from the set X of user history information and calculating a correlation value between those two columns and each column in the first feature value x', until all columns in the set X of user history information have been traversed; and so on, until all columns are selected from the set X of user history information and correlation values between all columns in the set X of user history information and each column in the first feature value x' are calculated; and judging whether each correlation value is greater than a preset threshold, and if so, determining that a correlation exists between the corresponding column or columns in the set X of user history information and the corresponding column in the first feature value x'.
In one embodiment, obtaining an initial feature matrix X0 corresponding to the first label y' includes: determining a correlation between the first label y' and the first feature value x'; and removing columns irrelevant to the first label y' from the first feature value x' based on the correlation between the first label y' and the first feature value x', so as to obtain the initial feature matrix X0 corresponding to the first label y'.
In one embodiment, determining the training sample according to the correlation between the set X of user history information and the first feature value x' and the initial feature matrix X0 includes: extracting data corresponding to the initial feature matrix X0 from the set X of user history information according to the correlation between the set X of user history information and the first feature value x', to obtain the training sample.
In one embodiment, determining the second label y corresponding to the training sample based on the initial feature matrix X0 includes: performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix; determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y'; and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.
In one embodiment, obtaining research data from the historical user transaction flow data comprises: performing stratified sampling on the historical user transaction flow data to obtain stratified data; surveying the stratified data to obtain initial research data, wherein the initial research data comprises the first feature value x' and a third label; screening out, from the third labels, the third labels that do not reflect the real reason; and determining the first label y' based on an empirical model and the third labels that do not reflect the real reason, to obtain the research data.
In a second aspect, the present invention provides an apparatus for obtaining a training set of merchant behavior prediction models, the apparatus comprising:
a collection module configured to collect historical user transaction flow data;
an obtaining module configured to obtain research data from the historical user transaction flow data, wherein the research data includes a first feature value x' and a first label y', the first feature value x' includes first user basic information and first user historical transaction information, and the first label y' includes rate, transaction failure rate, tool failure and housekeeper service;
a determination module configured to determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x 'and a first label y'.
In a third aspect, an electronic device is provided, comprising a processor and a storage device adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the method of obtaining a training set of merchant behavior prediction models according to any of the preceding claims.
In a fourth aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and executed by a processor to perform the method for obtaining a training set of merchant behavior prediction models according to any one of the preceding claims.
One or more technical schemes of the invention at least have one or more of the following beneficial effects:
The method for obtaining the training set of the merchant behavior prediction model first collects historical user transaction flow data, then obtains research data from the historical user transaction flow data, and finally determines the training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'. A training set with high accuracy can thereby be obtained, and the training precision of the merchant behavior prediction model is improved.
Drawings
The disclosure of the present invention will become more readily understood with reference to the accompanying drawings. As is readily understood by those skilled in the art: these drawings are for illustrative purposes only and are not intended to constitute a limitation on the scope of the present invention. Moreover, in the drawings, like numerals are used to indicate like parts, and in which:
FIG. 1 is a flow diagram illustrating the main steps of a method for obtaining a training set of merchant behavior prediction models according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of a complete flow of a method of obtaining a training set of merchant behavior prediction models, according to one embodiment of the invention;
FIG. 3 is a block diagram illustrating the main structure of an apparatus for obtaining a training set of merchant behavior prediction models according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of an electronic device in one embodiment.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and so forth. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one of A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
At present, some repetitive behaviors in reality are periodic over the long term without having a completely fixed period; the transaction behavior of merchants in the third-party payment industry is one example. If such a data set is used to train a user behavior analysis/prediction model, the resulting features usually suffer from data misalignment, a high proportion of null values, or coarse-grained statistics that describe details insufficiently, so that the accuracy of the trained model is low.
The application therefore provides a method, a device, equipment and a medium for obtaining a behavior prediction model training set, with which a training set with high accuracy can be obtained and the training precision of the merchant behavior prediction model is improved.
Referring to fig. 1, fig. 1 is a flow chart illustrating main steps of a method for obtaining a training set of a merchant behavior prediction model according to an embodiment of the present invention.
As shown in fig. 1, the method for obtaining a training set of a merchant behavior prediction model in the embodiment of the present invention mainly includes the following steps S101 to S103.
Step S101: historical user transaction flow data is collected.
The following description is given by taking the example of analyzing the behavior of the merchant in the third party payment industry to predict whether the current merchant has the risk of loss and the reason for loss.
Specifically, in the third-party payment industry, each transaction produces one transaction flow record. For example, there may be ten million user transaction flow records, which can be used as the training set (massive time series data) of the merchant behavior prediction model. The historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer and xi includes second user basic information, second user historical transaction information, and the like. For example, with 200 features the data table has 200 columns, denoted X = [x1, x2, ..., x200].
Step S102: obtaining research data from the historical user transaction flow data, wherein the research data comprises a first feature value x' and a first label y', the first feature value x' comprises first user basic information and first user historical transaction information, and the first label y' represents the specific reason for user churn, including rate, transaction failure rate, tool failure and housekeeper service.
In one embodiment, the obtaining of research data from the historical user transaction flow data may be implemented through steps S1021 to S1024 described below.
Step S1021: performing stratified sampling on the historical user transaction flow data to obtain stratified data.
Specifically, the customers in the historical user transaction flow data may be stratified according to population characteristics. The stratification may be implemented by applying an unsupervised machine learning algorithm, such as a clustering algorithm, to the historical user transaction flow data. Specifically, descriptive statistics are computed for the customers in the historical user transaction flow data, for example by age, region, historical transactions, and POS machine registration duration; customers in different layers conform to a normal distribution only within the group to which they belong, and the features that characterize each cluster are obtained after clustering, which yields the stratified data. For example, there may be 7 layers, including VIP users (who must meet requirements on revenue, transaction amount, etc.), long-term customers (customers who have used a POS machine for a long time), and so on. For example, each layer involves 7,000 customers.
Then, stratified sampling is used to extract some of the users according to the time-on-network interval, the transaction amount interval (accumulated transaction amount, maximum monthly transaction amount), and the like; for example, about 49,000 users are finally extracted, as shown in Table 1 below. The time-on-network interval may be:
earlier period: registration, second card swipe, and last transaction occur within 2017.1.1-2018.6.30;
middle period: registration, second card swipe, and last transaction occur within 2019.1.1-2020.6.30;
recent period: registration, second card swipe, and last transaction occur within 2020.7.1-2021.12.31.
The transaction amount interval may be:
high transaction amount: the monthly transaction amount reaches 70,000 yuan during the time on the network, and the accumulated transaction amount is more than 240,000 yuan;
medium transaction amount: the monthly transaction amount reaches 40,000 yuan during the time on the network, and the accumulated transaction amount is more than 240,000 yuan;
low transaction amount: the monthly transaction amount reaches 40,000 yuan during the time on the network, and the accumulated transaction amount is less than 240,000 yuan.
Illustratively, the data obtained after stratified sampling is shown in table 1 below:
TABLE 1 investigation data sheet
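By way of a non-limiting illustration, the stratified sampling of step S1021 could be sketched as follows. The function names, the record fields (`period`, `max_monthly`, `cumulative`), the sampling fraction, and the `other` fallback band are assumptions made for this sketch; only the amount thresholds are taken from the intervals listed above.

```python
import random
from collections import defaultdict


def amount_band(max_monthly, cumulative):
    """Map a user to a transaction amount band (thresholds in yuan)."""
    if max_monthly >= 70_000 and cumulative > 240_000:
        return "high"
    if max_monthly >= 40_000 and cumulative > 240_000:
        return "medium"
    if max_monthly >= 40_000 and cumulative < 240_000:
        return "low"
    return "other"  # assumption: users outside the three listed bands


def stratum_of(user):
    """A stratum is the pair (time-on-network period, amount band)."""
    return (user["period"], amount_band(user["max_monthly"], user["cumulative"]))


def stratified_sample(users, strata_key, fraction, seed=0):
    """Draw the same fraction of users from every stratum."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for user in users:
        strata[strata_key(user)].append(user)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample
```

Sampling the same fraction from each stratum keeps every period and amount band represented in the extracted users, which is the purpose of stratification described in this step.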
Step S1022: surveying the stratified data to obtain initial research data, wherein the initial research data comprises the first feature value x' and a third label.
Specifically, the stratified data already includes the first feature value x'. In this step the stratified data is surveyed; specifically, the user's reason for churn is obtained through direct communication, such as a customer service call to the customer, and the reason for churn stated by the user is the third label.
Step S1023: screening out, from the third labels, the third labels that do not reflect the real reason.
Specifically, since the third label may be inaccurate, for example because the customer gives perfunctory answers during the telephone survey, it must be checked; the check is performed manually against the database (the massive time series data). In one embodiment, the customer cites a high rate as the reason during the survey, but the massive time series data shows that the customer's rate has always been below the average rate, so the stated reason is judged not to be the real reason. In another embodiment, the customer says during the survey that the POS machine is faulty, and the massive transaction data shows that the customer frequently logs in to the payment app but never connects the POS machine, so the stated reason can be judged to be real.
The third label is then modified according to whether the stated reason is judged to be real. Taking the rate as an example, if the check finds that the reason is not real, the third label is changed from "rate" to "non-rate". For example, if 300 of 500 survey records give real reasons and 200 do not, the labels of those 200 records are changed to the "non-real" reason.
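The rate check described in this step can be illustrated with a small sketch. The description states that the verification is performed manually; the function below is only an assumed automation of the single rule given in the text (a claimed "rate" reason is rejected when the merchant's rate was always below the average rate), and the function and parameter names are hypothetical.

```python
def verify_rate_reason(claimed_reason, merchant_rates, average_rate):
    """Return the corrected third label for a surveyed churn reason.

    claimed_reason: reason stated by the customer in the survey.
    merchant_rates: historical rates of this merchant from the flow data.
    average_rate:   average rate over all merchants (assumed to be known).
    """
    if claimed_reason != "rate" or not merchant_rates:
        # Only the rate rule is sketched here; other reasons pass through.
        return claimed_reason
    always_below_average = all(r < average_rate for r in merchant_rates)
    return "non-rate" if always_below_average else "rate"
```

For example, a merchant whose rates were 0.50 and 0.55 against an average of 0.60 would have the claimed "rate" label changed to "non-rate".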
Step S1024: determining the first label y' based on an empirical model and the third labels that do not reflect the real reason, to obtain the research data.
Specifically, each survey record whose label was changed to the "non-real" reason is reclassified into one of the other reason categories according to an empirical model f(x), which yields the first label y'.
There are currently four categories of empirical reasons, with the following priority: rate > transaction failure rate > tool failure > housekeeper service. Based on the empirical model and this priority, the reason that can be verified is examined first. The calculation formula of the empirical model is as follows:
where Xn is the n-th category of empirical reason, xi is the i-th indicator of Xn, ai is the weight coefficient of indicator xi, and b is an empirical threshold whose initial value is the average value of the sample users on that indicator. In practical use, ai and b are constants.
If f(x) > 0, the first label y' is equal to 1; otherwise the first label y' is equal to 0, that is, the reason is negated. In one embodiment, if f(x) > 0 holds for several categories of empirical reasons, the reason with the highest priority is taken as the primary reason.
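The formula of the empirical model is not reproduced in this text. The sketch below therefore assumes one form that is consistent with the symbol definitions, namely a weighted sum of the indicators minus the threshold, f(Xn) = sum(ai * xi) - b; this form, the function names, and the example parameters are assumptions and not the formula of the invention. The priority order is the one stated above.

```python
# Priority of the empirical reasons, highest first (from the description).
PRIORITY = ["rate", "transaction failure rate", "tool failure", "housekeeper service"]


def empirical_score(indicators, weights, threshold):
    """Assumed form of f: weighted sum of indicators minus threshold b."""
    return sum(a * x for a, x in zip(weights, indicators)) - threshold


def assign_first_label(indicators_by_reason, params):
    """Return the highest-priority reason whose score is positive.

    indicators_by_reason: reason -> list of indicator values xi.
    params:               reason -> (weights ai, threshold b).
    Returns None when no reason has f > 0 (every y' is 0).
    """
    for reason in PRIORITY:
        weights, threshold = params[reason]
        if empirical_score(indicators_by_reason[reason], weights, threshold) > 0:
            return reason
    return None
```

Iterating over the reasons in priority order means that when several categories satisfy f > 0, the first match is the primary reason, as the embodiment describes.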
Step S103: determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'. Specifically, the historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; and the step of determining the training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y' may be implemented through steps S1031 to S1034 described below.
Specifically, the first feature value x' and xi may not be the same. For example, xi has 200 columns of features, and x' is not necessarily a subset of xi: for the survey, some columns of xi may be combined, for example monthly rates into an annual average rate. As another example, xi has 12 columns, each holding the transaction amount of one month, and the corresponding column of x' is the total transaction amount for the year; in other words, xi is the detail and x' is the aggregated result.
Step S1031: determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis.
In a specific embodiment, determining the correlation between the set X of user history information and the first feature value x' based on a correlation analysis includes: selecting any one column from the set X of user history information and calculating a correlation value between that column and each column in the first feature value x', until all columns in the set X of user history information have been traversed; selecting any two columns from the set X of user history information and calculating a correlation value between those two columns and each column in the first feature value x', until all columns in the set X of user history information have been traversed; and so on, until all columns are selected from the set X of user history information and correlation values between all columns in the set X of user history information and each column in the first feature value x' are calculated; and judging whether each correlation value is greater than a preset threshold, and if so, determining that a correlation exists between the corresponding column or columns in the set X of user history information and the corresponding column in the first feature value x'.
Specifically, the correlation analysis method may be the Pearson correlation analysis method, but is not limited thereto, and may be any other method capable of performing correlation analysis.
Illustratively, the determination of the correlation between the set X of user history information and the first feature value x' is described below, taking as an example a first feature value x' that contains the total transaction amount and a set X that has 200 columns, including the transaction amounts of 12 months.
Iteration 1: select column x1 from the 200 columns of X (one column is selected at a time until all columns have been traversed) and perform correlation analysis with the 1st column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 1st column of x'; and so on until all columns of X, for example 200 columns, have been traversed.
Iteration 2: select column x1 from the 200 columns of X (one column is selected at a time until all columns have been traversed) and perform correlation analysis with the 2nd column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 2nd column of x'; and so on until all columns of X, for example 200 columns, have been traversed.
Iteration 3: select column x1 from the 200 columns of X (one column is selected at a time until all columns have been traversed) and perform correlation analysis with the 3rd column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 3rd column of x'; and so on until all columns of X, for example 200 columns, have been traversed.
Iteration 4: select column x1 from the 200 columns of X (one column is selected at a time until all columns have been traversed) and perform correlation analysis with the 4th column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 4th column of x'; and so on until all columns of X, for example 200 columns, have been traversed.
On this basis, 2 columns are selected from the 200 columns of X, for example columns x1 and x2 (two columns are selected at a time until all columns have been traversed), and correlation analysis is performed with the 1st column of x', until all columns of X, for example 200 columns, have been traversed.
Similarly, 2 columns are selected from the 200 columns of X, and correlation analysis is performed with the 2nd, 3rd and 4th columns of x' respectively, until all columns of X, for example 200 columns, have been traversed.
On this basis, 3 columns are selected from the 200 columns of X, for example columns x1, x2 and x3 (three columns are selected at a time until all columns have been traversed), and correlation analysis is performed with each column of x' respectively, until all columns of X, for example 200 columns, have been traversed.
And so on, until all 200 columns are selected from the 200 columns of X, and columns x1, x2, x3, ..., x200 together are subjected to correlation analysis with each column of x' respectively.
After these rounds of iteration, each correlation analysis yields a correlation value according to the correlation algorithm. A threshold is preset (for example, 0.001) and each correlation value is compared with it; if the correlation value is greater than the threshold, the corresponding column or columns of X are considered to be correlated with the corresponding column of x'.
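The column-group search of step S1031 could be sketched as follows. The text does not state how several columns of X are combined before they are correlated with one column of x'; the sketch assumes a row-wise sum, in line with the example of monthly amounts adding up to a yearly total. The names `pearson`, `correlated_column_groups` and `max_group_size` are hypothetical, X and x' are assumed to be restricted to the same surveyed users in the same row order, and the group size is capped because enumerating every subset of 200 columns is not computationally feasible.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(var_a * var_b)


def correlated_column_groups(X, X_prime, threshold, max_group_size=2):
    """Find groups of X columns correlated with a column of x'.

    X, X_prime: dicts mapping column name -> list of values.
    Returns a list of (group of X column names, x' column name, correlation).
    """
    hits = []
    names = sorted(X)
    for size in range(1, max_group_size + 1):
        for group in combinations(names, size):
            # Assumption: a group of columns is combined by a row-wise sum.
            combined = [sum(row) for row in zip(*(X[name] for name in group))]
            for target, target_values in X_prime.items():
                r = pearson(combined, target_values)
                if r > threshold:
                    hits.append((group, target, r))
    return hits
```

With two monthly amount columns and a total column equal to their sum, only the pair of monthly columns exceeds a high threshold, which mirrors the relationship between detailed and aggregated features described above.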
Step S1032: obtaining an initial feature matrix X0 corresponding to the first label y'.
In a specific embodiment, obtaining an initial feature matrix X0 corresponding to the first label y' includes: determining a correlation between the first label y' and the first feature value x' based on a correlation analysis; and removing columns irrelevant to the first label y' from the first feature value x' based on the correlation between the first label y' and the first feature value x', so as to obtain the initial feature matrix X0 corresponding to the first label y'.
Specifically, first, the correlation between the first label y' and each column of the first feature value x' is determined using the Pearson correlation analysis method, which is described in the previous embodiment and is not repeated here. Then, the columns irrelevant to the first label y' are removed from the first feature value x', and the initial feature matrix X0 corresponding to the first label y' is obtained.
In one embodiment, removing the columns irrelevant to the first label y' yields a matrix with 500 rows and 80 columns, which is the initial feature matrix X0 corresponding to the first label y'. In addition, since the first label y' contains the four elements of rate, transaction failure rate, tool failure and housekeeper service, assuming that each element corresponds to 100 rows, there are 5 matrices of 100 rows and 80 columns; that is, each element in the first label y' corresponds to one initial feature matrix.
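As a minimal sketch of step S1032, the columns of x' that are not correlated with the first label y' can be dropped as follows. The sketch assumes that y' is encoded as 0/1 for a single churn reason and that the absolute value of the Pearson coefficient is compared with the threshold; both choices, and the function names, are assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def initial_feature_matrix(X_prime, y_prime, threshold):
    """Keep only the columns of x' that are correlated with the label y'.

    X_prime: dict mapping column name -> list of values (one per user).
    y_prime: list of 0/1 labels for one churn reason, same row order.
    """
    return {
        name: values
        for name, values in X_prime.items()
        # Assumption: a strong negative correlation also counts as relevant.
        if abs(pearson(values, y_prime)) > threshold
    }
```

Calling the function once per churn reason would give one initial feature matrix per element of y', in line with the embodiment above.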
Step S1033: determining the training sample according to the correlation between the set X of user history information and the first feature value x' and the initial feature matrix X0.
In a specific embodiment, determining the training sample according to the correlation between the set X of user history information and the first feature value x' and the initial feature matrix X0 includes: extracting data corresponding to the initial feature matrix X0 from the set X of user history information according to the correlation between the set X of user history information and the first feature value x', to obtain the training sample.
Specifically, because the initial feature matrix X0 consists of several columns of the first feature value x', the column or columns of the set X of user history information that are correlated with the initial feature matrix X0 can be extracted according to the correlation between the set X of user history information and the first feature value x' obtained in the foregoing steps, so as to obtain the training sample.
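The extraction in step S1033 amounts to a lookup through the correlation results. In the sketch below, the correlation results are assumed to be available as pairs of (group of X column names, x' column name); this representation and the function name are hypothetical.

```python
def extract_training_sample(X, correlations, x0_columns):
    """Select the columns of X that are correlated with columns kept in X0.

    X:            dict mapping column name -> list of values (all users).
    correlations: iterable of (group of X column names, x' column name).
    x0_columns:   names of the x' columns that remain in X0.
    """
    selected = []
    for group, target in correlations:
        if target in x0_columns:
            for name in group:
                if name not in selected:  # keep first-seen order, no repeats
                    selected.append(name)
    return {name: X[name] for name in selected}
```

Because X covers all users in the flow data while x' covers only the surveyed users, this lookup is what carries the survey-derived feature selection over to the full data set.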
Step S1034: determining a second label y corresponding to the training sample based on the initial feature matrix X0.
In a specific embodiment, determining the second label y corresponding to the training sample based on the initial feature matrix X0 includes: performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix; determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y'; and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.
Specifically, first, moving average filtering is performed on the initial feature matrix X0 to obtain a common feature matrix, which consists of several columns of the first feature value x'. Then, according to the correlation between the first label y' and the first feature value x', the first labels y' corresponding to the common features can be determined to form the second label matrix Y. Finally, the first label y' that occurs most often in the matrix Y is taken as the second label y corresponding to the training sample.
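Two building blocks of step S1034, the moving average filter and the selection of the most frequent label, can be sketched as follows. The text does not specify the window length, the axis along which the filter is applied, or how the labels are looked up from the common feature matrix, so the sketch assumes a simple trailing-window average over one column and takes the list of candidate labels as given; the function names are hypothetical.

```python
from collections import Counter


def moving_average(values, window):
    """Moving average over a sequence, one output per full window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_frequent_label(labels):
    """Return the label that occurs most often in the second label matrix Y.

    Ties are resolved in favor of the label that appears first, which is
    the order used by Counter.most_common.
    """
    return Counter(labels).most_common(1)[0][0]
```

For instance, if the labels collected for a training sample are rate, tool failure and rate, the second label y would be rate.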
Based on the steps S101 to S103, historical user transaction flow data is firstly collected, research data is secondly obtained from the historical user transaction flow data, and finally a training set of a merchant behavior prediction model is determined based on the historical user transaction flow data, the first characteristic value and the first label. Therefore, a training set with high accuracy can be obtained, and the training precision of the merchant behavior prediction model is improved.
In one embodiment, as shown in fig. 2, the method for obtaining the training set of the merchant behavior prediction model finally yields the basic features (training samples) and the business key features (labels of the training samples), thereby providing technical support for training the merchant behavior prediction model.
It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art can understand that, in order to achieve the effect of the present invention, different steps do not have to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the scope of the present invention.
Furthermore, the invention also provides a device for acquiring the training set of the merchant behavior prediction model.
Referring to fig. 3, fig. 3 is a main structural block diagram of an apparatus for obtaining a training set of a merchant behavior prediction model according to an embodiment of the present invention.
As shown in fig. 3, the apparatus for obtaining a training set of a merchant behavior prediction model in the embodiment of the present invention mainly includes a collection module 11, an obtaining module 12, and a determination module 13. In some embodiments, one or more of the collection module 11, the obtaining module 12, and the determination module 13 may be combined into one module.
In some embodiments, the collection module 11 may be configured to collect historical user transaction flow data. The obtaining module 12 may be configured to obtain research data from the historical user transaction flow data, the research data including a first feature value x' and a first label y', wherein the first feature value x' includes first user basic information and first user historical transaction information, and the first label y' includes a rate, a transaction failure rate, an implement failure, and a housekeeping service. The determination module 13 may be configured to determine the training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x', and the first label y'. In one embodiment, for the specific functions implemented, reference may be made to the description of steps S101 to S103.
For convenience and brevity of description, the contents described in the embodiment of the method for obtaining the training set of the merchant behavior prediction model may be referred to for specific working processes and related descriptions of the apparatus for obtaining the training set of the merchant behavior prediction model, and are not described herein again.
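The division of the apparatus into three modules can be illustrated with the sketch below. It is not the patented implementation: the class name `TrainingSetApparatus`, the type aliases, and the stub functions are all hypothetical, and each module is modeled simply as a callable so that modules can be swapped or combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

FlowData = List[dict]                         # historical user transaction flow data
ResearchData = Tuple[List[dict], List[str]]   # (first feature values x', first labels y')
TrainingSet = List[Tuple[dict, str]]          # (training sample, second label y)


@dataclass
class TrainingSetApparatus:
    collect: Callable[[], FlowData]                                       # collection module 11
    obtain: Callable[[FlowData], ResearchData]                            # obtaining module 12
    determine: Callable[[FlowData, List[dict], List[str]], TrainingSet]   # determination module 13

    def run(self) -> TrainingSet:
        """Chain the three modules in the order of steps S101 to S103."""
        flow = self.collect()
        x_prime, y_prime = self.obtain(flow)
        return self.determine(flow, x_prime, y_prime)


# Stub modules with made-up data, used only to show how the parts connect.
def collect_stub() -> FlowData:
    return [{"user": "u1", "amount": 10.0}, {"user": "u2", "amount": 25.0}]


def obtain_stub(flow: FlowData) -> ResearchData:
    # Pretend that only the first user was surveyed.
    return [flow[0]], ["rate"]


def determine_stub(flow: FlowData, x_prime: List[dict], y_prime: List[str]) -> TrainingSet:
    # Placeholder logic: attach the single surveyed label to every row.
    return [(row, y_prime[0]) for row in flow]


apparatus = TrainingSetApparatus(collect_stub, obtain_stub, determine_stub)
training_set = apparatus.run()
print(len(training_set), training_set[0][1])
```

Modeling each module as a plain callable reflects the statement that the modules may be split or combined; a single function could equally be passed for two of the roles.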
It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above-described method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying said computer program code, media, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content of the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable storage media do not include electrical carrier signals and telecommunication signals, in accordance with legislation and patent practice.
Furthermore, the invention also provides electronic equipment. In an embodiment of the electronic device according to the present invention, as shown in fig. 4, the electronic device includes a processor 41 and a storage device 42, the storage device may be configured to store a program for executing the method for obtaining the training set of the merchant behavior prediction model of the above method embodiment, and the processor may be configured to execute a program in the storage device, the program including but not limited to a program for executing the method for obtaining the training set of the merchant behavior prediction model of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed.
Further, the invention also provides a computer readable storage medium. In one computer-readable storage medium embodiment according to the present invention, a computer-readable storage medium may be configured to store a program for executing the method for obtaining a training set of merchant behavior prediction models of the above-described method embodiment, and the program may be loaded and executed by a processor to implement the above-described method for obtaining a training set of merchant behavior prediction models. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer readable storage medium is a non-transitory computer readable storage medium in the embodiment of the present invention.
Further, it should be understood that, since the configuration of each module is only for explaining the functional units of the apparatus of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (10)
1. A method of obtaining a training set of merchant behavior prediction models, the method comprising:
collecting historical user transaction flow data;
obtaining research data from the historical user transaction flow data, wherein the research data comprises a first characteristic value x 'and a first label y', the first characteristic value x 'comprises first user basic information and first user historical transaction information, and the first label y' comprises a rate, a transaction failure rate, an implement failure and a manager service;
determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x 'and the first label y'.
2. The method for obtaining the training set of the merchant behavior prediction model according to claim 1, wherein the historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples;
determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x 'and the first label y', including:
determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis;
acquiring an initial feature matrix X0 corresponding to the first label y';
determining the training sample according to the correlation between the set X of the user history information and the first feature value x' and the initial feature matrix X0;
and determining a second label y corresponding to the training sample based on the initial feature matrix X0.
3. The method for obtaining the training set of the merchant behavior prediction model according to claim 2, wherein determining the correlation between the set X of user history information and the first feature value x' based on a correlation analysis comprises:
selecting any one column from the set X of the user history information, and respectively calculating a correlation value between that column and each column in the first feature value x', until all the columns in the set X of the user history information are traversed;
selecting any two columns from the set X of the user history information, and respectively calculating a correlation value between those two columns and each column in the first feature value x', until all the columns in the set X of the user history information are traversed;
and so on, until all columns are selected from the set X of the user history information, and correlation values between all the columns in the set X of the user history information and each column in the first feature value x' are respectively calculated;
and judging whether each correlation value is larger than a preset threshold value, and if so, determining that a correlation exists between the corresponding column in the set X of the user history information and the corresponding column in the first feature value x'.
4. The method for obtaining the training set of the merchant behavior prediction model as claimed in claim 2, wherein obtaining the initial feature matrix X0 corresponding to the first label y' comprises:
determining a correlation between the first label y 'and the first feature value x';
based on the correlation between the first label y' and the first feature value x', removing the columns irrelevant to the first label y' from the first feature value x', and obtaining an initial feature matrix X0 corresponding to the first label y'.
5. The method for obtaining the training set of the merchant behavior prediction model according to claim 2, wherein determining the training sample according to the correlation between the set X of the user history information and the first feature value x' and the initial feature matrix X0 includes: extracting data corresponding to the initial feature matrix X0 from the set X of the user history information according to the correlation between the set X of the user history information and the first feature value x', to obtain the training sample.
6. The method for obtaining the training set of the merchant behavior prediction model according to claim 4, wherein determining the second label y corresponding to the training sample based on the initial feature matrix X0 includes:
performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix;
determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y';
and taking the first label y' that is repeated most often in the second label matrix Y as the second label y corresponding to the training sample.
7. The method of deriving a training set of merchant-behavioral prediction models according to claim 1, wherein deriving research data from the historical user-transaction-flow data comprises:
adopting a hierarchical sampling method to carry out hierarchical sampling on the historical user transaction flow data to obtain hierarchical data;
investigating the layered data to obtain initial investigation data, wherein the initial investigation data comprise a first characteristic value x' and a third label;
screening out a third label of the non-true reasons from the third labels;
determining the first label y' based on an empirical model and a third label of the non-genuine cause to obtain the research data.
8. An apparatus for obtaining a training set of merchant behavior prediction models, the apparatus comprising:
a collection module configured to collect historical user transaction flow data;
an obtaining module configured to obtain research data from the historical user transaction flow data, wherein the research data includes a first characteristic value x 'and a first label y', the first characteristic value x 'includes first user basic information and first user historical transaction information, and the first label y' includes a rate, a transaction failure rate, an implement failure and a housekeeping service;
a determination module configured to determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x 'and the first label y'.
9. An electronic device comprising a processor and a storage means adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to perform the method of obtaining a training set of merchant behavior prediction models as claimed in any one of claims 1 to 7.
10. A computer readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the method of obtaining a training set of merchant behavior prediction models as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211473454.1A CN115600112B (en) | 2022-11-23 | 2022-11-23 | Method, device, equipment and medium for obtaining behavior prediction model training set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115600112A (en) | 2023-01-13 |
CN115600112B (en) | 2023-03-07 |
Family
ID=84853548
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211473454.1A Active CN115600112B (en) | 2022-11-23 | 2022-11-23 | Method, device, equipment and medium for obtaining behavior prediction model training set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115600112B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110757A (en) * | 2019-04-12 | 2019-08-09 | 国电南瑞科技股份有限公司 | A kind of power transmission and transformation suspicious data screening method and equipment based on Random Forest model |
WO2020134533A1 (en) * | 2018-12-29 | 2020-07-02 | 北京市商汤科技开发有限公司 | Method and apparatus for training deep model, electronic device, and storage medium |
WO2020143377A1 (en) * | 2019-01-08 | 2020-07-16 | 阿里巴巴集团控股有限公司 | Industry recognition model determination method and apparatus |
EP3792811A1 (en) * | 2019-09-12 | 2021-03-17 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Method and device for optimizing training set for text classification |
CN115130711A (en) * | 2021-03-26 | 2022-09-30 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer and readable storage medium |
Non-Patent Citations (3)
Title |
---|
孙权等: "基于数据挖掘的商户风险评分方法和系统", 《软件产业与工程》 * |
张秋菊等: "基于自组织数据挖掘的电子商务客户流失预测模型", 《企业经济》 * |
徐一文;黎潇阳;董启文;钱卫宁;周?;: "基于聚合支付平台交易数据的商户流失预测" * |
Also Published As
Publication number | Publication date |
---|---|
CN115600112B (en) | 2023-03-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||