CN115600112B

CN115600112B - Method, device, equipment and medium for obtaining behavior prediction model training set

Info

Publication number: CN115600112B
Application number: CN202211473454.1A
Authority: CN
Inventors: 吴玉珍
Original assignee: Beijing Jiehui Technology Co Ltd
Current assignee: Beijing Jiehui Technology Co Ltd
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2023-03-07
Anticipated expiration: 2042-11-23
Also published as: CN115600112A

Abstract

The invention relates to the technical field of data processing, in particular to a method, a device, equipment and a medium for obtaining a behavior prediction model training set, and aims to solve the technical problem that a prediction model trained on merchant transaction behavior data in the existing third-party payment industry has low training precision. To this end, the method for obtaining a training set of a merchant behavior prediction model of the invention comprises the following steps: collecting historical user transaction flow data; obtaining research data, comprising a first feature value and a first label, from the historical user transaction flow data; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value and the first label. A training set with high accuracy is thus obtained.

Description

Method, device, equipment and medium for obtaining behavior prediction model training set

Technical Field

The invention relates to the technical field of data processing, and particularly provides a method, a device, equipment and a medium for acquiring a behavior prediction model training set.

Background

At present, existing machine learning models for user behavior analysis mainly perform classification prediction for frequent and periodic repetitive behaviors, so fixed time intervals such as natural days, weeks, months, quarters and years are often used as the division granularity for summarizing and counting users' time-series information.

In reality, however, some repetitive behaviors are periodic over the long term but do not have a completely fixed period, such as merchant transaction behavior in the third-party payment industry. If such data are used as the data set when training a user behavior analysis/prediction model, the resulting features usually suffer from data misalignment, a high proportion of null values, or coarse-grained statistics that describe details insufficiently, so the accuracy of the trained model is low.

Accordingly, there is a need in the art for a new approach to obtain a training set of merchant behavior prediction models to solve the above-mentioned problems.

Disclosure of Invention

The present invention has been made to overcome the above-mentioned drawbacks, and aims to provide a solution or at least a partial solution to the above-mentioned technical problem. The invention provides a method, a device, equipment and a medium for acquiring a behavior prediction model training set.

In a first aspect, the present invention provides a method of obtaining a training set of a merchant behavior prediction model, the method comprising: collecting historical user transaction flow data; obtaining research data from the historical user transaction flow data, wherein the research data comprises a first feature value x' and a first label y', the first feature value x' comprises first user basic information and first user historical transaction information, and the first label y' comprises rate, transaction failure rate, tool failure and manager service; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'.

In one embodiment, the historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; and determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y' includes: determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis; obtaining an initial feature matrix X0 corresponding to the first label y'; determining the training sample according to the initial feature matrix X0 and the correlation between the set X of user history information and the first feature value x'; and determining a second label y corresponding to the training sample based on the initial feature matrix X0.

In one embodiment, determining a correlation between the set X of user history information and the first feature value x' based on a correlation analysis includes: selecting any one column from the set X of user history information, and calculating a correlation value between that column and each column of the first feature value x', until all columns in the set X have been traversed; selecting any two columns from the set X, and calculating a correlation value between those two columns and each column of the first feature value x', until all columns in the set X have been traversed; and so on, until all columns are selected from the set X and correlation values between all columns of the set X and each column of the first feature value x' are calculated; and judging whether each correlation value is greater than a preset threshold, and if so, determining that a correlation exists between the corresponding column or columns in the set X of user history information and the corresponding column of the first feature value x'.

In one embodiment, obtaining an initial feature matrix X0 corresponding to the first label y' includes: determining a correlation between the first label y' and the first feature value x'; and removing columns irrelevant to the first label y' from the first feature value x' based on the correlation between the first label y' and the first feature value x', so as to obtain the initial feature matrix X0 corresponding to the first label y'.

In one embodiment, determining the training sample according to the initial feature matrix X0 and the correlation between the set X of user history information and the first feature value x' includes: extracting data corresponding to the initial feature matrix X0 from the set X of user history information according to the correlation between the set X of user history information and the first feature value x', so as to obtain the training sample.

In one embodiment, determining the second label y corresponding to the training sample based on the initial feature matrix X0 includes: performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix; determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y'; and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.

In one embodiment, obtaining research data from the historical user transaction flow data comprises: performing stratified sampling on the historical user transaction flow data to obtain stratified data; surveying the stratified data to obtain initial research data, wherein the initial research data comprises the first feature value x' and a third label; screening out the third labels with non-true reasons from the third labels; and determining the first label y' based on an empirical model and the third labels with non-true reasons, so as to obtain the research data.

In a second aspect, the present invention provides an apparatus for obtaining a training set of merchant behavior prediction models, the apparatus comprising:

a collection module configured to collect historical user transaction flow data;

an obtaining module configured to obtain research data from the historical user transaction flow data, the research data including a first feature value x' and a first label y', wherein the first feature value x' includes first user basic information and first user historical transaction information, and the first label y' includes rate, transaction failure rate, tool failure and manager service;

a determination module configured to determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'.

In a third aspect, an electronic device is provided, comprising a processor and a storage adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the method of obtaining a training set of merchant behavior prediction models according to any of the preceding claims.

In a fourth aspect, a computer-readable storage medium is provided, in which a plurality of program codes are stored, the program codes being adapted to be loaded and executed by a processor to perform the method for obtaining a training set of merchant behavior prediction models according to any one of the preceding claims.

One or more technical schemes of the invention at least have one or more of the following beneficial effects:

The method for obtaining the training set of the merchant behavior prediction model first collects historical user transaction flow data, then obtains research data from the historical user transaction flow data, and finally determines the training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'. A training set with high accuracy can therefore be obtained, which improves the training precision of the merchant behavior prediction model.

Drawings

The disclosure of the present invention will become more readily understood with reference to the accompanying drawings. As is readily understood by those skilled in the art: these drawings are for illustrative purposes only and are not intended to constitute a limitation on the scope of the present invention. Moreover, in the drawings, like numerals are used to indicate like parts, and in which:

FIG. 1 is a flow diagram illustrating the main steps of a method for obtaining a training set of merchant behavior prediction models according to one embodiment of the present invention;

FIG. 2 is a schematic diagram of a complete flow of a method of obtaining a training set of merchant-behavior prediction models, according to one embodiment of the invention;

FIG. 3 is a main structural block diagram of an apparatus for obtaining a training set of merchant behavior prediction models according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the electronic device in one embodiment.

Detailed Description

Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports and memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer-readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include plural forms as well.

In reality, some repetitive behaviors are periodic over the long term but do not have a completely fixed period; merchant transaction behavior in the third-party payment industry is one example. If such data are used as the data set when training a user behavior analysis/prediction model, the resulting features usually suffer from data misalignment, a high proportion of null values, or coarse-grained statistics that describe details insufficiently, so the accuracy of the trained model is low.

Therefore, the application provides a method, a device, equipment and a medium for acquiring a behavior prediction model training set. Therefore, a training set with high accuracy can be obtained, and the training precision of the merchant behavior prediction model is improved.

Referring to fig. 1, fig. 1 is a flow chart illustrating main steps of a method for obtaining a training set of a merchant behavior prediction model according to an embodiment of the present invention.

As shown in fig. 1, the method for obtaining a training set of a merchant behavior prediction model in the embodiment of the present invention mainly includes the following steps S101 to S103.

Step S101: historical user transaction flow data is collected.

The following description takes as an example the analysis of merchant behavior in the third-party payment industry to predict whether a merchant is at risk of churning and for what reason the merchant may churn.

Specifically, in the third-party payment industry, each transaction produces one transaction flow record. For example, there may be ten million user transaction flow records, which can be used as the training set (massive time-series data) of a merchant behavior prediction model. The historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer and xi includes second user basic information, second user historical transaction information, and the like. For example, there may be 200 features, which are 200 columns in the data table and are denoted as X = [x1, x2, ..., x200].

Step S102: obtain research data from the historical user transaction flow data, wherein the research data comprises a first feature value x' and a first label y', the first feature value x' comprises first user basic information and first user historical transaction information, and the first label y' represents the specific reason for user churn, including rate, transaction failure rate, tool failure and manager service.

In one embodiment, the obtaining of research data from the historical user transaction flow data may be implemented through steps S1021 to S1024 described below.

Step S1021: perform stratified sampling on the historical user transaction flow data to obtain stratified data.

Specifically, the customers in the historical user transaction flow data may be stratified according to crowd characteristics. The customer stratification may be implemented by applying an unsupervised machine learning algorithm, such as a clustering algorithm, to the historical user transaction flow data. In particular, descriptive statistics are computed for the customers in the historical user transaction flow data, for example by age, region, historical transactions and POS machine registration duration, so that the customers of each layer follow a normal distribution only within the group to which they belong. The features characterizing each cluster are obtained after clustering, which yields the stratified data. For example, there may be 7 layers, including VIP users (who must meet requirements on revenue, transaction amount, etc.) and long-term customers (customers who have used a POS machine for a long time). For example, each layer involves 7000 customers.

Then, stratified sampling is used to extract some of the users according to the on-network time interval, the transaction amount interval (accumulated transaction amount, maximum monthly transaction amount) and the like; for example, about 49,000 users are finally extracted, as shown in Table 1 below. The on-network time interval may be:

Far period: the registration, second card swipe and last transaction fall within 2017.1.1-2018.6.30;

Middle period: the registration, second card swipe and last transaction fall within 2019.1.1-2020.6.30;

Near period: the registration, second card swipe and last transaction fall within 2020.7.1-2021.12.31.

Wherein the transaction amount interval may be:

High transaction amount: the monthly transaction amount reaches 70,000 yuan during the on-network period, and the accumulated transaction amount is more than 240,000 yuan;

Medium transaction amount: the monthly transaction amount reaches 40,000 yuan during the on-network period, and the accumulated transaction amount is more than 240,000 yuan;

Low transaction amount: the monthly transaction amount reaches 40,000 yuan during the on-network period, and the accumulated transaction amount is less than 240,000 yuan.
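The sampling by on-network time interval and transaction amount interval can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields (`last_activity`, `max_monthly`, `cumulative`), the function names, the ISO date format and the fixed per-stratum sample size are all assumptions introduced here, and the interval boundaries are copied from the lists above.

```python
import random
from collections import defaultdict


def onboarding_period(last_activity: str) -> str:
    """Map an ISO date string (YYYY-MM-DD) to the far / middle / near period."""
    if "2017-01-01" <= last_activity <= "2018-06-30":
        return "far"
    if "2019-01-01" <= last_activity <= "2020-06-30":
        return "middle"
    if "2020-07-01" <= last_activity <= "2021-12-31":
        return "near"
    return "other"


def amount_band(max_monthly: float, cumulative: float) -> str:
    """Map transaction amounts (in yuan) to the high / medium / low band."""
    if max_monthly >= 70_000 and cumulative > 240_000:
        return "high"
    if max_monthly >= 40_000 and cumulative > 240_000:
        return "medium"
    if max_monthly >= 40_000 and cumulative < 240_000:
        return "low"
    return "other"


def stratified_sample(users, per_stratum, seed=0):
    """Draw up to `per_stratum` users from every (period, band) stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for user in users:
        key = (
            onboarding_period(user["last_activity"]),
            amount_band(user["max_monthly"], user["cumulative"]),
        )
        strata[key].append(user)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample
```

Sampling a fixed number per stratum is only one option; proportional allocation would be an equally reasonable choice, and the source does not say which is used.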

Illustratively, the data obtained after stratified sampling are shown in Table 1 below:

TABLE 1 investigation data sheet

Step S1022: survey the stratified data to obtain initial research data, wherein the initial research data comprises the first feature value x' and a third label.

Specifically, the stratified data already includes the first feature value x'. In this step, the stratified data are surveyed: users are asked for their churn reason through direct communication, for example by a customer service telephone call to the customer, and the churn reason given by the user is the third label.

Step S1023: screen out the third labels with non-true reasons from the third labels.

Specifically, since the third label may be inaccurate, for example when it is obtained by telephone survey, it needs to be checked; the check is performed by a person who verifies the data in the database (that is, in the massive time-series data mentioned above). In one embodiment, the customer states during the survey that the reason is a high rate, but the massive time-series data show that the customer's rate has always been below the average rate, so the reason is judged to be not true. In another embodiment, the customer states during the survey that the POS machine is broken, and the massive transaction data show that the customer frequently logs in to the payment app but never connects to the POS machine, so the reason can be judged to be true.

After judging whether the reason is the true reason, the corresponding third label is modified accordingly. Taking the rate as an example, if the check finds that the reason is not true, the third label is changed from "rate" to "non-rate". For example, if 300 of 500 survey records give true reasons and 200 give non-true reasons, the labels of those 200 records are changed to "non-true" reasons.

Step S1024: determine the first label y' based on an empirical model and the third labels with non-true reasons, so as to obtain the research data.

Specifically, for the research data whose label was changed to a "non-true" reason, the label is reclassified into one of the other reason categories according to an empirical model f(x), so as to obtain the first label y'.
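The rate check described above can be sketched as a simple rule. The text describes this verification as a manual database lookup, so automating it is an assumption; the function name, the label strings and the treatment of an empty rate history are also illustrative choices.

```python
def verify_rate_reason(stated_reason, merchant_rates, average_rate):
    """Return the third label after checking a stated "rate" reason.

    stated_reason  -- churn reason given by the customer in the survey
    merchant_rates -- historical rates charged to this merchant
    average_rate   -- average rate across merchants
    """
    if stated_reason != "rate":
        # Only the rate check is sketched here; other reasons pass through.
        return stated_reason
    # The customer blames a high rate, but the rate was always below average:
    # the stated reason is judged not true and relabelled as "non-rate".
    if merchant_rates and all(r < average_rate for r in merchant_rates):
        return "non-rate"
    return "rate"
```

For instance, `verify_rate_reason("rate", [0.0050, 0.0055], 0.0060)` yields `"non-rate"`, because every historical rate is below the average.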

There are currently four categories of empirical reasons, with the following priority: rate > transaction failure rate > tool failure > manager service. Based on the empirical model and in combination with this priority, the provable reason is checked first. The calculation formula of the empirical model is as follows:

where Xn is the n-th type of empirical reason, xi is the i-th index of Xn, ai is the weight coefficient of index xi, and b is the empirical threshold, whose initial value is the average value of the sample users on that index. In practical applications, ai and b are constants.

If f(x) > 0, the first label y' is equal to 1; otherwise the first label y' is equal to 0, that is, the reason is negated. In one embodiment, if f(x) > 0 for multiple types of empirical reasons, the reason with the higher priority is taken as the primary reason.
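The decision rule can be sketched in code. The formula itself is not reproduced in this text, so the sketch assumes the form f(x) = sum(ai * xi) - b, which is only one reading consistent with the symbol definitions above (weights ai, indices xi, threshold b) and the rule f(x) > 0. The reason keys and function names are hypothetical.

```python
# Priority of empirical reasons, highest first, as stated in the text.
PRIORITY = ["rate", "transaction_failure_rate", "tool_failure", "manager_service"]


def f(indicators, weights, b):
    """Assumed empirical model: weighted sum of the indices minus threshold b."""
    return sum(a * x for a, x in zip(weights, indicators)) - b


def first_labels(indicators_by_reason, weights_by_reason, thresholds):
    """Return y' per reason: 1 if f(x) > 0, otherwise 0."""
    return {
        reason: 1 if f(indicators_by_reason[reason],
                       weights_by_reason[reason],
                       thresholds[reason]) > 0 else 0
        for reason in PRIORITY
    }


def primary_reason(indicators_by_reason, weights_by_reason, thresholds):
    """Pick the highest-priority reason whose label is 1, or None if none is."""
    labels = first_labels(indicators_by_reason, weights_by_reason, thresholds)
    for reason in PRIORITY:
        if labels[reason] == 1:
            return reason
    return None
```

Scanning `PRIORITY` in order is what resolves the case where several reasons have f(x) > 0 at once.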

Step S103: determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'. Specifically, the historical user transaction flow data includes a set of user history information X = {xi}, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; and the step of determining the training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y' may be implemented through steps S1031 to S1034 described below.

Specifically, the first feature value x' and xi may not be the same. For example, xi has 200 columns of features, and x' is not necessarily a subset of xi: for research purposes, some columns of xi may be combined, for example monthly rates into an annual average rate. As another example, xi has 12 columns, each being the transaction amount of one month, while the corresponding column of x' is the total transaction amount of the year; that is, xi is the detail and x' is the aggregated result.

Step S1031: determine a correlation between the set X of user history information and the first feature value x' based on a correlation analysis.

In a specific embodiment, determining the correlation between the set X of user history information and the first feature value x' based on a correlation analysis includes: selecting any one column from the set X of user history information, and calculating a correlation value between that column and each column of the first feature value x', until all columns in the set X have been traversed; selecting any two columns from the set X, and calculating a correlation value between those two columns and each column of the first feature value x', until all columns in the set X have been traversed; and so on, until all columns are selected from the set X and correlation values between all columns of the set X and each column of the first feature value x' are calculated; and judging whether each correlation value is greater than a preset threshold, and if so, determining that a correlation exists between the corresponding column or columns in the set X of user history information and the corresponding column of the first feature value x'.

Specifically, the correlation analysis method may be the Pearson correlation analysis method, but is not limited thereto, and may be any other method capable of performing correlation analysis.

Illustratively, the correlation between the set X of user history information and the first feature value x' is determined below for the case where the first feature value x' is a total transaction amount, X contains the transaction amounts of 12 months, and X has 200 columns.

Iteration 1: select column x1 from the 200 columns of X (one column is selected at random each time, and all columns are traversed) and perform correlation analysis with the 1st column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 1st column of x'; and so on, until all columns of X, e.g., 200 columns, have been traversed.

Iteration 2: select column x1 from the 200 columns of X (one column is selected at random each time, and all columns are traversed) and perform correlation analysis with the 2nd column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 2nd column of x'; and so on, until all columns of X, e.g., 200 columns, have been traversed.

Iteration 3: select column x1 from the 200 columns of X (one column is selected at random each time, and all columns are traversed) and perform correlation analysis with the 3rd column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 3rd column of x'; and so on, until all columns of X, e.g., 200 columns, have been traversed.

Iteration 4: select column x1 from the 200 columns of X (one column is selected at random each time, and all columns are traversed) and perform correlation analysis with the 4th column of x'; then select column x2 from the 200 columns of X and perform correlation analysis with the 4th column of x'; and so on, until all columns of X, e.g., 200 columns, have been traversed.

On this basis, 2 columns are selected from the 200 columns of X, for example columns x1 and x2 (2 columns are selected at random each time, and all columns are traversed), and correlation analysis is performed with the 1st column of x', until all columns of X, e.g., 200 columns, have been traversed.

Similarly, 2 columns are selected from the 200 columns of X, and correlation analysis is performed with the 2nd, 3rd and 4th columns of x', respectively, until all columns of X, e.g., 200 columns, have been traversed.

Next, 3 columns are selected from the 200 columns of X, for example columns x1, x2 and x3 (3 columns are selected at random each time, and all columns are traversed), and correlation analysis is performed with each column of x', respectively, until all columns of X, e.g., 200 columns, have been traversed.

By analogy, finally all 200 columns are selected from the 200 columns of X, and columns x1, x2, x3, ..., x200 are subjected to correlation analysis with each column of x', respectively.

Multiple rounds are iterated in this way. Each correlation analysis yields a correlation value according to the correlation algorithm. A threshold is preset (for example, 0.001) and each correlation value is compared with it; if the correlation value is greater than the threshold, the column or columns in X are considered to be correlated with the corresponding column of x'.
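The traversal above can be sketched as follows. The text does not say how several columns of X are combined before being correlated with one column of x', so the sketch assumes a row-wise sum, which matches the example of 12 monthly amounts versus an annual total. The function names, the dict-of-lists data layout and the `max_size` cap are assumptions; the cap exists because enumerating every subset of 200 columns is not tractable.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def correlated_subsets(X, x_prime, threshold=0.001, max_size=2):
    """Find column subsets of X correlated with columns of x'.

    X, x_prime -- dicts mapping a column name to a list of row values.
    Returns a list of (columns_of_X, column_of_x_prime, correlation).
    """
    found = []
    for size in range(1, max_size + 1):
        for cols in combinations(sorted(X), size):
            # Assumption: a multi-column subset is combined by row-wise sum.
            combined = [sum(row) for row in zip(*(X[c] for c in cols))]
            for target, values in x_prime.items():
                r = pearson(combined, values)
                if r > threshold:  # "greater than a preset threshold"
                    found.append((cols, target, r))
    return found
```

With monthly columns `jan` and `feb` and a target column equal to their sum, the pair `("feb", "jan")` is reported with correlation 1.0, while a constant column is never reported on its own.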

Step S1032: obtain an initial feature matrix X0 corresponding to the first label y'.

In a specific embodiment, obtaining an initial feature matrix X0 corresponding to the first label y' includes: determining a correlation between the first label y' and the first feature value x' based on a correlation analysis; and removing columns irrelevant to the first label y' from the first feature value x' based on the correlation between the first label y' and the first feature value x', so as to obtain the initial feature matrix X0 corresponding to the first label y'.

Specifically, the Pearson correlation analysis method is first used to determine the correlation between the first label y' and each column of the first feature value x'. The Pearson correlation analysis method is described in the previous embodiment and is not repeated here. Then, columns irrelevant to the first label y' are removed from the first feature value x', and the initial feature matrix X0 corresponding to the first label y' is obtained.

In one embodiment, after the columns irrelevant to the first label y' are removed, a matrix with 500 rows and 80 columns is obtained, which is the initial feature matrix X0 corresponding to the first label y'. In addition, since the first label y' contains the four elements rate, transaction failure rate, tool failure and manager service, assuming that each element corresponds to 100 rows, there are 5 matrices of 100 rows and 80 columns; that is, each element in the first label y' corresponds to one initial feature matrix.
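Column removal for one element of y' can be sketched as below. The source does not state the relevance criterion, so comparing the magnitude of the Pearson correlation with a threshold is an assumption, as are the function names, the dict-of-lists layout and the encoding of y' as a 0/1 vector.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def initial_feature_matrix(x_prime, y_prime, threshold=0.001):
    """Keep only the columns of x' that are correlated with the label y'.

    x_prime -- dict mapping a column name to a list of row values
    y_prime -- list of 0/1 labels for one reason, aligned with the rows
    """
    # Assumption: a column is relevant when |r| exceeds the threshold, so
    # negatively correlated features are kept as well.
    return {
        name: values
        for name, values in x_prime.items()
        if abs(pearson(values, y_prime)) > threshold
    }
```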

Step S1033: determine the training sample according to the initial feature matrix X0 and the correlation between the set X of user history information and the first feature value x'.

In a specific embodiment, determining the training sample according to the initial feature matrix X0 and the correlation between the set X of user history information and the first feature value x' includes: extracting data corresponding to the initial feature matrix X0 from the set X of user history information according to the correlation between the set X of user history information and the first feature value x', so as to obtain the training sample.

Specifically, because the initial feature matrix X0 consists of a number of columns of the first feature value x', the column or columns of the set X of user history information that are correlated with the initial feature matrix X0 can be extracted according to the correlation between the set X of user history information and the first feature value x' obtained in the preceding steps, so as to obtain the training sample.
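The extraction step amounts to a lookup through the correlation results. In this sketch the representation of those results as `(columns_of_X, column_of_x_prime)` pairs and both function names are assumptions made for illustration.

```python
def training_columns(correlations, x0_columns):
    """List the columns of X correlated with any column kept in X0.

    correlations -- iterable of (columns_of_X, column_of_x_prime) pairs
    x0_columns   -- names of the x' columns that make up X0
    """
    selected = []
    for cols, target in correlations:
        if target in x0_columns:
            for col in cols:
                if col not in selected:  # keep first-seen order, no repeats
                    selected.append(col)
    return selected


def extract_training_sample(X, correlations, x0_columns):
    """Project X (dict: column name -> row values) onto the selected columns."""
    return {col: X[col] for col in training_columns(correlations, x0_columns)}
```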

Step S1034: determine a second label y corresponding to the training sample based on the initial feature matrix X0.

In a specific embodiment, determining the second label y corresponding to the training sample based on the initial feature matrix X0 includes: performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix; determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y'; and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.

Specifically, moving average filtering is first performed on the initial feature matrix X0 to obtain a common feature matrix, where the common feature matrix consists of a number of columns of the first feature value x'. Then, according to the correlation between the first label y' and the first feature value x', the plurality of first labels y' corresponding to the common features can be determined; these first labels y' form the second label matrix Y. Finally, the first label y' that occurs most often in Y is used as the second label y corresponding to the training sample.
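Two building blocks of this step, moving average filtering and the majority vote over labels, can be sketched as follows. The window length, the choice to filter along one column at a time and the function names are assumptions; how the common feature matrix is mapped to candidate labels is not detailed in the text and is left out of the sketch.

```python
from collections import Counter


def moving_average(values, window=3):
    """Simple moving average over one column, without padding."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def majority_label(labels):
    """Return the label that occurs most often in the second label matrix Y."""
    return Counter(labels).most_common(1)[0][0]
```

For example, `majority_label(["rate", "tool_failure", "rate"])` returns `"rate"`, which would then serve as the second label y of the training sample.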

Based on the steps S101 to S103, historical user transaction flow data is firstly collected, research data is secondly obtained from the historical user transaction flow data, and finally a training set of a merchant behavior prediction model is determined based on the historical user transaction flow data, the first characteristic value and the first label. Therefore, a training set with high accuracy can be obtained, and the training precision of the merchant behavior prediction model is improved.

In an embodiment, as shown in fig. 2 specifically, in the method for obtaining the training set of the merchant behavior prediction model, the basic features (training samples) and the business key features (labels of the training samples) are finally obtained, so as to provide technical support for training the merchant behavior prediction model.

It should be noted that, although the foregoing embodiments describe each step in a specific sequence, those skilled in the art will understand that, in order to achieve the effect of the present invention, different steps do not necessarily need to be executed in such a sequence, and they may be executed simultaneously (in parallel) or in other sequences, and these changes are all within the protection scope of the present invention.

Furthermore, the invention also provides a device for acquiring the training set of the merchant behavior prediction model.

Referring to fig. 3, fig. 3 is a main structural block diagram of an apparatus for obtaining a training set of a merchant behavior prediction model according to an embodiment of the present invention.

As shown in fig. 3, the apparatus for obtaining a training set of a merchant behavior prediction model in the embodiment of the present invention mainly includes a collection module 11, an obtaining module 12, and a determination module 13. In some embodiments, one or more of the collection module 11, the obtaining module 12, and the determination module 13 may be combined into one module.

In some embodiments, the collection module 11 may be configured to collect historical user transaction flow data. The obtaining module 12 may be configured to obtain research data from the historical user transaction flow data, the research data including a first feature value x' and a first label y', wherein the first feature value x' includes first user basic information and first user historical transaction information, and the first label y' includes a rate, a transaction failure rate, an implement failure, and a housekeeping service. The determination module 13 may be configured to determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y'. In one embodiment, for the specific functions implemented, reference may be made to the description of steps S101 to S103.
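As a non-limiting illustration, the cooperation of the three modules might be sketched as follows. The class name, the callable signatures and the stub data are hypothetical; as noted elsewhere in this description, the modules may equally be split, combined, or realized in hardware.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TrainingSetApparatus:
    collect: Callable[[], list]                   # collection module 11
    obtain: Callable[[list], tuple]               # obtaining module 12
    determine: Callable[[list, list, list], Any]  # determination module 13

    def run(self):
        """Chain the three modules in the order of steps S101 to S103."""
        flow = self.collect()
        x_prime, y_prime = self.obtain(flow)
        return self.determine(flow, x_prime, y_prime)


# Stub modules standing in for real implementations.
apparatus = TrainingSetApparatus(
    collect=lambda: [{"merchant": "m1", "amount": 12.0}],
    obtain=lambda flow: (["x_prime"], ["y_prime"]),
    determine=lambda flow, x_prime, y_prime: {"samples": flow, "labels": y_prime},
)
result = apparatus.run()
```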

For convenience and brevity of description, the specific working processes and related descriptions of the apparatus for obtaining the training set of the merchant behavior prediction model may be found in the embodiment of the method for obtaining the training set of the merchant behavior prediction model, and are not repeated here.

It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above-described method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content of the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.

Furthermore, the invention also provides electronic equipment. In an embodiment of the electronic device according to the present invention, as shown in fig. 4, the electronic device includes a processor 41 and a storage device 42, the storage device may be configured to store a program for executing the method for obtaining the training set of the merchant behavior prediction model of the above method embodiment, and the processor may be configured to execute a program in the storage device, the program including but not limited to a program for executing the method for obtaining the training set of the merchant behavior prediction model of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed.

Further, the invention also provides a computer readable storage medium. In one computer-readable storage medium embodiment according to the present invention, a computer-readable storage medium may be configured to store a program for executing the method for obtaining a training set of merchant behavior prediction models of the above-described method embodiment, and the program may be loaded and executed by a processor to implement the above-described method for obtaining a training set of merchant behavior prediction models. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The computer-readable storage medium may be a storage device formed by including various electronic devices, and optionally, the computer-readable storage medium is a non-transitory computer-readable storage medium in an embodiment of the present invention.

Further, it should be understood that, since the modules are only configured to illustrate the functional units of the apparatus of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual blocks in the figures is merely illustrative.

Those skilled in the art will appreciate that the various modules in the apparatus may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A method of obtaining a training set of merchant behavior prediction models, the method comprising:

collecting historical user transaction flow data;

obtaining research data from the historical user transaction flow data, wherein the research data comprises a first feature value x' and a first label y', the first feature value x' comprises first user basic information and first user historical transaction information, and the first label y' comprises a rate, a transaction failure rate, an implement failure and a manager service;

determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y';

the historical user transaction flow data comprises a set X = {xi} of user historical information, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; determining a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y' includes:

determining a correlation between the set X of user historical information and the first feature value x' based on a correlation analysis;

acquiring an initial feature matrix X0 corresponding to the first label y';

determining the training sample according to the correlation between the set X of user historical information and the first feature value x' and the initial feature matrix X0;

determining a second label y corresponding to the training sample based on the initial feature matrix X0;

determining a correlation between the set X of user historical information and the first feature value x' based on a correlation analysis comprises:

selecting any one column from the set X of user historical information, and calculating a correlation value between that column and each column in the first feature value x', until all columns in the set X of user historical information have been traversed;

selecting any two columns from the set X of user historical information, and calculating a correlation value between those two columns and each column in the first feature value x', until all columns in the set X of user historical information have been traversed;

and so on, until all columns are selected from the set X of user historical information, and correlation values between all columns in the set X of user historical information and each column in the first feature value x' are calculated;

judging whether each correlation value is greater than a preset threshold, and if so, determining that a correlation exists between the corresponding column in the set X of user historical information and the corresponding column in the first feature value x';

acquiring an initial feature matrix X0 corresponding to the first label y', wherein the acquisition comprises the following steps:

determining a correlation between the first label y' and the first feature value x';

based on the correlation between the first label y' and the first feature value x', removing the columns irrelevant to the first label y' from the first feature value x' to obtain the initial feature matrix X0 corresponding to the first label y';

determining the training sample according to the correlation between the set X of user historical information and the first feature value x' and the initial feature matrix X0 includes: extracting, from the set X of user historical information, the data corresponding to the initial feature matrix X0 according to the correlation between the set X of user historical information and the first feature value x', to obtain the training sample;

determining a second label y corresponding to the training sample based on the initial feature matrix X0, including:

performing moving average filtering on the initial feature matrix X0 to obtain a common feature matrix;

determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y';

and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.

2. The method of obtaining a training set of merchant behavior prediction models according to claim 1, wherein obtaining research data from the historical user transaction flow data comprises:

performing stratified sampling on the historical user transaction flow data to obtain stratified data;

investigating the stratified data to obtain initial research data, wherein the initial research data comprises the first feature value x' and a third label;

screening out, from the third labels, the third labels of non-genuine reasons;

determining the first label y' based on an empirical model and the third labels of non-genuine reasons, to obtain the research data.
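As a non-limiting illustration of the stratified sampling step recited above, a sketch is given below. The stratification key (here a hypothetical `industry` field), the sampling fraction, the fixed seed and the function name are assumptions made only for this sketch and are not specified in the text.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction of records from every stratum."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        k = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical transaction-flow records grouped by merchant industry.
records = (
    [{"id": i, "industry": "retail"} for i in range(10)]
    + [{"id": 10 + i, "industry": "catering"} for i in range(4)]
)
stratified = stratified_sample(records, lambda r: r["industry"], fraction=0.5)
```

Sampling each stratum separately keeps small merchant groups represented in the data that is subsequently investigated.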

3. An apparatus for obtaining a training set of merchant behavior prediction models, the apparatus comprising:

a determination module configured to determine a training set of the merchant behavior prediction model based on the historical user transaction flow data, the first feature value x' and the first label y';

the historical user transaction flow data comprises a set X = {xi} of user historical information, where i is a positive integer; the training set of the merchant behavior prediction model comprises training samples and second labels y corresponding to the training samples; the determination module is further configured to:

acquiring an initial feature matrix X0 corresponding to the first label y', including:

determining a second label matrix Y according to the common feature matrix and the correlation between the first label y' and the first feature value x', wherein the second label matrix Y comprises a plurality of the first labels y';

and taking the first label y' that occurs most often in the second label matrix Y as the second label y corresponding to the training sample.

4. An electronic device comprising a processor and a storage means adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to perform the method of obtaining a training set of merchant behavior prediction models as claimed in any one of claims 1 to 2.

5. A computer readable storage medium having stored therein a plurality of program codes, characterized in that the program codes are adapted to be loaded and executed by a processor to perform the method of obtaining a training set of merchant behavior prediction models as claimed in any one of claims 1 to 2.