CN111583011A

CN111583011A - Data processing method, device, equipment and storage medium

Info

Publication number: CN111583011A
Application number: CN201910124234.XA
Authority: CN
Inventors: 董健; 常富洋; 颜水成
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2019-02-18
Filing date: 2019-02-18
Publication date: 2020-08-25

Abstract

The embodiments of this specification provide a data processing method, apparatus, device and storage medium. The method comprises: training an event overdue probability model using first user characteristic sample data; acquiring a plurality of second user characteristic sample data, and obtaining the event overdue probability corresponding to each second user characteristic sample data using the event overdue probability model; taking the second user characteristic sample data whose event overdue probability is greater than a set event overdue probability threshold as third user characteristic sample data, and grouping the third user characteristic sample data; obtaining an estimated reward value and an estimated uncertainty value for each group of third user characteristic sample data using a reinforcement learning model; selecting the event overdue probability of the third user characteristic sample data for which the sum of the estimated reward value and the estimated uncertainty value is largest; and adjusting the event overdue probability threshold using the selected event overdue probability. The embodiments of the invention can adjust the overdue probability threshold accurately.

Description

Data processing method, device, equipment and storage medium

Technical Field

The embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a data processing method, an apparatus, a device, and a storage medium.

Background

In recent years, internet finance has developed rapidly. One of the core businesses of internet finance is lending, and the most important step in lending is deciding whether to grant a loan to a user. At present, the mainstream loan overdue model is usually obtained by supervised learning on users who already have loan performance, using their user characteristic data as samples. A model trained in this way cannot accurately evaluate users who were refused a loan or who have no loan performance. As a result, the samples used for subsequent model optimization are all positive samples, the model over-converges, and its accuracy declines.

Disclosure of Invention

The embodiments of this specification provide a data processing method, apparatus, device and storage medium. By intelligently adjusting the event overdue probability threshold, downward exploration of users can be carried out effectively, the samples available for model optimization are enriched, and model accuracy is improved.

In a first aspect, an embodiment of the present specification provides a data processing method, including:

acquiring a plurality of first user characteristic sample data, wherein the first user characteristic sample data are labeled sample data, and training by using the first user characteristic sample data to obtain an event overdue probability model;

acquiring a plurality of second user characteristic sample data, wherein the second user characteristic sample data is label-free sample data, and acquiring event overdue probability corresponding to each second user characteristic sample data by using an event overdue probability model;

acquiring, as third user characteristic sample data, a plurality of second user characteristic sample data whose event overdue probability is greater than a set event overdue probability threshold;

respectively acquiring the estimated reward value and the estimated uncertainty value of each third user characteristic sample data by using a reinforcement learning model;

selecting the event overdue probability of the third user characteristic sample data with the maximum sum of the estimated reward value and the estimated uncertainty value;

adjusting the event overdue probability threshold with the selected event overdue probability.

With reference to the first aspect, in a first implementation manner of the first aspect of the embodiments of the present invention, the reinforcement learning model includes a linear model and a contextual bandit, and the obtaining, by using the reinforcement learning model, the estimated reward value and the estimated uncertainty value of each third user characteristic sample data respectively includes:

respectively acquiring the estimated reward value corresponding to each third user characteristic sample data by using the linear model;

and respectively acquiring the estimated uncertainty value corresponding to each third user characteristic sample data by using the contextual bandit.

With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect of the embodiment of the present invention, the obtaining, by using the linear model, the estimated reward value corresponding to each third user characteristic sample data includes:

acquiring event characteristic data of each third user characteristic sample data;

and respectively acquiring the pre-estimated rewards corresponding to the feature sample data of each third user by using the linear model by taking the respective event feature data and the respective event overdue probability of the feature sample data of each third user as input values.

With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect of the embodiment of the present invention, the event characteristic data includes at least one of:

the data of the area where the target object is located, the income data of the target object and the educational background data of the target object.

With reference to the first implementation manner of the first aspect, in a fourth implementation manner of the first aspect of the embodiment of the present invention, the respectively obtaining, by using the contextual bandit, the estimated uncertainty value corresponding to each third user characteristic sample data includes:

acquiring event state data corresponding to each third user characteristic sample data;

and respectively acquiring the estimated uncertainty value corresponding to each third user characteristic sample data by using the contextual bandit, with the respective event state data and the respective event overdue probability of each third user characteristic sample data as input values.

With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, or the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect of an embodiment of the present invention, the first user characteristic sample data and/or the second user characteristic sample data are obtained by using a user portrait model, and the user portrait model is obtained by training using at least one of the following feature samples: People's Bank of China (PBOC) data, consumption data, call record data, geographic position data and app usage data.

With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, or the fourth implementation manner of the first aspect, in a sixth implementation manner of the first aspect of the embodiment of the present invention, the method further includes:

receiving an event evaluation request message sent by a target user terminal, wherein the event evaluation request message carries identification information of a target user;

searching user characteristic data of the target user by using the identification information;

the user characteristic data is used as an input value of the event overdue probability model, and the event overdue probability of the target user is obtained by using the event overdue probability model;

comparing the event overdue probability with the adjusted event overdue probability threshold to obtain a comparison result;

and sending an event evaluation response message to the target user terminal, wherein the event evaluation response message carries information representing the comparison result.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:

the model training module is used for acquiring a plurality of first user characteristic sample data, wherein the first user characteristic sample data are labeled sample data, and training by utilizing the first user characteristic sample data to obtain an event overdue probability model;

the probability pre-estimation module is used for acquiring a plurality of second user characteristic sample data, wherein the second user characteristic sample data are label-free sample data, and acquiring the event overdue probability corresponding to each second user characteristic sample data by using the event overdue probability model;

the sample acquisition module is used for acquiring, as third user characteristic sample data, a plurality of second user characteristic sample data whose event overdue probability is greater than a set event overdue probability threshold;

the reinforcement learning module is used for respectively acquiring the estimated reward value and the estimated uncertainty value of each third user characteristic sample data by utilizing a reinforcement learning model;

the probability selection module is used for selecting the event overdue probability of the third user characteristic sample data with the maximum sum of the pre-estimated reward value and the pre-estimated uncertainty;

and the probability adjusting module is used for adjusting the event overdue probability threshold by using the selected event overdue probability.

With reference to the second aspect, in a first implementation manner of the second aspect of the embodiment of the present invention, the reinforcement learning model includes a linear model and a contextual bandit, and the reinforcement learning module includes:

the linear model module is used for respectively acquiring the estimated reward value corresponding to each third user characteristic sample data by using the linear model;

and the contextual bandit module is used for respectively acquiring the estimated uncertainty value corresponding to each third user characteristic sample data by using the contextual bandit.

With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect of the embodiment of the present invention, the linear model module is configured to:

With reference to the second implementation manner of the second aspect, in a third implementation manner of the second aspect of the embodiment of the present invention, the event characteristic data includes at least one of:

With reference to the first implementation manner of the second aspect, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the contextual bandit module is configured to:

With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, or the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect of the present invention, the first user characteristic sample data and/or the second user characteristic sample data are obtained by using a user portrait model, and the user portrait model is trained by using at least one of the following feature samples: People's Bank of China (PBOC) data, consumption data, call record data, geographic position data and app usage data.

With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, or the fourth implementation manner of the second aspect, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the apparatus further includes an event prediction module, configured to:

In a third aspect, an embodiment of the present invention further provides a computer device, including a processor and a memory:

the memory is used for storing a program for executing the method according to each implementation of the first aspect,

the processor is configured to execute programs stored in the memory.

In a fourth aspect, an embodiment of the present invention further provides a computer storage medium for storing computer software instructions for the computer device according to the third aspect.

The embodiment of the specification has the following beneficial effects:

In the embodiment of the invention, an event overdue probability model is trained, and the model is used to obtain the event overdue probabilities of a plurality of second user characteristic sample data. The user characteristic sample data whose probability exceeds the event overdue probability threshold are selected for reinforcement learning, and the event overdue probability with the maximum sum of the estimated reward value and the estimated uncertainty value obtained by reinforcement learning is used to adjust the event overdue probability threshold. The event overdue probability threshold is thereby adjusted intelligently, and this adjustment converges quickly. Samples for the overdue probability model can be supplemented effectively, the prediction accuracy of the overdue probability model is improved, and accurate assessment can be provided for users who have no loan performance and users who were refused a loan.

Drawings

FIG. 1 is a schematic view of a scenario in which the method of the first aspect of the embodiment of the present invention is applied;

FIG. 2 is a flow chart of a method according to an embodiment of the first aspect of the present invention;

FIG. 3 is a flow chart of a method according to another embodiment of the first aspect of the present invention;

FIG. 4 is a schematic structural diagram of an apparatus according to the second aspect of the embodiment of the present invention.

Detailed Description

To aid understanding, the technical solutions of the embodiments of this specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific features in them are detailed descriptions of the technical solutions of this specification rather than limitations on them, and that the embodiments and their technical features may be combined with each other where they do not conflict.

The embodiments of the present description may be implemented on the credit granting system shown in FIG. 1. In FIG. 1, a client application of the credit granting system is installed on a user terminal 101. After the user launches the client application, the user terminal 101 communicates with a server 102 and completes the corresponding task. For example, to implement the method provided by the embodiment of the present invention, the client application sends an event evaluation request message to the server 102 through the user terminal 101. After receiving the message, the server 102 looks up the user characteristic data of the target user according to the identification information of the target user carried in the message, uses the user characteristic data as the input of the event overdue probability model to obtain the event overdue probability of the target user, and compares the event overdue probability with the event overdue probability threshold to obtain a comparison result. The server 102 then sends an event evaluation response message to the user terminal 101, where the event evaluation response message carries information representing the comparison result.

The event overdue probability threshold is dynamically adjusted, and the adjustment method will be described in detail in the following embodiments.

In a first aspect, an embodiment of the present disclosure provides a data processing method, please refer to fig. 2, including:

step 201, obtaining a plurality of first user characteristic sample data, wherein the first user characteristic sample data are labeled sample data, and training by using the first user characteristic sample data to obtain an event overdue probability model.

In the embodiment of the present invention, the first user characteristic sample data may be positive sample data (taking whether to allow loan in the credit granting event as an example, and a positive sample refers to a sample that allows loan), or may be negative sample data (taking whether to allow loan in the credit granting event as an example, and a negative sample refers to a sample that refuses loan).

In this embodiment of the present invention, the first user characteristic sample data includes a plurality of user characteristics, which may be, but is not limited to, a vector composed of a plurality of user characteristics.
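As an illustration of step 201, the following is a minimal sketch that fits an event overdue probability model on labeled feature vectors. The patent does not name a model family; logistic regression trained by batch gradient descent is an assumption made here, as are the function names, the label encoding (1 = overdue, 0 = not overdue) and the toy data.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_overdue_model(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent.

    samples: list of feature vectors (first user characteristic sample data)
    labels:  1 = overdue, 0 = not overdue (hypothetical label encoding)
    Returns a (weights, bias) tuple.
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def overdue_probability(model, x):
    """Return the estimated event overdue probability for feature vector x."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy data: two hypothetical features scaled to [0, 1].
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y = [0, 0, 1, 1]
model = train_overdue_model(X, y)
```

Any classifier that outputs a probability could stand in for this model; the later steps only require a function from a feature vector to a value in [0, 1].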

Step 202, obtaining a plurality of second user characteristic sample data, wherein the second user characteristic sample data are non-label sample data, and obtaining an event overdue probability corresponding to each second user characteristic sample data by using an event overdue probability model.

In this embodiment of the present invention, the second user characteristic sample data includes a plurality of user characteristics, which may be, but is not limited to, a vector composed of a plurality of user characteristics.

In this embodiment of the present invention, the second user characteristic sample data may be characteristic data of a user who was refused a loan, or characteristic data of a user who has no loan performance. "No label" here does not mean that the samples must lack labels, only that no label is required for the subsequent processing.

Step 203, obtaining a plurality of second user characteristic sample data with the event overdue probability larger than the set overdue probability threshold as third user characteristic sample data.

In the embodiment of the invention, the overdue probability threshold is preset; it can be set manually, or obtained by simulation, fitting and the like. The overdue probability threshold is adjustable, and it is adjusted specifically by the method provided in the embodiment of the present invention.

An event overdue probability greater than the overdue probability threshold means that the corresponding user would ordinarily be refused a loan. However, the method provided by the embodiment of the invention uses a reinforcement learning algorithm to keep learning from these samples and to perform downward exploration of users, thereby adjusting the threshold.

In the embodiment of the present invention, all the second user characteristic sample data that meet the above condition may be selected as the third user characteristic sample data. However, to improve the accuracy and convergence rate of the algorithm, only part of the sample data may be selected. The selection mode is not limited in the embodiment of the present invention. For example, a predetermined number of second user characteristic sample data may be randomly selected as third user characteristic sample data, or second user characteristic sample data in which a certain feature (e.g., income) satisfies a single-feature condition may be selected as third user characteristic sample data.
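Steps 202 and 203 can be sketched as follows. The function name `select_third_samples`, the `max_count` parameter and the toy scoring function are hypothetical; the random subsampling corresponds to the "predetermined number, randomly selected" option mentioned above.

```python
import random


def select_third_samples(second_samples, predict, threshold, max_count=None, seed=0):
    """Score unlabeled samples and keep those above the overdue probability threshold.

    Returns a list of (sample, overdue_probability) pairs. If max_count is
    given and more samples qualify, a random subset of that size is returned.
    """
    scored = [(x, predict(x)) for x in second_samples]
    third = [(x, p) for x, p in scored if p > threshold]
    if max_count is not None and len(third) > max_count:
        third = random.Random(seed).sample(third, max_count)
    return third


def toy_predict(sample):
    # Stand-in for the trained event overdue probability model.
    return sample["score"]


second = [
    {"id": "u1", "score": 0.12},
    {"id": "u2", "score": 0.35},
    {"id": "u3", "score": 0.58},
    {"id": "u4", "score": 0.41},
]
third = select_third_samples(second, toy_predict, threshold=0.30)
subset = select_third_samples(second, toy_predict, threshold=0.30, max_count=2)
```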

Step 204, respectively acquiring the estimated reward value and the estimated uncertainty value of each third user characteristic sample data by using the reinforcement learning model.

"Reward" is a term of art in reinforcement learning. In the embodiment of the invention, the reward value may represent, but is not limited to, the profit obtained from lending.

In the embodiment of the invention, the profit is directly proportional to the credit line utilization rate and inversely proportional to the bad debt amount.

In the embodiment of the invention, the uncertainty value represents the likelihood of bad debt and/or the likelihood of drawdown (the user actually drawing on the credit line). A smaller uncertainty value indicates a lower likelihood of bad debt and/or a higher likelihood of drawdown; a larger uncertainty value indicates a higher likelihood of bad debt and/or a lower likelihood of drawdown.

Step 205, selecting the event overdue probability of the third user characteristic sample data with the maximum sum of the estimated reward value and the estimated uncertainty value.

The maximum sum of the estimated reward value and the estimated uncertainty value indicates that lending to users with that event overdue probability is allowed.
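The selection in step 205 amounts to an argmax over the sum of the two estimates. A minimal sketch, in which the `Candidate` record and the numeric values are purely illustrative:

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    overdue_probability: float
    estimated_reward: float
    estimated_uncertainty: float


def select_overdue_probability(candidates):
    """Return the overdue probability of the candidate with the largest
    estimated reward + estimated uncertainty."""
    best = max(candidates, key=lambda c: c.estimated_reward + c.estimated_uncertainty)
    return best.overdue_probability


candidates = [
    Candidate(0.32, 0.50, 0.10),  # sum 0.60
    Candidate(0.40, 0.45, 0.30),  # sum 0.75
    Candidate(0.55, 0.20, 0.35),  # sum 0.55
]
selected = select_overdue_probability(candidates)
```

Adding the uncertainty to the reward favors candidates the model knows little about, which is the exploration behavior the method relies on.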

Step 206, adjusting the event overdue probability threshold by using the selected event overdue probability.

In the embodiment of the present invention, step 206 can be implemented in various ways. For example, the event overdue probability threshold may be replaced by the selected event overdue probability; as another example, the mean of the selected event overdue probability and the threshold before adjustment may be used as the adjusted event overdue probability threshold.
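The two adjustment options named for step 206 can be sketched as below. The function name and the `mode` parameter are hypothetical.

```python
def adjust_threshold(current_threshold, selected_probability, mode="replace"):
    """Return the adjusted event overdue probability threshold.

    mode="replace": use the selected probability as the new threshold.
    mode="mean":    average the selected probability with the old threshold.
    """
    if mode == "replace":
        return selected_probability
    if mode == "mean":
        return (current_threshold + selected_probability) / 2.0
    raise ValueError(f"unknown mode: {mode!r}")


replaced = adjust_threshold(0.30, 0.40)
averaged = adjust_threshold(0.30, 0.40, mode="mean")
```

Averaging moves the threshold more conservatively than replacement, which may be preferable when a single round of estimates is noisy.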

In the method provided by the embodiment of the present invention, step 204 can be implemented in various ways, that is, with various reinforcement learning models. Preferably, the reinforcement learning model includes a linear model and a contextual bandit, and step 204 is accordingly implemented as follows: respectively acquiring the estimated reward value corresponding to each third user characteristic sample data by using the linear model; and respectively acquiring the estimated uncertainty value corresponding to each third user characteristic sample data by using the contextual bandit.

The embodiment of the invention does not limit the specific model structure or training method of the linear model or of the contextual bandit.

Any linear model trained by linear fitting in a reinforcement learning setting on sample data (including credit granting characteristic data and the like) from completed credit granting evaluations (that is, decisions on whether to grant a loan) can be used in the method provided by the embodiment of the invention. The more sample data there is, the more accurate the training result.

Any contextual bandit trained on sample data (including credit granting state data) from completed credit granting evaluations (that is, decisions on whether to grant a loan) can be used in the method provided by the embodiment of the present invention. The more sample data there is, the smaller the uncertainty.

In the embodiment of the present invention, the implementation manner of respectively obtaining the pre-estimated rewards corresponding to each third user feature sample data by using the linear model may be: acquiring event characteristic data of each third user characteristic sample data; and respectively acquiring the pre-estimated rewards corresponding to the feature sample data of each third user by using the linear model by taking the respective event feature data and the respective event overdue probability of the feature sample data of each third user as input values.

The event characteristic data of the third user characteristic data may be obtained as follows: search the local database for the event characteristic data; if it is found, use it; if it is not found, query a third-party database (such as the People's Bank of China database) and store the retrieved event characteristic data in the local database.
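The lookup described above is a read-through cache. A minimal sketch, assuming the local database behaves like a dictionary keyed by user identifier and that `fetch_from_third_party` is a hypothetical callable wrapping the third-party query; the field names in the toy record are also illustrative.

```python
def get_event_features(user_id, local_db, fetch_from_third_party):
    """Return event characteristic data, querying the third party only on a local miss."""
    features = local_db.get(user_id)
    if features is None:
        features = fetch_from_third_party(user_id)
        local_db[user_id] = features  # store for later lookups
    return features


calls = []


def fetch_from_third_party(user_id):
    # Stand-in for a third-party database query; records each call.
    calls.append(user_id)
    return {"city_code": 110000, "income": 8000.0, "education_code": 3}


local_db = {}
first = get_event_features("u1", local_db, fetch_from_third_party)
second_lookup = get_event_features("u1", local_db, fetch_from_third_party)
```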

In an embodiment of the present invention, the estimated uncertainty value corresponding to each third user characteristic sample data may be obtained by using the contextual bandit as follows: acquiring event state data corresponding to each third user characteristic sample data; and respectively acquiring the estimated uncertainty value corresponding to each third user characteristic sample data by using the contextual bandit, with the respective event state data and the respective event overdue probability of each third user characteristic sample data as input values.
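The patent leaves the structure of the linear model and the contextual bandit open. One common formulation that yields both a linear reward estimate and an uncertainty term is LinUCB; the sketch below uses it purely as an assumption, with a single shared model and a hypothetical context vector made of a constant term, the event overdue probability and a scaled income feature. The inverse matrix is maintained with the Sherman-Morrison rank-one update.

```python
import math


class LinUCB:
    """Ridge-regression reward estimate with an upper-confidence uncertainty term."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # Inverse of A = I + sum(x x^T), kept up to date incrementally.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # sum(reward * x)

    def _a_inv_times(self, v):
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.a_inv]

    def estimate(self, x):
        """Return (estimated reward, estimated uncertainty) for context x."""
        theta = self._a_inv_times(self.b)
        reward = sum(t * xi for t, xi in zip(theta, x))
        ax = self._a_inv_times(x)
        uncertainty = self.alpha * math.sqrt(sum(xi * v for xi, v in zip(x, ax)))
        return reward, uncertainty

    def update(self, x, reward):
        """Incorporate one observed (context, reward) pair."""
        ax = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, ax))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


bandit = LinUCB(dim=3)
context = [1.0, 0.4, 0.6]  # [constant, overdue probability, scaled income]
reward_before, uncertainty_before = bandit.estimate(context)
bandit.update(context, reward=1.0)
reward_after, uncertainty_after = bandit.estimate(context)
```

In this formulation the uncertainty for a context shrinks as similar contexts are observed, which matches the statement that more sample data gives less uncertainty.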

In an embodiment of the present invention, the event characteristic data includes at least one of the following: data on the area where the user is located, the user's income data, and the user's educational background data.

The data on the area where the user is located may be coded data obtained by encoding that area, and the area may be, but is not limited to, the city where the user is located;

the user's income data may be, but is not limited to, the user's total income over a predetermined time period;

the user's educational background data may be coded data obtained by encoding the user's educational background.

In the embodiment of the invention, the event state data is data capable of reflecting the event state of the user, and the selection of the data is not limited by the invention.

In any of the above method embodiments, the user characteristic sample data may be, but is not limited to being, obtained through a user portrait model. The user portrait model may be trained using, but is not limited to, at least one of the following user features: People's Bank of China (PBOC) data, consumption data, call record data, geographic position data and app usage data.

On the basis of any of the above method embodiments, a user's loan request may be evaluated by using the adjusted threshold. Specifically, as shown in FIG. 3, the method includes the following operations:

step 301, receiving an event evaluation request message sent by a target user terminal, wherein the event evaluation request message carries identification information of a target user;

step 302, searching user characteristic data of the target user by using the identification information;

step 303, taking the user characteristic data as an input value of the event overdue probability model, and acquiring the event overdue probability of the target user by using the event overdue probability model;

step 304, comparing the event overdue probability with the adjusted event overdue probability threshold to obtain a comparison result;

step 305, sending an event evaluation response message to the target user terminal, where the event evaluation response message carries information indicating the comparison result.
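Steps 301 to 305 can be sketched as a single server-side handler. The message layout (plain dictionaries), the key names and the in-memory feature store are assumptions for illustration; the patent does not define a wire format.

```python
def handle_event_evaluation(request, feature_store, predict, threshold):
    """Evaluate one request against the adjusted event overdue probability threshold."""
    user_id = request["user_id"]            # step 301: identification information
    features = feature_store[user_id]       # step 302: look up user characteristic data
    probability = predict(features)         # step 303: event overdue probability
    exceeds = probability > threshold       # step 304: comparison result
    return {"user_id": user_id, "exceeds_threshold": exceeds}  # step 305: response


feature_store = {"u42": {"score": 0.38}}
response = handle_event_evaluation(
    {"user_id": "u42"},
    feature_store,
    predict=lambda f: f["score"],  # stand-in for the event overdue probability model
    threshold=0.40,
)
```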

In the embodiment of the present invention, the identification information of the user is information that uniquely identifies the user, such as an identity card number, a passport number, or a combination of a name and a telephone number.

The user opens the credit granting system client application program on the user terminal, and if the performed operation needs credit granting evaluation, the client application program sends a credit granting evaluation request message (i.e. an event evaluation request message) to the server through the communication module of the user terminal.

Optionally, the credit granting evaluation request message may also carry data on the area where the user is located, the user's income data, the user's educational background data, and the like.

In the embodiment of the invention, the server can be an independent server or a cloud server. If the server is an independent server, the local database may be a database disposed on a disk storage space of the independent server, or a database disposed on a database server allocated to the independent server. If the cloud server is used, the local database can be a database arranged on any node on the cloud server.

For lending, the overdue label cannot be obtained for users who were never granted a loan, so supervised learning cannot be applied to them directly. To address this, the embodiment of the present invention provides a framework based on reinforcement learning. First, the user is modeled from various kinds of information, such as the user's People's Bank of China (PBOC) data, consumption, call records, geographic location and app usage, and the user's overdue probability is inferred on that basis. For rejected users, the predicted overdue probability is divided into intervals, and the overdue rate and the uncertainty of each interval are predicted separately. The reward is the profit within the interval and is a linear model of multiple features such as the user's city, income and educational background. The uncertainty is predicted by a contextual bandit algorithm. This yields a theoretically grounded method for guiding downward exploration of users: loans are granted in the interval where the sum of the reward and the uncertainty is largest. Rejected users can thus be explored systematically, so that the model is optimized and adapts more quickly to changes in the population while risk remains controllable.
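The interval-based exploration described above can be sketched as follows. The interval edges, user identifiers and per-interval estimates are made-up values; in practice the estimates would come from the linear model and the contextual bandit.

```python
from bisect import bisect_right


def assign_interval(p, edges):
    """Return the index i such that edges[i] <= p < edges[i + 1]."""
    index = bisect_right(edges, p) - 1
    # Clamp so that p equal to the last edge falls in the last interval.
    return max(0, min(index, len(edges) - 2))


def group_rejected_users(scored_users, edges):
    """Group (user_id, overdue_probability) pairs by probability interval."""
    groups = {i: [] for i in range(len(edges) - 1)}
    for user_id, p in scored_users:
        groups[assign_interval(p, edges)].append(user_id)
    return groups


def best_interval(interval_estimates):
    """interval_estimates maps interval index -> (reward, uncertainty)."""
    return max(interval_estimates, key=lambda i: sum(interval_estimates[i]))


edges = [0.30, 0.40, 0.50, 1.00]  # rejected users lie above the 0.30 threshold
scored = [("u1", 0.32), ("u2", 0.37), ("u3", 0.44), ("u4", 0.58)]
groups = group_rejected_users(scored, edges)

# Hypothetical (reward, uncertainty) estimates per interval.
estimates = {0: (0.50, 0.05), 1: (0.40, 0.25), 2: (0.10, 0.30)}
chosen = best_interval(estimates)
users_to_explore = groups[chosen]
```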

In a second aspect, an embodiment of the present invention discloses a data processing apparatus, please refer to fig. 4, including:

the model training module 401 is configured to acquire a plurality of first user feature sample data, where the first user feature sample data are labeled sample data, and train the sample data by using the first user feature sample data to obtain an event overdue probability model;

a probability pre-estimating module 402, configured to obtain multiple second user characteristic sample data, where the second user characteristic sample data is non-tag sample data, and obtain an event overdue probability corresponding to each second user characteristic sample data by using the event overdue probability model;

a sample obtaining module 403, configured to obtain, as third user characteristic sample data, a plurality of second user characteristic sample data whose event overdue probability is greater than a set event overdue probability threshold;

the reinforcement learning module 404 is configured to obtain an estimated reward value and an estimated uncertainty value of each third user characteristic sample data by using a reinforcement learning model;

a probability selection module 405, configured to select the event overdue probability of the third user characteristic sample data with the largest sum of the estimated reward value and the estimated uncertainty value;

a probability adjustment module 406 configured to adjust the event overdue probability threshold using the selected event overdue probability.

Optionally, the reinforcement learning model includes a linear model and a contextual bandit, and the reinforcement learning module includes:

Optionally, the linear model module is configured to:

Optionally, the contextual bandit module is configured to:

Optionally, the first user characteristic sample data and/or the second user characteristic sample data are obtained by using a user portrait model, and the user portrait model is obtained by training using at least one of the following feature samples: People's Bank of China (PBOC) data, consumption data, call record data, geographic position data and app usage data.

Optionally, the apparatus further includes an event prediction module, configured to:

the processor is configured to execute programs stored in the memory.

The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.

The embodiment of the invention discloses:

A1, a data processing method, comprising:

obtaining a plurality of first user characteristic sample data, wherein the first user characteristic sample data are labeled sample data, and training by using the first user characteristic sample data to obtain an event overdue probability model;

acquiring a plurality of second user characteristic sample data, wherein the second user characteristic sample data is label-free sample data, and acquiring an event overdue probability corresponding to each second user characteristic sample data by using the event overdue probability model;

A2, the method according to A1, wherein the reinforcement learning model includes a linear model and a contextual gambling machine, and the respectively obtaining of the predicted reward value and the predicted uncertainty value of each third user feature sample data by using the reinforcement learning model includes:

A3, the method according to A2, wherein the respectively obtaining of the estimated income corresponding to each third user feature sample data by using the linear model includes:

A4, the method according to A3, wherein the event profile data comprises at least one of:

A5, the method according to A2, wherein the obtaining of the estimated uncertainty value corresponding to each third user feature sample data by using the contextual gambling machine comprises:

A6, the method according to any one of A1 to A5, wherein the first user characteristic sample data and/or the second user characteristic sample data are obtained by using a user portrait model, and the user portrait model is trained using at least one of the following characteristic samples: People's Bank of China credit data, consumption data, call record data, geographic location data, and app usage data.

A7, the method of any one of A1 to A5, further comprising:

B8, a data processing apparatus comprising:

B9, the apparatus according to B8, wherein the reinforcement learning model comprises a linear model and a contextual gambling machine, and the reinforcement learning module comprises:

B10, the apparatus according to B9, wherein the linear model module is configured to:

B11, the apparatus according to B10, wherein the event characteristic data comprises at least one of:

B12, the apparatus according to B9, wherein the contextual gambling machine module is configured to:

B13, the apparatus according to any one of B8 to B12, wherein the first user characteristic sample data and/or the second user characteristic sample data are obtained by using a user portrait model, and the user portrait model is trained using at least one of the following characteristic samples: People's Bank of China credit data, consumption data, call record data, geographic location data, and app usage data.

B14, the apparatus according to any one of B8 to B12, further comprising an event pre-estimation module configured to:

C15, a computer device comprising a processor and a memory:

the memory is configured to store a program for executing the method of any one of A1 to A7,

the processor is configured to execute programs stored in the memory.

D16, a computer storage medium storing computer software instructions for use by the computer device of C15 described above.

Claims

1. A data processing method, comprising:

2. The method of claim 1, wherein the reinforcement learning model includes a linear model and a contextual gambling machine, and the respectively obtaining of the predicted reward value and the predicted uncertainty value of each third user feature sample data by using the reinforcement learning model comprises:

3. The method according to claim 2, wherein the respectively obtaining of the pre-estimated profit corresponding to each third user feature sample data by using the linear model comprises:

4. The method of claim 3, wherein the event profile data comprises at least one of:

5. The method of claim 2, wherein the respectively obtaining of the predicted uncertainty value corresponding to each third user feature sample data by using the contextual gambling machine comprises:

6. The method according to any one of claims 1 to 5, wherein the first user feature sample data and/or the second user feature sample data are obtained by using a user portrait model, and the user portrait model is trained using at least one of the following feature samples: People's Bank of China credit data, consumption data, call record data, geographic location data, and app usage data.

7. The method according to any one of claims 1 to 5, further comprising:

8. A data processing apparatus, comprising:

9. A computer device, comprising a processor and a memory:

the memory for storing a program for performing the method of any one of claims 1 to 7,

the processor is configured to execute programs stored in the memory.

10. A computer storage medium storing computer software instructions for use by the computer apparatus of claim 9.