CN112070593A - Data processing method, device, equipment and storage medium

Info

Publication number: CN112070593A
Application number: CN202011054020.9A
Authority: CN
Inventors: 梁亮; 李婷姝
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2020-12-11
Anticipated expiration: 2040-09-29
Also published as: CN112070593B

Abstract

The application discloses a data processing method, a data processing device, electronic equipment and a storage medium. A data matrix to be tested corresponding to a user to be tested is obtained, the data matrix to be tested containing data that may influence whether the user to be tested agrees to a marketing campaign. The data matrix to be tested is input into a first sparse matrix model and a second sparse matrix model to obtain a first prediction result and a second prediction result, respectively. A final result is obtained based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result. Because the final result combines the two prediction results and their weight values, it is more accurate than directly taking the prediction result output by a single model as the final result. Staff in the financial industry can decide whether to market to the user to be tested based on the final result, so that they no longer market blindly and the marketing success rate is improved.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of banking, and more particularly, to a data processing method, apparatus, device, and storage medium.

Background

When a financial institution (such as a bank) carries out a marketing campaign, it blindly calls each user or sends each user short messages or emails, so that the user learns of the marketing campaign and may agree to purchase the corresponding product or transact the corresponding business.

Disclosure of Invention

In view of this, the present application provides a data processing method, apparatus, device and storage medium, so as to overcome the problem of blind marketing in the prior art and improve the marketing success rate.

In order to achieve the above purpose, the present application provides the following technical solutions:

a method of data processing, comprising:

acquiring a data matrix to be tested corresponding to a user to be tested, wherein the data matrix to be tested comprises: at least one of the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capacity of the user to be tested for the credit cards, second data representing the account transaction flow of the user to be tested, the total loan number of the user to be tested, third data representing loan types of the user to be tested, the number of times of marketing for the user to be tested, and the number of times of successful marketing for the user to be tested;

inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing times aiming at the user contained in one first sample data matrix is zero, and the number of marketing times aiming at the user contained in one first sample data matrix is less than or equal to a first preset value;

inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of successful marketing times for the user contained in one second sample data matrix is greater than zero, and the number of marketing times for the user contained in one second sample data matrix is less than or equal to the first preset value;

and obtaining a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

A data processing apparatus comprising:

the first acquisition module is used for acquiring a data matrix to be tested corresponding to a user to be tested, wherein the data matrix to be tested comprises: at least one of the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capacity of the user to be tested for the credit cards, second data representing the account transaction flow of the user to be tested, the total loan number of the user to be tested, third data representing loan types of the user to be tested, the number of times of marketing for the user to be tested, and the number of times of successful marketing for the user to be tested;

the first input module is used for inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing times aiming at the user contained in one first sample data matrix is zero, and the number of marketing times aiming at the user contained in one first sample data matrix is less than or equal to a first preset value;

the second input module is used for inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of successful marketing times for the user contained in one second sample data matrix is greater than zero, and the number of marketing times for the user contained in one second sample data matrix is less than or equal to the first preset value;

a second obtaining module, configured to obtain a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result, and the second prediction result.

An electronic device, comprising:

a memory for storing a program;

a processor configured to execute the program, the program specifically configured to:

A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the data processing method according to any one of the preceding claims.

According to the above technical solutions, compared with the prior art, the data processing method provided by the application obtains a data matrix to be tested corresponding to a user to be tested, wherein the data matrix to be tested comprises data that may influence whether the user to be tested agrees to the marketing campaign. The data matrix to be tested is input into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model. The first sparse matrix model uses a first dictionary matrix composed of a plurality of first sample data matrices, obtains a first sparse representation coefficient corresponding to the data matrix to be tested through a dictionary learning algorithm, and calculates the first prediction result based on the first dictionary matrix and the first sparse representation coefficient. Because the number of successful marketing times for the user contained in each first sample data matrix is zero, each first sample data matrix is a failure sample, so the first sparse matrix model learns the attribute characteristics of failure samples to a greater extent. That is, the first prediction result is obtained based on the attribute characteristics of samples that tend to fail, and is therefore biased toward marketing failure. Consequently, if the first prediction result is large, the marketing success rate for the user to be tested in actual marketing is large; if the first prediction result is small, this may merely reflect the bias toward failure and does not necessarily indicate that the marketing success rate in actual marketing is small, so the judgment needs to be made in combination with the second prediction result.

Similarly, the data matrix to be tested is input into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model. The second sparse matrix model uses a second dictionary matrix composed of a plurality of second sample data matrices, obtains a second sparse representation coefficient corresponding to the data matrix to be tested through a dictionary learning algorithm, and calculates the second prediction result based on the second dictionary matrix and the second sparse representation coefficient. Because the number of successful marketing times for the user contained in each second sample data matrix is greater than zero, each second sample data matrix is a success sample, so the second sparse matrix model learns the attribute characteristics of success samples to a greater extent. That is, the second prediction result is obtained based on the attribute characteristics of samples that tend to succeed, and is therefore biased toward marketing success. Consequently, if the second prediction result is small, the marketing success rate for the user to be tested in actual marketing is small; if the second prediction result is large, this may merely reflect the bias toward success and does not necessarily indicate that the marketing success rate in actual marketing is large, so the judgment needs to be made in combination with the first prediction result.

Therefore, based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result, the obtained final result is more accurate. Staff in the financial industry can determine whether to carry out marketing on the user to be tested based on the final result, so that the staff in the financial industry does not carry out blind marketing any more, and the marketing success rate is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is an architecture diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of an implementation manner of a data processing method provided in an embodiment of the present application;

FIG. 3 is a block diagram of an implementation of a data processing apparatus according to an embodiment of the present application;

FIG. 4 is a block diagram of an implementation manner of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Before describing the data processing method provided by the embodiment of the present application in detail, a brief description is given here to an implementation environment related to the embodiment of the present application.

Fig. 1 is a block diagram of an implementation environment according to an embodiment of the present disclosure. As shown in fig. 1, the following data processing method may be applied to the implementation environment, which includes: one or more terminal devices 11 and an electronic device 12.

The terminal device 11 may be any electronic product that can perform human-computer interaction with a user through one or more modes such as a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or handwriting equipment, for example, a mobile phone, a tablet computer, a palm computer, a personal computer, a wearable device, a smart television, a robot, and the like.

Fig. 1 is merely an example, the number of terminal devices 11 in practical application may be set according to practical requirements, and fig. 1 shows one terminal device 11.

Alternatively, the electronic device 12 may be any electronic product that can interact with a user through one or more modes of a keyboard, a touch pad, a touch screen, a remote controller, a voice interaction device, or a handwriting device, for example, a mobile phone, a tablet computer, a palm computer, a personal computer, a wearable device, a smart television, a robot, and the like.

Optionally, the electronic device 12 may be a server, which may be one server, a server cluster composed of several servers, or a cloud computing service center.

Optionally, the terminal device 11 and the electronic device 12 are the same device; optionally, the terminal device 11 and the electronic device 12 are different devices.

The terminal device 11 is configured to acquire a data matrix to be measured corresponding to a user to be measured, and input the data matrix to be measured to the first sparse matrix model and the second sparse matrix model respectively to obtain a first prediction result and a second prediction result respectively; and obtaining a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

Illustratively, the terminal device 11 is further configured to send, to the electronic device 12, a request for establishing a communication connection with the user to be tested when the final result is greater than or equal to a second preset value.

Optionally, the terminal device 11 is provided with an application client, and the data matrix to be tested corresponding to the user to be tested may be obtained based on a user interface displayed by the application client.

Optionally, a browser client is installed on the terminal device 11, and the data matrix to be tested corresponding to the user to be tested may be obtained through a user interface of the web page version client displayed by the browser.

Illustratively, the client may be a bank client.

Exemplarily, the electronic device 12 is configured to train the first sparse matrix model and the second sparse matrix model based on the data processing method provided in the embodiment of the present application, and feed back the obtained prediction result to the terminal device 11.

Illustratively, the electronic device 12 is further configured to establish a communication connection with the user to be tested after receiving the request sent by the terminal device 11.

The following describes a data processing method, an apparatus, an electronic device, and a storage medium provided by the present application with reference to the above-described implementation environment.

As shown in fig. 2, a flowchart of an implementation manner of a data processing method provided in an embodiment of the present application is shown, where the method includes:

step S201: and acquiring a data matrix to be tested corresponding to the user to be tested.

The data matrix to be tested comprises at least one of: the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capacity of the user to be tested for the credit cards, second data representing the account transaction flow of the user to be tested, the total loan number of the user to be tested, third data representing loan types of the user to be tested, the number of times of marketing for the user to be tested, and the number of times of successful marketing for the user to be tested.

The above data will be explained below.

In an alternative implementation, a bank measures a user's ability to repay a credit card mainly by five aspects: total assets, total salary, total liabilities, occupational status, and credit record. The first data representing the repayment capacity of the user to be tested for the credit cards may therefore include at least one of these five aspects.

Here, occupational status refers to whether the user is employed or unemployed.

In an alternative implementation, the user's loan types are divided by the form of credit into three categories: credit loan (1), guarantee loan (2) and note discount (3). For example, the third data representing the loan type of the user to be tested is the number corresponding to the loan type; if the loan type is a credit loan, the third data is 1.

For example, the number of credit cards held by the user to be tested may be any integer such as 0, 1, 2, …, etc. The credit card held by the user to be tested can be the credit card of the same bank or the credit card of different banks.

For example, the deposit amount of the user to be tested may be the total deposit amount of the user to be tested in one account or a plurality of accounts.

The method for acquiring the data matrix to be detected corresponding to the user to be detected may be manually inputting the relevant data of the user to be detected, or may be acquiring the data matrix to be detected corresponding to the user to be detected from the bank counter system.
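As an illustration of how the data described above could be arranged into an M × 1 column vector, the following minimal Python sketch may help. The field names, the dictionary-based input, and the reduction of the first and second data to single numbers are all assumptions made for this example; the application does not prescribe them.

```python
# Numeric codes for the three loan types, following the 1/2/3 numbering above.
# The string keys are hypothetical names chosen for this sketch.
LOAN_TYPE_CODES = {"credit": 1, "guarantee": 2, "note_discount": 3}

# Hypothetical field order; M is the number of data types (here 8).
FIELD_ORDER = [
    "deposit_amount",
    "credit_card_count",
    "repayment_capacity",   # assumed single-number summary of the first data
    "transaction_flow",     # assumed single-number summary of the second data
    "total_loans",
    "loan_type",
    "marketing_times",
    "marketing_successes",
]


def to_data_matrix(user):
    """Turn a dict of user data into an M x 1 column vector (list of rows)."""
    column = []
    for field in FIELD_ORDER:
        value = user[field]
        if field == "loan_type":
            value = LOAN_TYPE_CODES[value]  # replace the loan type by its number
        column.append([float(value)])
    return column
```

A user who has never been marketed to would simply carry zeros in the last two rows.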

In an optional implementation manner, the user to be tested is a user who is not marketed, and the number of times of marketing for the user to be tested and the number of times of successful marketing for the user to be tested in the corresponding data matrix to be tested are both 0. The data processing method provided by the application needs to predict the marketing success probability of the user to be tested through the constructed sparse matrix model.

In an optional implementation manner, the user to be tested is a user who has already been marketed, but the number of marketed times is less than or equal to the first preset value (the number of successful marketing times for the user to be tested may be 0 or may not be 0). The data processing method provided by the application needs to predict the future marketing success probability of the user to be tested through the constructed sparse matrix model.

It can be understood that, on the premise that the number of times of marketing is less than or equal to the first preset value, even if the number of times of marketing success for the user to be tested is 0, the user to be tested may be marketed successfully in the future. If the marketing times are greater than the first preset value and the marketing success times of the user to be tested are 0, the probability of the future marketing success of the user to be tested is almost 0, and the marketing can be given up for the user.

For example, the first preset value may be determined based on actual conditions, and is not limited herein.

Step S202: and inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model.

The first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of times of marketing success for the user included in one first sample data matrix is zero, and the number of times of marketing for the user included in one first sample data matrix is smaller than or equal to a first preset value.

The Dictionary Learning (KSVD) algorithm is a method for data dimension reduction.

Assuming that one first sample data matrix is a column vector of M × 1 and M is any positive integer greater than or equal to 1, assuming that there are N first sample data matrices in total, then the first dictionary matrix is an M × N matrix.
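The relationship between the N sample column vectors and the M × N dictionary matrix can be sketched as follows. This is only an illustrative helper (the function name and the list-of-lists layout are assumptions of this example), showing that each sample data matrix becomes one column of the dictionary.

```python
def build_dictionary(sample_columns):
    """Stack N column vectors (each M x 1) side by side into an M x N matrix."""
    if not sample_columns:
        raise ValueError("at least one sample data matrix is required")
    m = len(sample_columns[0])
    for column in sample_columns:
        if len(column) != m:
            raise ValueError("all sample data matrices must have M rows")
    # Row i of the dictionary collects entry i of every sample column.
    return [[column[i][0] for column in sample_columns] for i in range(m)]
```

For three samples with two data types each, the result has 2 rows and 3 columns.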

The first sample data matrix of a user contains the same kinds of data as the data matrix to be tested: it comprises at least one of the deposit amount of the user, the number of credit cards held by the user, first data representing repayment capacity of the user for the credit cards, second data representing the account transaction flow of the user, the total loan amount of the user, third data representing loan types of the user, the number of times of marketing for the user, and the number of times of successful marketing for the user.

In an optional embodiment, the acquired sample data matrices corresponding to multiple users may be preprocessed as follows: sample data matrices in which the number of successful marketing times for the user is zero and the number of marketing times for the user is greater than the first preset value are removed; sample data matrices in which the number of successful marketing times is zero and the number of marketing times is less than or equal to the first preset value are taken as first sample data matrices; and sample data matrices in which the number of successful marketing times is greater than zero and the number of marketing times is less than or equal to the first preset value are taken as second sample data matrices.
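The preprocessing rule above can be sketched in Python as below. The dictionary keys and the function name are hypothetical, and samples that match none of the three stated cases are simply returned in a separate list rather than assigned, since this paragraph does not say how to treat them.

```python
def split_samples(samples, first_preset_value):
    """Split samples into first (failure) and second (success) sample sets."""
    first, second, removed, other = [], [], [], []
    for sample in samples:
        times = sample["marketing_times"]
        successes = sample["marketing_successes"]
        if successes == 0 and times > first_preset_value:
            removed.append(sample)   # marketed too often without any success
        elif successes == 0:
            first.append(sample)     # failure sample within the preset limit
        elif times <= first_preset_value:
            second.append(sample)    # success sample within the preset limit
        else:
            other.append(sample)     # case not covered by the rule above
    return first, second, removed, other
```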

For example, the first preset value may be determined according to an empirical value of marketing success.

It can be understood that, since the first prediction result is calculated by the first sparse matrix model based on the plurality of first sample data matrices and the first sparse representation coefficient, the first prediction result may represent a first predicted marketing success probability of the user to be tested.

Step S203: and inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model.

The second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of successful marketing times for the user contained in one second sample data matrix is greater than zero, and the number of marketing times for the user contained in one second sample data matrix is less than or equal to the first preset value.

In an optional embodiment, the acquired sample data matrices corresponding to the multiple users may be preprocessed, and optionally, the sample data matrices, in which the number of times of marketing success for the users is greater than zero and the number of times of marketing for the users is greater than the first preset value, may also be used as second sample data matrices; for example, the first preset value may be determined according to an empirical value of marketing success.

It can also be understood that the second prediction result, obtained after the data matrix to be tested is input into the second sparse matrix model, represents a second predicted marketing success probability of the user to be tested.

Step S204: and obtaining a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

For example, a first weight value corresponding to the first sparse matrix model and a second weight value corresponding to the second sparse matrix model may be preset based on human experience.

The first weight value represents the degree to which the first sample data matrices influence the user to be tested disagreeing with the marketing, and the second weight value represents the degree to which the second sample data matrices influence the user to be tested disagreeing with the marketing.

The manner of obtaining the final result based on the first weight value, the second weight value, the first predicted result and the second predicted result includes, but is not limited to, weighted average, summation, averaging or a combination of the above calculation methods, for example, the final result may be calculated by the following formula:

final result = first weight value × first prediction result + second weight value × second prediction result.

Illustratively, the sum of the first weight value and the second weight value is 1.
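A weighted sum is one of the combination options mentioned for step S204. The sketch below shows it in Python under the illustrative condition that the two weights sum to 1; the function and parameter names are assumptions of this example, and the weight values themselves would be preset from experience.

```python
def combine_predictions(first_prediction, second_prediction,
                        first_weight, second_weight):
    """Weighted sum of the two model outputs (weights assumed to sum to 1)."""
    if abs(first_weight + second_weight - 1.0) > 1e-9:
        raise ValueError("the two weight values are expected to sum to 1")
    return first_weight * first_prediction + second_weight * second_prediction
```

For instance, with weights 0.4 and 0.6 and prediction results 0.2 and 0.8, the final result is 0.4 × 0.2 + 0.6 × 0.8 = 0.56.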

The embodiment of the application provides a data processing method. First, a data matrix to be tested corresponding to a user to be tested is obtained, wherein the data matrix to be tested comprises data that may influence whether the user to be tested agrees to a marketing campaign. The data matrix to be tested is input into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model. The first sparse matrix model uses a first dictionary matrix composed of a plurality of first sample data matrices, obtains a first sparse representation coefficient corresponding to the data matrix to be tested through a dictionary learning algorithm, and calculates the first prediction result based on the first dictionary matrix and the first sparse representation coefficient. Because the number of successful marketing times for the user contained in each first sample data matrix is zero, each first sample data matrix is a failure sample, so the first sparse matrix model learns the attribute characteristics of failure samples to a greater extent. That is, the first prediction result is obtained based on the attribute characteristics of samples that tend to fail, and is therefore biased toward marketing failure. Consequently, if the first prediction result is large, the marketing success rate for the user to be tested in actual marketing is large; if the first prediction result is small, this may merely reflect the bias toward failure and does not necessarily indicate that the marketing success rate in actual marketing is small, so the judgment needs to be made in combination with the second prediction result.

In an optional embodiment, if the final result is greater than or equal to a second preset value, a request is made to establish a communication connection with the user to be tested.

It can be understood that when the final result is greater than or equal to the second preset value, the marketing success rate for the user to be tested is relatively high and a marketing campaign directed at the user is more likely to succeed. A communication connection with the user to be tested is therefore requested, for example through one or more communication modes such as making a call, sending a short message, sending an email, or chatting over the internet, so that the user learns of the marketing campaign, which increases the probability that the user agrees to purchase the corresponding product or transact the corresponding business. Thus, in the embodiment of the application, marketing measures are taken only when the final result is determined to be greater than or equal to the second preset value, rather than blindly establishing a communication connection with the user to be tested, which would provoke the user's resentment and reduce the marketing success rate.
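The threshold rule can be sketched as a simple filter over final results. The user identifiers, the dictionary input, and the function name are hypothetical; how the communication connection is actually requested is outside this sketch.

```python
def users_to_contact(final_results, second_preset_value):
    """Return IDs of users whose final result reaches the second preset value."""
    return [
        user_id
        for user_id, result in final_results.items()
        if result >= second_preset_value
    ]
```

Users below the threshold are left out, so no connection request is issued for them.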

In an optional implementation manner, the data processing method provided in the embodiment of the present application may be applied to a robot, and if the final result is greater than or equal to a second preset value, the robot may automatically establish a communication connection with a user to be tested.

In order to better understand the relevant contents of the embodiments of the present application, the following describes the principle of sparse representation and the process of constructing a sparse matrix model.

Sparse representation means representing the main information of an original signal with as few atoms as possible from a given overcomplete dictionary, i.e. representing most or all of the original signal by a linear combination of a small number of basic signals, so that the solving process in signal processing becomes simpler.

Here the basic signals are called atoms and are selected from an overcomplete dictionary. A dictionary is an ordered collection of atoms and can be regarded as an N × T matrix; if T > N (number of columns > number of rows), it is called an overcomplete dictionary or a redundant dictionary.

For example, find a coefficient matrix $X_{K \times N}$ and a dictionary matrix $B_{M \times K}$ such that $B \times X$ restores Y (the original signal) as closely as possible and X is as sparse as possible; X is then a sparse representation of Y.

Finding the sparsest signal representation is equivalent to solving the following problem: $\min \|X\|_0 \ \text{s.t.}\ Y = BX$, where $\|X\|_0$ is the number of non-zero entries in the coefficient matrix X. Since finding a sparse expansion of a signal over an arbitrary redundant dictionary is an NP-hard problem, and Terry has demonstrated that under certain conditions the zero-norm problem is equivalent to the one-norm problem, the above problem is converted into: $\min \|X\|_1 \ \text{s.t.}\ Y = BX$.

Thus, the coefficient matrix $X$ obtained from $\min \|X\|_1 \ \text{s.t.}\ Y = BX$ contains the sparse representation coefficients of the linear combination of atoms in the dictionary matrix $B$ that approximates the matrix $Y$.
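One reason the one-norm form is convenient is that, for a column vector X, it can be rewritten as a linear program and handed to a standard solver. As a sketch (the split variables U and V are notation introduced here for illustration and do not appear in the original text), write X = U − V with U and V non-negative:

```latex
\begin{aligned}
\min_{U,\,V}\quad & \mathbf{1}^{T}(U + V) \\
\text{s.t.}\quad & B\,(U - V) = Y, \\
                 & U \ge 0,\quad V \ge 0 .
\end{aligned}
```

At an optimum, at most one of $U_i$ and $V_i$ is non-zero for each i, so the objective equals $\|X\|_1$ and $X = U - V$ solves the one-norm problem.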

In the following, a specific implementation process of inputting the data matrix to be tested to the first sparse matrix model in step S202, to obtain the first prediction result output by the first sparse matrix model, is described in combination with the above principle. The implementation process includes steps A1 to A2.

Step A1: obtaining a first sparse representation coefficient corresponding to the data matrix to be tested through the first sparse matrix model.

For the embodiment of the present application, assume that the data matrix to be tested corresponding to the user to be tested is Y, an M × 1 column vector, where M is the number of data types contained in the data matrix to be tested, and the value in each row of Y is the specific value of the corresponding data type in the data to be tested.

Similarly, each sample user corresponds to one first sample data matrix. If there are N sample users whose number of successful marketing times is zero and whose number of marketing times is less than or equal to a first preset value, each first sample data matrix is an M × 1 column vector, and the N first sample data matrices corresponding to the N sample users form a first dictionary matrix, denoted as a matrix B of size M × N (where N > M). As will be readily appreciated, M represents the number of data types and N represents the number of first sample data matrices in the dictionary matrix.

The first sparse matrix model is constructed as follows: the N first sample data matrices form the first dictionary matrix, the data matrix to be tested corresponding to the user to be tested is taken as the original signal requiring sparse representation, and the first sparse representation coefficient is represented by a coefficient matrix $X$ (an N × 1 column vector) subject to $\min \|X\|_1 \ \text{s.t.}\ Y = BX$. Here the coefficient matrix is $X = [X_1, X_2, X_3, \ldots, X_N]^T$, where $X_i$ is the sparse representation coefficient corresponding to the i-th first sample data matrix and i ranges from 1 to N.

Assume that N is 9, so that first sample data matrices 1 to 9 form the first dictionary matrix. After the data matrix to be tested is input into the first sparse matrix model, a coefficient matrix $X$ that represents the data matrix to be tested over this first dictionary matrix is obtained, for example $X = [0,0,5,0,2,0,0,0,3]^T$. This means that the sparse representation coefficient corresponding to first sample data matrix 3 is 5, that corresponding to first sample data matrix 5 is 2, that corresponding to first sample data matrix 9 is 3, and the sparse representation coefficients corresponding to the other first sample data matrices are all 0. First sample data matrices 3, 5 and 9 can therefore be used to approximate the data matrix Y to be tested of the user to be tested, that is, $Y = 5 \times B_3 + 2 \times B_5 + 3 \times B_9$.
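To make the role of the coefficient vector concrete, the following minimal sketch reads off the active atoms of a sparse coefficient vector and rebuilds Y as the corresponding linear combination of dictionary columns. The two-row dictionary values are invented purely for illustration, and the helper names are assumptions introduced here.

```python
def active_atoms(x):
    """Map the 1-based dictionary index to each non-zero coefficient."""
    return {i + 1: c for i, c in enumerate(x) if c != 0}


def reconstruct(columns, x):
    """Compute Y = sum_i x[i] * columns[i] for equal-length column vectors."""
    rows = len(columns[0])
    return [sum(c * col[r] for c, col in zip(x, columns)) for r in range(rows)]


# Hypothetical dictionary with M = 2 rows and N = 9 columns (values invented).
columns = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [3, 1], [1, 3], [2, 3], [3, 2]]
x = [0, 0, 5, 0, 2, 0, 0, 0, 3]

atoms = active_atoms(x)      # only atoms 3, 5 and 9 are active
y = reconstruct(columns, x)  # 5*B3 + 2*B5 + 3*B9
```

Only the three active columns contribute to `y`; the other six are multiplied by zero.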

Step A2: controlling the first sparse matrix model to obtain the first prediction result based on the first sparse representation coefficient.

Implementations of obtaining the first prediction result include the following steps C1 through C4.

Step C1: based on the first sparse representation coefficient of the data matrix to be tested of the user to be tested, obtaining the number of successful marketing times for the user and the number of marketing times for the user that are contained in the first sample data matrix at each corresponding position in the first sparse representation coefficient.

It should be noted that the number of marketing successes for the user in each first sample data matrix is zero.

Step C2: multiplying the number of marketing times for the user contained in each first sample data matrix by the value at the corresponding position in the first sparse representation coefficient, and summing the products to obtain the number of marketing times for the user to be tested.

Step C3: taking the sum of all elements in the first sparse representation coefficient as the number of successful marketing times for the user to be tested.

The number of successful marketing times contained in every first sample data matrix is zero. If, by analogy with step C2, the number of successful marketing times contained in each first sample data matrix were multiplied by the value at the corresponding position in the first sparse representation coefficient and the products summed, the resulting number of successful marketing times for the user to be tested would always be 0. The computed marketing success probability would then be 0 regardless of the number of marketing times, and no prediction could be made. The sum of the elements of the first sparse representation coefficient is therefore taken as the number of successful marketing times for the user to be tested.

Step C4: taking the ratio of the number of successful marketing times for the user to be tested to the number of marketing times for the user to be tested as the first prediction result.

For example, the first sparse representation coefficient obtained from the data matrix to be tested of the user to be tested is $X = [0,0,6,0,4,0,0,0,8]^T$, that is, the data matrix to be tested is expressed as $Y = 6 \times B_3 + 4 \times B_5 + 8 \times B_9$. Assume that the first preset value is 5 and that the first sample data matrices corresponding to $B_3$, $B_5$ and $B_9$ are $B_3 = [5000,5,1,4000,8000,2,3,0]$, $B_5 = [2000,2,0,5000,2000,1,2,0]$ and $B_9 = [1000,4,1,700,5000,3,5,0]$. The number of marketing times for the user to be tested is then 6 × 3 + 4 × 2 + 8 × 5 = 66, and the number of successful marketing times for the user to be tested is 6 + 4 + 8 = 18, so the first prediction result is 18/66 ≈ 0.273.
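Steps C1 to C4 can be sketched in a few lines using the numbers of this example. The function name, the dict-based layout keyed by atom index, and the zero-division guard are assumptions made for illustration.

```python
# 0-based position of the seventh value (number of marketing times) in a sample vector.
MARKETING_TIMES = 6


def first_prediction(coefficients, samples):
    """coefficients and samples are dicts keyed by the 1-based atom index."""
    # Step C2: coefficient-weighted sum of the marketing times of the active atoms.
    times = sum(c * samples[i][MARKETING_TIMES] for i, c in coefficients.items())
    # Step C3: the sum of the coefficients stands in for the success count.
    successes = sum(coefficients.values())
    if times == 0:
        raise ValueError("number of marketing times is zero")
    # Step C4: ratio of successes to marketing times.
    return successes / times


coefficients = {3: 6, 5: 4, 9: 8}
samples = {
    3: [5000, 5, 1, 4000, 8000, 2, 3, 0],
    5: [2000, 2, 0, 5000, 2000, 1, 2, 0],
    9: [1000, 4, 1, 700, 5000, 3, 5, 0],
}
result = first_prediction(coefficients, samples)  # 18 / 66
```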

In the first sample data matrix, the first value is the deposit amount, the second value is the number of credit cards held, the third value is the first data, the fourth value is the total number of loans, the fifth value is the second data, the sixth value is the third data, the seventh value is the number of marketing times for the sample user, and the eighth value is the number of successful marketing times for the sample user.

Correspondingly, the process of constructing the second sparse matrix model by the second sample data matrix is similar to the process of constructing the first sparse matrix model, except that the number of successful marketing times of the sample users corresponding to the second sample data matrix is greater than zero, and the number of marketing times is less than or equal to the first preset value. Therefore, the step S203 of inputting the data matrix to be measured to the second sparse matrix model to obtain the second prediction result output by the second sparse matrix model includes steps B1 to B2.

Step B1: obtaining a second sparse representation coefficient corresponding to the data matrix to be tested through the second sparse matrix model.

Obtaining the second sparse representation coefficient corresponding to the data matrix to be measured by using the plurality of second sample data matrices is similar to step a1, and is not described herein again.

Step B2: controlling the second sparse matrix model to obtain the second prediction result based on the second sparse representation coefficient.

Implementations of obtaining the second prediction result include the following steps D1 through D3.

Step D1: based on the second sparse representation coefficient of the data matrix to be tested of the user to be tested, obtaining the number of successful marketing times for the user and the number of marketing times for the user that are contained in the second sample data matrix at each corresponding position in the second sparse representation coefficient.

Step D2: multiplying the number of successful marketing times for the user contained in each second sample data matrix by the value at the corresponding position in the second sparse representation coefficient, and summing the products to obtain the number of successful marketing times for the user to be tested; and multiplying the number of marketing times for the user contained in each second sample data matrix by the value at the corresponding position in the second sparse representation coefficient, and summing the products to obtain the number of marketing times for the user to be tested.

Step D3: taking the ratio of the number of successful marketing times for the user to be tested to the number of marketing times for the user to be tested as the second prediction result.

For example, assume that the second sparse representation coefficient obtained from the data matrix to be tested of the user to be tested is $X = [0,0,6,0,4,0,0,0,8]^T$, that is, the data matrix to be tested is expressed as $Y = 6 \times B_3 + 4 \times B_5 + 8 \times B_9$. Assume that the first preset value is 5 and that the second sample data matrices corresponding to $B_3$, $B_5$ and $B_9$ are $B_3 = [5000,5,1,4000,8000,2,3,2]$, $B_5 = [2000,2,0,5000,2000,1,2,1]$ and $B_9 = [1000,4,1,700,5000,3,5,3]$. The number of marketing times for the user to be tested is then 6 × 3 + 4 × 2 + 8 × 5 = 66, and the number of successful marketing times for the user to be tested is 6 × 2 + 4 × 1 + 8 × 3 = 40, so the second prediction result is 40/66 = 0.606.

In the second sample data matrix, the first value is the deposit amount, the second value is the number of credit cards held, the third value is the first data, the fourth value is the total number of loans, the fifth value is the second data, the sixth value is the third data, the seventh value is the number of marketing times for the sample user, and the eighth value is the number of successful marketing times for the sample user.
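Steps D1 to D3 differ from the first model only in how the success count is formed. A minimal sketch with the numbers of this example follows; the function name, the dict layout, and the returned tuple are illustrative assumptions.

```python
# 0-based positions of the seventh and eighth values in a sample vector.
MARKETING_TIMES = 6
MARKETING_SUCCESSES = 7


def second_prediction(coefficients, samples):
    """Return (successes, times, ratio) for coefficients keyed by atom index."""
    # Step D2: both counts are coefficient-weighted sums over the active atoms.
    times = sum(c * samples[i][MARKETING_TIMES] for i, c in coefficients.items())
    successes = sum(c * samples[i][MARKETING_SUCCESSES] for i, c in coefficients.items())
    if times == 0:
        raise ValueError("number of marketing times is zero")
    # Step D3: ratio of successes to marketing times.
    return successes, times, successes / times


coefficients = {3: 6, 5: 4, 9: 8}
samples = {
    3: [5000, 5, 1, 4000, 8000, 2, 3, 2],
    5: [2000, 2, 0, 5000, 2000, 1, 2, 1],
    9: [1000, 4, 1, 700, 5000, 3, 5, 3],
}
successes, times, ratio = second_prediction(coefficients, samples)
```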

Alternatively, the coefficients 6, 4 and 8 are first normalized to obtain standard coefficients, for example 6/(6+4+8) = 0.33, 4/(6+4+8) = 0.22 and 8/(6+4+8) = 0.44. The number of marketing times for the user to be tested is then calculated as 0.33 × 3 + 0.22 × 2 + 0.44 × 5 = 3.63, and the number of successful marketing times for the user to be tested as 0.33 × 2 + 0.22 × 1 + 0.44 × 3 = 2.2, so the second prediction result is again 2.2/3.63 = 0.606.
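Normalizing the coefficients rescales the numerator and the denominator by the same factor, so the ratio is unchanged. The sketch below checks this with exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

coefficients = [6, 4, 8]  # non-zero sparse representation coefficients
times = [3, 2, 5]         # marketing times of the matching sample vectors
successes = [2, 1, 3]     # marketing successes of the matching sample vectors

total = sum(coefficients)
weights = [Fraction(c, total) for c in coefficients]  # 1/3, 2/9, 4/9

norm_times = sum(w * t for w, t in zip(weights, times))
norm_successes = sum(w * s for w, s in zip(weights, successes))
ratio = norm_successes / norm_times
```

With exact arithmetic the two normalized counts are 11/3 and 20/9; the figures 3.63 and 2.2 in the text result from rounding the weights to two decimals first. The ratio is 20/33 ≈ 0.606 either way.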

In an alternative embodiment, the method for obtaining the first weight value includes steps E1 to E2.

Step E1: taking the plurality of first sample data matrices as independent variables, and taking the actual marketing success probabilities respectively corresponding to the plurality of first sample data matrices as dependent variables, to obtain a first regression coefficient set representing the degree of influence of the independent variables on the dependent variables, where the first regression coefficient set includes at least one first regression coefficient.

Step E2: obtaining the first weight value based on the first regression coefficient set.

For the first sample data matrix, since the number of times of marketing success for the user included in one first sample data matrix is zero, the actual marketing success probability corresponding to one first sample data matrix is 0.

The data type of each datum in each first sample data matrix is taken as an independent variable, and the corresponding actual marketing success probability is taken as the dependent variable. Assuming that the relationship between the dependent variable and the independent variables is multiple linear, the dependent variable can be expressed linearly in the independent variables, namely $Y_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_N x_N$, where $Y_\theta(x)$ denotes the dependent variable, $x_i$ denotes the i-th independent variable, and $\theta_i$ denotes the first regression coefficient corresponding to the i-th independent variable, whose value reflects the degree of influence of the i-th independent variable on the dependent variable.

In an alternative implementation, a regression model is used to obtain the first set of regression coefficients, and a specific example is described below.

Assume that the first sample data matrix of a sample user includes any two of the following data: the deposit amount of the sample user, the number of credit cards held by the sample user, first data representing the sample user's repayment capacity for credit cards, second data representing the sample user's transaction flow, the total loan number of the sample user, third data representing the loan types of the sample user, the number of marketing times for the sample user, and the number of successful marketing times for the sample user. The data types to which these 2 data belong are taken as 2 independent variables, and the actual marketing success probabilities corresponding to the first sample data matrices are taken as the dependent variable. Both are input into a regression model, and the first regression coefficient set θ is obtained through calculation of the regression model.

Assume that the linear relationship between the two independent variables and the dependent variable is $Y_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 = \theta^T x$, where $\theta = [\theta_0, \theta_1, \theta_2]^T$ and $x = [1, x_1, x_2]^T$. A loss function $J(\theta)$ is introduced to describe the degree of deviation of the dependent variable, where $Y_\theta(x^{(i)})$ is the estimated value of the marketing success probability corresponding to the i-th first sample data matrix and $P^{(i)}$ is the actual value of that probability. The first regression coefficient set θ is adjusted to minimize $J(\theta)$; the θ that minimizes $J(\theta)$ is the required first regression coefficient set.
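A common choice of loss for this kind of linear regression is the squared-error loss. The form below is an assumption made for illustration, including the factor 1/(2N) and the use of N for the number of first sample data matrices; the embodiment may use a different scaling.

```latex
J(\theta) = \frac{1}{2N} \sum_{i=1}^{N}
  \left( Y_\theta\!\left(x^{(i)}\right) - P^{(i)} \right)^{2}
```

The constant factor does not change which θ minimizes $J(\theta)$; it only rescales the gradient.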

For example, the first regression coefficient set may be obtained by any one of the least squares method, gradient descent, Newton's method, locally weighted linear regression, ridge regression, and LASSO regression.
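As one instance of the least squares method, the sketch below fits θ for two independent variables by solving the normal equations with a small Gauss-Jordan elimination. The helper names and the sample values are hypothetical; the probabilities are generated from known coefficients so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(xs, ps):
    """Return [theta_0, theta_1, ...] minimizing the squared error."""
    rows = [[1.0] + list(x) for x in xs]  # prepend 1 for the intercept theta_0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xtp = [sum(r[i] * p for r, p in zip(rows, ps)) for i in range(k)]
    return solve_linear_system(xtx, xtp)


# Hypothetical samples: (x1, x2) pairs with probabilities from known coefficients.
xs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ps = [0.1 + 0.05 * a + 0.02 * b for a, b in xs]
theta = fit_least_squares(xs, ps)
```

Ridge regression would add a penalty term to the diagonal of `xtx`; gradient descent would instead iterate on the loss directly.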

After the first regression coefficient set is obtained, the first regression coefficient set can be subjected to standardization processing, so that the influence of dimensions can be ignored when data types to which the data belong are compared.

The normalization process for the first regression coefficient set is as follows: calculate the mean value and standard deviation of the data of the same data type across the plurality of first sample data matrices, and obtain new data $x_i' = (x_i - \bar{x}_i) / S_i$. The new data is taken as a new independent variable; new data is obtained in the same way for each of the data types and substituted into the regression model, and the first regression coefficients so obtained are the normalized first regression coefficients. Here $\bar{x}_i$ denotes the mean of all data of the i-th data type in the plurality of first sample data matrices, and $S_i$ denotes the standard deviation of those data. Optionally, the normalized first regression coefficient set can also be obtained directly by substituting the non-normalized first regression coefficients into the corresponding formula.
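The standardization of one data type can be sketched as follows, assuming the usual z-score (subtract the mean, divide by the standard deviation). Using the population standard deviation is an assumption; the text does not say whether the population or the sample form is meant.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores of values; assumes the population standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(v - mu) / sd for v in values]


# Deposit amounts of three sample vectors, used here only as an example column.
deposits = [5000, 2000, 1000]
z = standardize(deposits)
z_sum = sum(z)
z_spread = pstdev(z)
```

After this step every data type has mean 0 and unit spread, so regression coefficients fitted on the new data can be compared without regard to the original units.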

For example, the first weight value may be obtained from the first regression coefficient set by calculating the average of all normalized first regression coefficients contained in the normalized first regression coefficient set and taking this average as the first weight value.
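The averaging step is short enough to state directly. The coefficient values below are hypothetical, and whether the intercept is included in the average is not specified in the text; this sketch simply averages whatever list it is given.

```python
def weight_from_coefficients(normalized_coefficients):
    """Return the mean of the normalized regression coefficients."""
    if not normalized_coefficients:
        raise ValueError("the regression coefficient set is empty")
    return sum(normalized_coefficients) / len(normalized_coefficients)


# Hypothetical normalized first regression coefficients.
first_weight = weight_from_coefficients([0.4, 0.1, 0.25])
```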

Accordingly, the process of obtaining the second weight value includes steps F1 to F2.

Step F1: taking the plurality of second sample data matrices as independent variables, and taking the actual marketing success probabilities respectively corresponding to the plurality of second sample data matrices as dependent variables, to obtain a second regression coefficient set representing the degree of influence of the independent variables on the dependent variables, where the second regression coefficient set includes at least one second regression coefficient.

Step F2: obtaining the second weight value based on the second regression coefficient set.

The process of obtaining the second weight value is the same as the process of obtaining the first weight value, and reference may be made to the process of obtaining the first weight value specifically, which is not described herein again.

In an optional embodiment, in an actual application, after marketing to a user to be tested, an actual result representing a marketing success probability for the user to be tested may be obtained, and the first weight value and the second weight value may be adjusted based on the actual result, where the specific process includes steps G1 to G3.

Step G1: acquiring an actual result representing the marketing success probability for the user to be tested.

Step G2: if the number of successful marketing times of the actual result representation for the user is zero, determining that the data matrix to be tested is the first sample data matrix aiming at the number of marketing times of the user which is less than or equal to a first preset value, and adjusting the first weight value based on the data matrix to be tested and the actual result.

Step G3: if the number of successful marketing times of the actual result representation for the user is not zero, determining the data matrix to be tested as the second sample data matrix aiming at the marketing times of the user being less than or equal to a first preset value, and adjusting the second weight value based on the measured data matrix and the actual result.

Taking the adjustment of the first weight value as an example, a specific adjustment manner is described, and the adjustment of the second weight value is similar, which is not described herein again.

After an actual result representing the marketing success probability for the user to be tested is obtained, the actual number of times of marketing success for the user to be tested and the actual number of times of marketing for the user to be tested can be obtained.

If the actual number of successful marketing times for the user to be tested is 0 and the actual number of marketing times for the user to be tested is less than or equal to the first preset value, the data matrix to be tested is used as a first sample data matrix, and the first sample data matrix for learning is added to the first dictionary matrix.

Because this first sample data matrix has been added to the constructed first sparse matrix model, the first sparse representation coefficient output by the model through the dictionary learning algorithm can better approximate the data matrix to be tested of the next user to be tested. At the same time, the mean value and standard deviation of each data type change accordingly; an updated first regression coefficient set can be obtained based on the changed mean values and standard deviations, and an updated first weight value can be obtained based on the updated first regression coefficient set.
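The routing in steps G2 and G3 can be sketched as below. The function name, the list-based dictionaries, and the returned labels are assumptions introduced for illustration; recomputing the regression coefficients and the weight value afterwards is left to the caller.

```python
# 0-based positions of the seventh and eighth values in a data vector.
MARKETING_TIMES = 6
MARKETING_SUCCESSES = 7


def absorb_feedback(vector, first_dictionary, second_dictionary, first_preset_value):
    """Append the observed vector to the matching dictionary and report which one."""
    if vector[MARKETING_TIMES] > first_preset_value:
        return None  # too many marketing attempts: not used as a sample
    if vector[MARKETING_SUCCESSES] == 0:
        first_dictionary.append(vector)   # step G2
        return "first"
    second_dictionary.append(vector)      # step G3
    return "second"


first_dictionary, second_dictionary = [], []
r1 = absorb_feedback([5000, 5, 1, 4000, 8000, 2, 3, 0], first_dictionary, second_dictionary, 5)
r2 = absorb_feedback([2000, 2, 0, 5000, 2000, 1, 2, 1], first_dictionary, second_dictionary, 5)
# Hypothetical vector with 9 marketing times, which exceeds the preset value of 5.
r3 = absorb_feedback([1000, 4, 1, 700, 5000, 3, 9, 2], first_dictionary, second_dictionary, 5)
```

Whichever dictionary grows is the one whose weight value should then be updated.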

The method is described in detail in the embodiments disclosed in the present application, and the method of the present application can be implemented by various types of apparatuses, so that an apparatus is also disclosed in the present application, and the following detailed description is given of specific embodiments.

As shown in fig. 3, a block diagram of an implementation manner of a data processing apparatus provided in an embodiment of the present application is provided, where the apparatus includes:

a first obtaining module 31, configured to obtain a data matrix to be detected corresponding to a user to be detected, where the data matrix to be detected includes: at least one of the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capacity of the user to be tested for the credit cards, second data representing running water of the user to be tested, the total loan number of the user to be tested, third data representing loan types of the user to be tested, the number of times of marketing for the user to be tested, and the number of times of successful marketing for the user to be tested;

a first input module 32, configured to input the data matrix to be measured to a first sparse matrix model, so as to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing times aiming at the user contained in one first sample data matrix is zero, and the number of marketing times aiming at the user contained in one first sample data matrix is less than or equal to a first preset value;

the second input module 33 is configured to input the data matrix to be measured to a second sparse matrix model, so as to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of successful marketing times for the user contained in one second sample data matrix is greater than zero, and the number of marketing times for the user contained in one second sample data matrix is less than or equal to the first preset value;

a second obtaining module 34, configured to obtain a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result, and the second prediction result.

Optionally, the apparatus further includes:

a request module, configured to request to establish a communication connection with the user to be tested if the final result is greater than or equal to a second preset value.

Optionally, the apparatus further includes:

a third obtaining module, configured to use the multiple first sample data matrices as independent variables, use actual marketing success probabilities respectively corresponding to the multiple first sample data matrices as dependent variables, and obtain a first regression coefficient set representing a degree of influence of the independent variables on the dependent variables, where the first regression coefficient set includes at least one first regression coefficient;

a fourth obtaining module, configured to obtain the first weight value based on the first regression coefficient set.

Optionally, the apparatus further includes:

a fifth obtaining module, configured to use the plurality of second sample data matrices as independent variables, use actual marketing success probabilities respectively corresponding to the plurality of second sample data matrices as dependent variables, and obtain a second regression coefficient set representing degrees of influence of the independent variables on the dependent variables, where the second regression coefficient set includes at least one second regression coefficient;

a sixth obtaining module, configured to obtain the second weight value based on the second regression coefficient set.

Optionally, the apparatus further includes:

a seventh obtaining module, configured to obtain an actual result representing a marketing success probability for the user to be tested;

a first adjusting module, configured to determine the data matrix to be tested as the first sample data matrix if the actual result represents that the number of successful marketing times for the user is zero and the number of marketing times for the user is less than or equal to a first preset value, and to adjust the first weight value based on the data matrix to be tested and the actual result;

a second adjusting module, configured to determine the data matrix to be tested as the second sample data matrix if the actual result represents that the number of successful marketing times for the user is not zero and the number of marketing times for the user is less than or equal to the first preset value, and to adjust the second weight value based on the data matrix to be tested and the actual result.

As shown in fig. 4, which is a structural diagram of an implementation manner of an electronic device provided in an embodiment of the present application, the electronic device includes:

a memory 41 for storing a program;

a processor 42 configured to execute the program, the program being specifically configured to:

The processor 42 may be a central processing unit CPU or an Application Specific Integrated Circuit (ASIC).

The electronic device may further comprise a communication interface 43 and a communication bus 44, wherein the memory 41, the processor 42 and the communication interface 43 communicate with each other via the communication bus 44.

The embodiment of the present application further provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps included in any of the embodiments of the data processing method described above.

Note that the features described in the embodiments in the present specification may be replaced with or combined with each other. For the device or system type embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.

It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data processing method, comprising:

2. The data processing method of claim 1, further comprising:

and if the final result is greater than or equal to a second preset value, requesting to establish communication connection with the user to be tested.

3. The data processing method according to claim 1 or 2, further comprising:

taking the plurality of first sample data matrixes as independent variables, and taking actual marketing success probabilities respectively corresponding to the plurality of first sample data matrixes as dependent variables to obtain a first regression coefficient set representing the degree of influence of the independent variables on the dependent variables, wherein the first regression coefficient set comprises at least one first regression coefficient;

and obtaining the first weight value based on the first regression coefficient set.

4. The data processing method according to claim 1 or 2, further comprising:

taking the plurality of second sample data matrixes as independent variables, and taking actual marketing success probabilities respectively corresponding to the plurality of second sample data matrixes as dependent variables to obtain a second regression coefficient set representing the degree of influence of the independent variables on the dependent variables, wherein the second regression coefficient set comprises at least one second regression coefficient;

and obtaining the second weight value based on the second regression coefficient set.

5. The data processing method of claim 4, further comprising:

acquiring an actual result representing the marketing success probability aiming at the user to be tested;

if the actual result represents that the number of successful marketing times for the user is zero and the number of marketing times for the user is less than or equal to a first preset value, determining the data matrix to be tested as the first sample data matrix, and adjusting the first weight value based on the data matrix to be tested and the actual result;

if the actual result represents that the number of successful marketing times for the user is not zero and the number of marketing times for the user is less than or equal to the first preset value, determining the data matrix to be tested as the second sample data matrix, and adjusting the second weight value based on the data matrix to be tested and the actual result.

6. A data processing apparatus comprising:

7. The data processing apparatus of claim 6, further comprising:

8. The data processing apparatus according to claim 6 or 7, further comprising:

9. An electronic device, comprising:

a memory for storing a program;

10. A storage medium having stored thereon a computer program for implementing the steps of the data processing method according to any one of claims 1 to 5 when executed by a processor.