CN112070593B

CN112070593B - Data processing method, device, equipment and storage medium

Info

Publication number: CN112070593B
Application number: CN202011054020.9A
Authority: CN
Inventors: 梁亮; 李婷姝
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2023-09-05
Anticipated expiration: 2040-09-29
Also published as: CN112070593A

Abstract

The application discloses a data processing method, a device, electronic equipment and a storage medium. A data matrix to be tested corresponding to a user to be tested is obtained, the data matrix to be tested containing data that can influence whether the user to be tested agrees to a marketing activity. The data matrix to be tested is input into a first sparse matrix model and a second sparse matrix model to obtain a first prediction result and a second prediction result, respectively. A final result is then obtained based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result. Because the final result combines the two prediction results and their weight values, it is more accurate than directly taking the prediction result output by a single model as the final result. Staff in the financial industry can determine whether to market to the user to be tested based on the final result, so that marketing is no longer blind and the marketing success rate is improved.

Description

Data processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of banking, and more particularly, to a data processing method, apparatus, device, and storage medium.

Background

When a financial institution (such as a bank) runs a marketing campaign, it blindly calls, sends a short message to, or emails every user so that the user learns about the campaign, in the hope that the user will agree to purchase the product or take up the service corresponding to the campaign.

Disclosure of Invention

In view of the above, the present application provides a data processing method, apparatus, device and storage medium, so as to overcome the problem of blind marketing in the prior art and improve the success rate of marketing.

In order to achieve the above purpose, the present application provides the following technical solutions:

a data processing method, comprising:

obtaining a data matrix to be tested corresponding to a user to be tested, wherein the data matrix to be tested comprises: at least one of deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capability of the user to be tested for the credit cards, second data representing running water of the user to be tested, total loan number of the user to be tested, third data representing loan type of the user to be tested, marketing times for the user to be tested and successful marketing times for the user to be tested;

inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrices, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing times for the user contained in each first sample data matrix is zero, and the number of marketing times for the user contained in each first sample data matrix is less than or equal to a first preset value;

inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of times that the second sample data matrix contains marketing success aiming at the user is greater than zero, and the number of times that the second sample data matrix contains marketing aiming at the user is less than or equal to the first preset value;

and obtaining a final result based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

A data processing apparatus comprising:

the first acquisition module is used for acquiring a data matrix to be detected corresponding to a user to be detected, and the data matrix to be detected comprises: at least one of deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capability of the user to be tested for the credit cards, second data representing running water of the user to be tested, total loan number of the user to be tested, third data representing loan type of the user to be tested, marketing times for the user to be tested and successful marketing times for the user to be tested;

the first input module is used for inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be detected is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing times for the user contained in one first sample data matrix is zero, and the number of marketing times for the user contained in one first sample data matrix is smaller than or equal to a first preset value;

The second input module is used for inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of times that the second sample data matrix contains marketing success aiming at the user is greater than zero, and the number of times that the second sample data matrix contains marketing aiming at the user is less than or equal to the first preset value;

the second obtaining module is configured to obtain a final result based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

An electronic device, comprising:

a memory for storing a program;

a processor, configured to execute the program, where the program is specifically configured to:

A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method according to any of the preceding claims.

According to the technical scheme, compared with the prior art, in the data processing method provided by the application, the data matrix to be detected corresponding to the user to be detected is obtained, and the data matrix to be detected contains data which can influence whether the user to be detected agrees with the marketing activity; inputting a data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model, wherein the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrices, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient, the number of times of successful marketing aiming at a user contained in each first sample data matrix is zero, each first sample data matrix is a failure sample, so that the first sparse matrix model learns attribute features of the failure samples to a greater extent, namely the obtained first prediction result is obtained by the first sparse matrix model based on the attribute features of the failure samples, namely the first prediction result is more prone to the failure marketing direction. Therefore, if the first prediction result is larger, the marketing success rate aiming at the user to be measured in the actual marketing is larger; if the first prediction result is smaller, the result is in a failure marketing direction, and cannot indicate that the marketing success rate for the user to be tested is smaller in actual marketing, so that the second prediction result needs to be combined for judgment.

And inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model. The second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient, and the number of times of marketing success aiming at a user contained in each second sample data matrix is larger than zero, so that each second sample data matrix is a successful sample, the second sparse matrix model learns attribute characteristics of the successful sample to a greater extent, namely the obtained second prediction result is obtained by the second sparse matrix model based on the attribute characteristics of the successful sample, namely the second prediction result is more prone to a successful marketing direction, and therefore, the smaller the second prediction result is, the smaller the marketing success rate aiming at the user to be tested in actual marketing is indicated; if the second prediction result is larger, the result is in a successful marketing direction, and cannot indicate that the marketing success rate of the user to be tested is larger in actual marketing, so that the first prediction result is also needed to be combined for judgment.

Therefore, based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result, the obtained final result is more accurate. And a staff in the financial industry can determine whether to market the user to be measured based on the final result, so that the staff in the financial industry can not blindly market any more, and the marketing success rate is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a block diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flow chart of an implementation of a data processing method according to an embodiment of the present application;

FIG. 3 is a block diagram of one implementation of a data processing apparatus according to an embodiment of the present application;

Fig. 4 is a block diagram of an implementation manner of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Before describing the data processing method provided by the embodiment of the present application in detail, the implementation environment related to the embodiment of the present application will be briefly described here.

Fig. 1 is a schematic diagram of an implementation environment according to an embodiment of the present application. As shown in fig. 1, the following data processing method may be applied in the implementation environment, where the implementation environment includes: one or more terminal devices 11 and an electronic device 12.

The terminal device 11 may be any electronic product that can perform man-machine interaction with a user through one or more modes of a keyboard, a touch pad, a touch screen, a remote controller, a voice interaction or a handwriting device, for example, a mobile phone, a tablet computer, a palm computer, a personal computer, a wearable device, a smart television, a robot, etc.

Fig. 1 is merely an example, and the number of terminal devices 11 in practical applications may be set according to practical requirements, and one terminal device 11 is shown in fig. 1.

Alternatively, the electronic device 12 may be any electronic product that can perform man-machine interaction with a user through one or more of a keyboard, a touchpad, a touch screen, a remote controller, a voice interaction or handwriting device, such as a mobile phone, a tablet computer, a palm top computer, a personal computer, a wearable device, a smart television, a robot, etc.

Alternatively, the electronic device 12 may be a server, which may be a single server, a server cluster including several servers, or a cloud computing service center.

Optionally, the terminal device 11 and the electronic device 12 are the same device; alternatively, the terminal device 11 and the electronic device 12 are different devices.

The terminal device 11 is configured to obtain a data matrix to be tested corresponding to a user to be tested, and input the data matrix to be tested into a first sparse matrix model and a second sparse matrix model respectively to obtain a first prediction result and a second prediction result respectively; and obtaining a final result based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

The terminal device 11 is further configured to send a request for establishing a communication connection with the user to be tested to the electronic device 12 when the final result is greater than or equal to a second preset value.

Optionally, an application client is installed on the terminal device 11, and the to-be-measured data matrix corresponding to the to-be-measured user can be obtained based on a user interface displayed by the application client.

Optionally, a browser client is installed on the terminal device 11, and the data matrix to be tested corresponding to the user to be tested can be obtained through a user interface of the webpage client displayed by the browser.

The client may be a bank client, for example.

The electronic device 12 is configured to train the first sparse matrix model and the second sparse matrix model according to the data processing method provided by the embodiment of the present application, and to feed back the obtained prediction result to the terminal device 11.

The electronic device 12 is further configured to establish a communication connection with the user to be tested after receiving the request sent by the terminal device 11.

The data processing method, apparatus, electronic device and storage medium provided by the present application are described below in connection with the above implementation environment.

As shown in fig. 2, a flowchart of an implementation manner of a data processing method according to an embodiment of the present application is shown, where the method includes:

Step S201: and acquiring a data matrix to be tested corresponding to the user to be tested.

The data matrix to be tested includes: the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing repayment capability of the user to be tested for the credit cards, second data representing running water of the user to be tested, the total loan number of the user to be tested, third data representing loan types of the user to be tested, the marketing times for the user to be tested and the successful marketing times for the user to be tested.

The above data will be described below.

In an alternative implementation, a bank measures a user's ability to repay a credit card primarily in five aspects: total assets, total salary, total liabilities, occupational status and credit record. The first data representing the repayment capability of the user to be tested for the credit card may include at least one of these five aspects.

The occupational status refers to whether the user is employed or unemployed.

In an alternative implementation, the loan types of a user are divided into three kinds, each assigned a number: credit (1), guarantee (2) and bill discounting (3). For example, the third data representing the loan type of the user to be tested is the number corresponding to the loan type; if the loan type is credit, the third data representing the loan type of the user to be tested is 1.
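The numeric coding of loan types described above can be sketched as follows. This is only an illustration: the enum name `LoanType`, its member names and the helper `encode_loan_type` are hypothetical and do not appear in the application; only the codes 1, 2 and 3 come from the text.

```python
from enum import IntEnum


class LoanType(IntEnum):
    """Hypothetical names for the three loan types; the codes follow the text."""
    CREDIT = 1
    GUARANTEE = 2
    BILL_DISCOUNT = 3


def encode_loan_type(name: str) -> int:
    """Map a loan-type name to its numeric code (the "third data")."""
    key = name.strip().upper().replace(" ", "_")
    return int(LoanType[key])


code = encode_loan_type("credit")  # numeric code stored in the data matrix
```

An unknown name raises `KeyError`, which is usually preferable to silently storing a wrong code in the data matrix.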

Illustratively, the number of credit cards held by the user to be tested may be any integer of 0, 1, 2, …, etc. The credit card held by the user to be tested can be the credit card of the same bank or the credit card of different banks.

For example, the deposit amount of the user under test may be the total amount of deposit of the user under test in the account or accounts.

The data matrix to be tested corresponding to the user to be tested may be obtained by manually entering the relevant data of the user to be tested, or may be obtained from a bank counter system.
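As a rough sketch of how the items listed above might be assembled into a data matrix to be tested, the record below is flattened into a list standing in for a column vector. The class `UserRecord`, its field names, and the choice to summarise the first and second data as single numbers are assumptions made purely for illustration.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class UserRecord:
    """Hypothetical container; one field per data type named in the text."""
    deposit_amount: float
    credit_card_count: int
    repayment_score: float   # "first data", assumed here to be a single number
    account_flow: float      # "second data", assumed here to be a single number
    total_loans: int
    loan_type_code: int      # "third data", e.g. 1 for credit
    marketing_count: int
    success_count: int


def to_column_vector(rec: UserRecord) -> List[float]:
    """Flatten a record into an M-element list standing in for an M x 1 vector."""
    return [float(v) for v in astuple(rec)]


y = to_column_vector(UserRecord(50000.0, 2, 0.8, 12000.0, 1, 1, 0, 0))
```

In this sketch M is 8; if only some of the data types are used, M shrinks accordingly.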

In an optional implementation manner, the user to be tested is a user who has not yet been marketed to, so the number of marketing times and the number of successful marketing times for the user to be tested in the corresponding data matrix to be tested are both 0. In this case, the data processing method provided by the application predicts the probability of successfully marketing to the user to be tested through the constructed sparse matrix models.

In an alternative implementation, the user to be tested is a user who has already been marketed to, but the number of times the user has been marketed to is less than or equal to a first preset value (the number of successful marketing times for the user to be tested may or may not be 0). In this case, the data processing method provided by the application predicts the future marketing success probability of the user to be tested through the constructed sparse matrix models.

It can be appreciated that, provided the number of marketing times is less than or equal to the first preset value, the user to be tested may still be successfully marketed to in the future even if the number of successful marketing times for the user is 0. If the number of marketing times is greater than the first preset value and the number of successful marketing times for the user is 0, the probability of successfully marketing to the user in the future is almost 0, and marketing to such a user can be abandoned.
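The abandonment rule in the preceding paragraph can be written as a small predicate. The function name `should_abandon` and the example threshold of 5 are hypothetical; the application leaves the first preset value open.

```python
def should_abandon(marketing_count: int, success_count: int, first_preset: int) -> bool:
    """True when the user was marketed to more than first_preset times with no success."""
    return marketing_count > first_preset and success_count == 0


# Example with an assumed first preset value of 5.
abandon_heavily_marketed = should_abandon(6, 0, 5)
abandon_lightly_marketed = should_abandon(3, 0, 5)
```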

The first preset value may be based on actual conditions, and is not limited herein.

Step S202: and inputting the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model.

The first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrixes, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient; the number of times the first sample data matrix contains marketing success aiming at the user is zero, and the number of times the first sample data matrix contains marketing aiming at the user is smaller than or equal to a first preset value.

The dictionary learning (K-SVD) algorithm is a method for data dimension reduction.

Assuming that one first sample data matrix is an M×1 column vector, where M is any positive integer greater than or equal to 1, and that there are N first sample data matrices in total, the first dictionary matrix is an M×N matrix.

The first sample data matrix of a user contains the same kinds of content as the data matrix to be tested: it comprises at least one of the deposit amount of the user, the number of credit cards held by the user, first data representing repayment capability of the user for the credit cards, second data representing running water of the user, the total loan number of the user, third data representing the loan type of the user, the marketing times for the user and the successful marketing times for the user.

In an optional embodiment, the sample data matrices obtained for a plurality of users may be preprocessed as follows: sample data matrices in which the number of successful marketing times for the user is zero and the number of marketing times for the user is greater than the first preset value are removed; sample data matrices in which the number of successful marketing times for the user is zero and the number of marketing times for the user is less than or equal to the first preset value are taken as first sample data matrices; and sample data matrices in which the number of successful marketing times for the user is greater than zero and the number of marketing times for the user is less than or equal to the first preset value are taken as second sample data matrices.
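A minimal sketch of this preprocessing step is given below, assuming each sample is a dictionary with hypothetical keys `marketing_count` and `success_count`. Samples whose marketing count exceeds the first preset value end up in neither group, which covers the removal case described above.

```python
from typing import Dict, List, Tuple

Sample = Dict[str, float]


def split_samples(samples: List[Sample],
                  first_preset: int) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (first, second) groups: failure and success samples."""
    first: List[Sample] = []
    second: List[Sample] = []
    for s in samples:
        if s["marketing_count"] > first_preset:
            continue  # marketed too often: used by neither model
        if s["success_count"] == 0:
            first.append(s)   # failure sample for the first model
        else:
            second.append(s)  # success sample for the second model
    return first, second


samples = [
    {"marketing_count": 2, "success_count": 0},
    {"marketing_count": 9, "success_count": 0},
    {"marketing_count": 3, "success_count": 1},
    {"marketing_count": 9, "success_count": 2},
]
first, second = split_samples(samples, first_preset=5)
```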

For example, the first preset value may be determined according to an empirical value of marketing success.

It can be appreciated that, since the first prediction result is calculated by the first sparse matrix model based on the plurality of first sample data matrices and the first sparse representation coefficient, the first prediction result may represent a first predicted marketing success probability of the user to be tested.

Step S203: and inputting the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model.

The second sparse matrix model is a second dictionary matrix formed by a plurality of second sample data matrixes, a second sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, and the second prediction result is obtained through calculation based on the second dictionary matrix and the second sparse representation coefficient; the number of times the second sample data matrix contains marketing success aiming at the user is larger than zero, and the number of times the second sample data matrix contains marketing aiming at the user is smaller than or equal to the first preset value.

In an optional embodiment, the sample data matrices obtained for a plurality of users may be preprocessed; optionally, a sample data matrix in which the number of successful marketing times for the user is greater than zero and the number of marketing times for the user is less than or equal to the first preset value may be taken as a second sample data matrix. For example, the first preset value may be determined according to an empirical value of marketing success.

It can likewise be understood that the second prediction result, obtained after the data matrix to be tested is input into the second sparse matrix model, represents a second predicted marketing success probability of the user to be tested.

Step S204: and obtaining a final result based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result.

For example, a first weight value corresponding to the first sparse matrix model and a second weight value corresponding to the second sparse matrix model may be preset based on human experience.

The first weight value represents the degree to which the first sample data matrices influence the user to be tested to disagree with marketing, and the second weight value represents the degree to which the second sample data matrices influence the user to be tested to agree with marketing.

The manner of obtaining the final result based on the first weight value, the second weight value, the first predicted result and the second predicted result includes, but is not limited to, weighted average, summation, averaging, or a combination of the above calculation methods, and the final result may be calculated by, for example, the following formula:

final result = first weight value × first prediction result + second weight value × second prediction result.

Illustratively, the sum of the first weight value and the second weight value is 1.
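The weighted combination can be sketched in a few lines. The function name `final_result` and the example weights 0.4 and 0.6 are assumptions; the application only states that the weights may be preset based on experience and, illustratively, sum to 1.

```python
def final_result(first_pred: float, second_pred: float,
                 first_weight: float, second_weight: float) -> float:
    """Weighted sum of the two prediction results."""
    return first_weight * first_pred + second_weight * second_pred


# Assumed example: weights 0.4 and 0.6, predictions 0.2 and 0.8.
score = final_result(0.2, 0.8, first_weight=0.4, second_weight=0.6)
```

With these assumed numbers the score is 0.4 × 0.2 + 0.6 × 0.8 = 0.56.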

The embodiment of the application provides a data processing method, which comprises the steps of firstly, obtaining a data matrix to be detected corresponding to a user to be detected, wherein the data matrix to be detected contains data which can influence whether the user to be detected agrees with a marketing activity; inputting a data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model, wherein the first sparse matrix model is a first dictionary matrix formed by a plurality of first sample data matrices, a first sparse representation coefficient corresponding to the data matrix to be tested is obtained through a dictionary learning algorithm, the first prediction result is obtained through calculation based on the first dictionary matrix and the first sparse representation coefficient, the number of times of successful marketing aiming at a user contained in each first sample data matrix is zero, each first sample data matrix is a failure sample, so that the first sparse matrix model learns attribute features of the failure samples to a greater extent, namely the obtained first prediction result is obtained by the first sparse matrix model based on the attribute features of the failure samples, namely the first prediction result is more prone to the failure marketing direction. Therefore, if the first prediction result is larger, the marketing success rate aiming at the user to be measured in the actual marketing is larger; if the first prediction result is smaller, the result is in a failure marketing direction, and cannot indicate that the marketing success rate for the user to be tested is smaller in actual marketing, so that the second prediction result needs to be combined for judgment.

In an alternative embodiment, if the final result is greater than or equal to a second preset value, a communication connection is requested to be established with the user to be tested.

It can be appreciated that when the final result is greater than or equal to the second preset value, the marketing success rate for the user to be tested is relatively high and a marketing campaign directed at the user is more likely to succeed. A communication connection with the user to be tested is therefore requested, for example by one or more of calling, sending a short message, sending a mail and network chat, so that the user learns about the marketing campaign and the probability that the user agrees to purchase the corresponding product or take up the corresponding service is increased. In the embodiment of the application, marketing measures are thus taken only when the final result is judged to be greater than or equal to the second preset value, rather than blindly establishing a communication connection with the user to be tested, which would annoy the user and reduce the marketing success rate.
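The threshold decision described above might look like the following sketch. The function name `users_to_contact`, the user identifiers and the second preset value of 0.5 are hypothetical.

```python
from typing import Dict, List


def users_to_contact(final_results: Dict[str, float],
                     second_preset: float) -> List[str]:
    """Return the users whose final result reaches the second preset value."""
    return [uid for uid, score in final_results.items() if score >= second_preset]


# Assumed final results for three hypothetical users.
selected = users_to_contact({"u1": 0.7, "u2": 0.3, "u3": 0.5}, second_preset=0.5)
```

Only the selected users would then be contacted; the others are skipped instead of being marketed to blindly.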

In an optional implementation manner, the data processing method provided by the embodiment of the application can be applied to a robot, and if the final result is greater than or equal to a second preset value, the robot can automatically establish communication connection with a user to be tested.

In order to better understand the relevant content of the embodiments of the present application, the principles of sparse representation and the process of constructing a sparse matrix model are described below.

Sparse representation means representing the main information of an original signal with as few atoms as possible from a given overcomplete dictionary, i.e., using a linear combination of a small number of basic signals to express most or all of the original signal, thereby making the solution process of signal processing simpler.

The basic signals, called atoms, are selected from an overcomplete dictionary. A dictionary is an ordered set of atoms and can be regarded as an N×T matrix; if T > N (the number of columns exceeds the number of rows), the dictionary is an overcomplete, or redundant, dictionary.

For example, find a coefficient matrix X _K×N One dictionary matrix B _M×K So that B X restores Y (original signal) as much as possible and X is sparse as much as possible, then X is a sparse representation of Y.

Finding the sparsest signal representation is equivalent to solving the following problem: $\min \|X\|_0 \ \text{s.t.}\ Y = BX$, where $\|X\|_0$ is the number of non-zero entries in the coefficient matrix $X$. Finding the sparse expansion of a signal over an arbitrary redundant dictionary is an NP-hard problem. However, Terry has proven that under certain conditions the zero-norm problem is equivalent to a one-norm problem, so the above problem is converted into: $\min \|X\|_1 \ \text{s.t.}\ Y = BX$.

Thus, solving $\min \|X\|_1$ under the constraint $Y = BX$ yields the coefficient matrix $X$, whose entries are the sparse representation coefficients of the linear combination of atoms in the dictionary matrix $B$ that approximates the matrix $Y$.
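To make the two objectives concrete, the sketch below evaluates the zero-norm and one-norm of two candidate coefficient vectors that both satisfy the constraint Y = BX. It does not solve the minimization itself; the 2×4 dictionary `B`, the signal `Y`, and both candidates are hypothetical values chosen for illustration.

```python
def l0_norm(x):
    """Number of non-zero entries (the zero-norm in the text)."""
    return sum(1 for v in x if v != 0)


def l1_norm(x):
    """Sum of absolute values (the one-norm in the text)."""
    return sum(abs(v) for v in x)


def mat_vec(B, x):
    """Multiply matrix B (list of rows) by vector x."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]


# Hypothetical overcomplete dictionary: 2 rows, 4 columns (atoms).
B = [[1, 0, 1, 2],
     [0, 1, 1, 1]]
Y = [2, 1]

candidates = {
    "dense": [1, 0, 1, 0],   # atom 1 + atom 3
    "sparse": [0, 0, 0, 1],  # atom 4 alone
}
for name, X in candidates.items():
    assert mat_vec(B, X) == Y  # both satisfy the constraint Y = BX
    print(name, "L0 =", l0_norm(X), "L1 =", l1_norm(X))
```

Both candidates reproduce Y, but the second uses a single atom and has the smaller value under either norm, which is the kind of solution the minimization is meant to select.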

In combination with the above principle, the following describes how step S202 inputs the data matrix to be tested into the first sparse matrix model to obtain the first prediction result output by that model. The process includes steps A1 to A2.

Step A1: obtain, through the first sparse matrix model, a first sparse representation coefficient corresponding to the data matrix to be tested.

In the embodiment of the application, assume that the data matrix to be tested corresponding to the user to be tested is Y, an M×1 column vector, where M is the number of data types contained in the data matrix to be tested, and the value in each row of Y is the specific value of the corresponding data type in the data to be tested.

Similarly, each sample user corresponds to one first sample data matrix. Suppose there are N sample users whose number of successful marketing attempts is zero and whose number of marketing attempts is less than or equal to the first preset value, and the number of data types for each of these N users is M, the same as for the user to be tested. Each first sample data matrix is then an M×1 column vector, and the N first sample data matrices corresponding to the N sample users form the first dictionary matrix, denoted B, of size M×N (where N > M). Here M is the number of data types and N is the number of first sample data matrices in the dictionary matrix.
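The selection of sample users and the assembly of the first dictionary matrix might be sketched as below. The function name `build_first_dictionary` is hypothetical, and the positions of the marketing count (index 6) and success count (index 7) are assumptions that follow the eight-value layout used in the worked examples of this description. The last two sample vectors are invented to show the filter rejecting users.

```python
from typing import List, Sequence


def build_first_dictionary(samples: Sequence[Sequence[float]],
                           first_preset_value: int,
                           marketed_idx: int = 6,
                           success_idx: int = 7) -> List[List[float]]:
    """Return an M x N matrix (list of rows) whose columns are the qualifying samples."""
    atoms = [list(s) for s in samples
             if s[success_idx] == 0 and s[marketed_idx] <= first_preset_value]
    if not atoms:
        return []
    m = len(atoms[0])
    # Column j of the result is atom j, so row r collects entry r of every atom.
    return [[atom[r] for atom in atoms] for r in range(m)]


samples = [
    [5000, 5, 1, 4000, 8000, 2, 3, 0],
    [2000, 2, 0, 5000, 2000, 1, 2, 0],
    [1000, 4, 1, 700, 5000, 3, 5, 0],
    [3000, 1, 1, 0, 1000, 1, 4, 2],   # hypothetical: has successes, so excluded
    [800, 0, 0, 0, 500, 0, 9, 0],     # hypothetical: marketed 9 > 5, so excluded
]
B = build_first_dictionary(samples, first_preset_value=5)
print(len(B), "rows x", len(B[0]), "columns")
```

With only three qualifying users this toy dictionary is not overcomplete; a real first dictionary would need N > M as stated above.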

The first sparse matrix model is constructed by forming the first dictionary matrix from the N first sample data matrices and treating the data matrix to be tested corresponding to a user to be tested as the original signal that needs a sparse representation. The first sparse representation coefficient is represented by a coefficient matrix $X$ (an N×1 column vector) and is obtained by solving $\min \|X\|_1 \ \text{s.t.}\ Y = BX$, where $X = [X_1, X_2, X_3, \ldots, X_N]^T$ and $X_i$ is the sparse representation coefficient corresponding to the i-th first sample data matrix, with i ranging from 1 to N.

Assume N = 9, and that first sample data matrices 1 through 9 constitute the first dictionary matrix. After the data matrix to be tested is input into the first sparse matrix model, a coefficient matrix $X = [0,0,5,0,2,0,0,0,3]^T$ representing the data matrix to be tested over this first dictionary matrix is obtained. This means that the sparse representation coefficient corresponding to first sample data matrix 3 is 5, that corresponding to first sample data matrix 5 is 2, that corresponding to first sample data matrix 9 is 3, and those corresponding to the other first sample data matrices are all 0. First sample data matrices 3, 5 and 9 can therefore be used to approximate the data matrix Y of the user to be tested, i.e. $Y = 5 B_3 + 2 B_5 + 3 B_9$.
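Reading the active atoms off a sparse coefficient vector can be illustrated with a short sketch. The helper name `active_atoms` is hypothetical; the coefficient vector is the one from the N = 9 example above.

```python
def active_atoms(x):
    """Return {1-based atom index: coefficient} for the non-zero entries of x."""
    return {i + 1: c for i, c in enumerate(x) if c != 0}


X = [0, 0, 5, 0, 2, 0, 0, 0, 3]
terms = active_atoms(X)
expression = " + ".join(f"{c}*B{i}" for i, c in terms.items())
print("Y =", expression)  # Y = 5*B3 + 2*B5 + 3*B9
```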

Step A2: control the first sparse matrix model to obtain the first prediction result based on the first sparse representation coefficient.

The first prediction result is obtained through the following steps C1 to C4.

Step C1: based on the first sparse representation coefficient of the data matrix to be tested, obtain, for each position in the first sparse representation coefficient, the number of successful marketing attempts and the number of marketing attempts for the sample user contained in the corresponding first sample data matrix.

It should be noted that the number of successful marketing attempts for the user in every first sample data matrix is zero.

Step C2: multiply the number of marketing attempts contained in each first sample data matrix by the value at the corresponding position in the first sparse representation coefficient, and sum the products to obtain the number of marketing attempts for the user to be tested.

Step C3: take the sum of the elements of the first sparse representation coefficient as the number of successful marketing attempts for the user to be tested.

The number of successful marketing attempts contained in every first sample data matrix is zero. If, as in step C2, the number of successful marketing attempts in each first sample data matrix were multiplied by the value at the corresponding position in the first sparse representation coefficient and summed, the computed probability of marketing success for the user to be tested would always be 0, however many times the user had been marketed to, and the prediction would be meaningless. Step C3 therefore uses the sum of the coefficients instead.

Step C4: take the ratio of the number of successful marketing attempts for the user to be tested to the number of marketing attempts for the user to be tested as the first prediction result.

For example, suppose the first sparse representation coefficient obtained from the data matrix to be tested is $X = [0,0,6,0,4,0,0,0,8]^T$, i.e. the data matrix to be tested is expressed as $Y = 6 B_3 + 4 B_5 + 8 B_9$, and the first preset value is 5. The first sample data matrices corresponding to $B_3$, $B_5$ and $B_9$ are obtained; assume $B_3 = [5000,5,1,4000,8000,2,3,0]$, $B_5 = [2000,2,0,5000,2000,1,2,0]$ and $B_9 = [1000,4,1,700,5000,3,5,0]$. The number of marketing attempts for the user to be tested is then 6×3 + 4×2 + 8×5 = 66, and the number of successful marketing attempts for the user to be tested is 6 + 4 + 8 = 18, so the first prediction result is 18/66 ≈ 0.273.

In each first sample data matrix, the first value is the deposit amount, the second is the number of credit cards held, the third is the first data, the fourth is the total loan number, the fifth is the second data, the sixth is the third data, the seventh is the number of marketing attempts for the sample user, and the eighth is the number of successful marketing attempts for the sample user.
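Steps C1 to C4 applied to this example can be sketched as follows. The function name `first_prediction` is hypothetical, and the marketing count is assumed to sit at index 6 (the seventh value) of each sample vector, as described above.

```python
def first_prediction(coeffs, atoms, marketed_idx=6):
    """Ratio of summed coefficients to the coefficient-weighted marketing count."""
    # Step C2: weighted sum of the marketing counts of the selected samples.
    marketed = sum(c * atom[marketed_idx] for c, atom in zip(coeffs, atoms))
    # Step C3: the success count is replaced by the sum of the coefficients.
    succeeded = sum(coeffs)
    if marketed == 0:
        raise ValueError("marketing count is zero; ratio undefined")
    # Step C4: ratio of the two quantities.
    return succeeded / marketed


B3 = [5000, 5, 1, 4000, 8000, 2, 3, 0]
B5 = [2000, 2, 0, 5000, 2000, 1, 2, 0]
B9 = [1000, 4, 1, 700, 5000, 3, 5, 0]

result = first_prediction([6, 4, 8], [B3, B5, B9])
print(round(result, 3))  # 0.273
```

Only the non-zero coefficients and their atoms are passed in, since zero coefficients contribute nothing to either sum.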

Correspondingly, the second sparse matrix model is constructed from the second sample data matrices in a similar way to the first sparse matrix model, except that the sample users corresponding to the second sample data matrices have a number of successful marketing attempts greater than zero and a number of marketing attempts less than or equal to the first preset value. Accordingly, inputting the data matrix to be tested into the second sparse matrix model in step S203 to obtain the second prediction result output by the second sparse matrix model includes steps B1 to B2.

Step B1: obtain, through the second sparse matrix model, a second sparse representation coefficient corresponding to the data matrix to be tested.

The second sparse representation coefficient is obtained from the plurality of second sample data matrices in the same way as in step A1, which is not repeated here.

Step B2: control the second sparse matrix model to obtain the second prediction result based on the second sparse representation coefficient.

The second prediction result is obtained through the following steps D1 to D3.

Step D1: based on the second sparse representation coefficient of the data matrix to be tested, obtain, for each position in the second sparse representation coefficient, the number of successful marketing attempts and the number of marketing attempts for the sample user contained in the corresponding second sample data matrix.

Step D2: multiply the number of successful marketing attempts contained in each second sample data matrix by the value at the corresponding position in the second sparse representation coefficient, and sum the products to obtain the number of successful marketing attempts for the user to be tested; likewise, multiply the number of marketing attempts contained in each second sample data matrix by the value at the corresponding position in the second sparse representation coefficient, and sum the products to obtain the number of marketing attempts for the user to be tested.

Step D3: take the ratio of the number of successful marketing attempts for the user to be tested to the number of marketing attempts for the user to be tested as the second prediction result.

For example, suppose the second sparse representation coefficient obtained from the data matrix to be tested is $X = [0,0,6,0,4,0,0,0,8]^T$, i.e. the data matrix to be tested is expressed as $Y = 6 B_3 + 4 B_5 + 8 B_9$, and the first preset value is 5. The second sample data matrices corresponding to $B_3$, $B_5$ and $B_9$ are obtained; assume $B_3 = [5000,5,1,4000,8000,2,3,2]$, $B_5 = [2000,2,0,5000,2000,1,2,1]$ and $B_9 = [1000,4,1,700,5000,3,5,3]$. The number of marketing attempts for the user to be tested is then 6×3 + 4×2 + 8×5 = 66, and the number of successful marketing attempts for the user to be tested is 6×2 + 4×1 + 8×3 = 40, so the second prediction result is 40/66 ≈ 0.606.

In each second sample data matrix, the first value is the deposit amount, the second is the number of credit cards held, the third is the first data, the fourth is the total loan number, the fifth is the second data, the sixth is the third data, the seventh is the number of marketing attempts for the sample user, and the eighth is the number of successful marketing attempts for the sample user.

Alternatively, the coefficients 6, 4 and 8 are first normalized to obtain standard coefficients, for example 6/(6+4+8) ≈ 0.33, 4/(6+4+8) ≈ 0.22 and 8/(6+4+8) ≈ 0.44. The number of marketing attempts for the user to be tested is then 0.33×3 + 0.22×2 + 0.44×5 = 3.63, and the number of successful marketing attempts for the user to be tested is 0.33×2 + 0.22×1 + 0.44×3 = 2.2, so the second prediction result is 2.2/3.63 ≈ 0.606.
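Steps D1 to D3, with and without normalizing the coefficients, can be sketched as below. The function name `second_prediction` and the `normalize` flag are hypothetical; indices 6 and 7 are assumed to hold the marketing count and the success count, following the layout described above.

```python
def second_prediction(coeffs, atoms, marketed_idx=6, success_idx=7, normalize=False):
    """Ratio of coefficient-weighted success count to coefficient-weighted marketing count."""
    if normalize:
        total = sum(coeffs)
        coeffs = [c / total for c in coeffs]  # standard coefficients summing to 1
    marketed = sum(c * a[marketed_idx] for c, a in zip(coeffs, atoms))
    succeeded = sum(c * a[success_idx] for c, a in zip(coeffs, atoms))
    if marketed == 0:
        raise ValueError("marketing count is zero; ratio undefined")
    return succeeded / marketed


B3 = [5000, 5, 1, 4000, 8000, 2, 3, 2]
B5 = [2000, 2, 0, 5000, 2000, 1, 2, 1]
B9 = [1000, 4, 1, 700, 5000, 3, 5, 3]
atoms = [B3, B5, B9]

plain = second_prediction([6, 4, 8], atoms)
normalized = second_prediction([6, 4, 8], atoms, normalize=True)
print(round(plain, 3), round(normalized, 3))  # 0.606 0.606
```

Normalization scales numerator and denominator by the same factor, so both variants give the same ratio, which matches the two worked results above.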

In an alternative embodiment, the method for obtaining the first weight value includes steps E1 to E2.

Step E1: take the plurality of first sample data matrices as independent variables and the actual marketing success probabilities respectively corresponding to them as dependent variables, and obtain a first regression coefficient set characterizing the degree of influence of the independent variables on the dependent variable, where the first regression coefficient set includes at least one first regression coefficient.

Step E2: obtain the first weight value based on the first regression coefficient set.

For the first sample data matrices, since the number of successful marketing attempts contained in each first sample data matrix is zero, the actual marketing success probability corresponding to each first sample data matrix is 0.

The data type of each item of data in each first sample data matrix is taken as an independent variable, and the corresponding actual marketing success probability as the dependent variable. Assuming that the relationship between the dependent variable and the independent variables is multiple linear, the dependent variable can be expressed linearly by the independent variables, i.e. $Y_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_N x_N$, where $Y_\theta(x)$ is the dependent variable, $x_i$ is the i-th independent variable, and $\theta_i$ is the first regression coefficient corresponding to the i-th independent variable, whose value represents the degree of influence of the i-th independent variable on the dependent variable.

In an alternative implementation, a regression model is used to obtain the first set of regression coefficients, as described below in a specific example.

Assume that the first sample data matrix of a sample user includes any two of the following: the deposit amount of the sample user, the number of credit cards held by the sample user, first data representing the sample user's repayment capability for credit cards, second data representing the sample user's transaction flow, the total loan number of the sample user, third data representing the loan type of the sample user, the number of marketing attempts for the sample user, and the number of successful marketing attempts for the sample user. The data types of these two items are taken as two independent variables, and the actual marketing success probability P corresponding to each first sample data matrix is taken as the dependent variable. Each first sample data matrix and its actual marketing success probability P are input into the regression model, which computes the first regression coefficient set θ.

Assume that the linear relationship between the two independent variables and the dependent variable is expressed as $Y_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 = \theta^T x$, where $\theta = [\theta_0, \theta_1, \theta_2]^T$ and $x = [1, x_1, x_2]^T$. A loss function $J(\theta)$ is introduced to describe the degree of deviation of the dependent variable, where $Y_\theta(x^{(i)})$ is the estimated marketing success probability corresponding to the i-th first sample data matrix and $P^{(i)}$ is the actual marketing success probability corresponding to the i-th first sample data matrix. The first regression coefficient set θ is adjusted to minimize $J(\theta)$, and the θ that minimizes $J(\theta)$ is the first regression coefficient set.

For example, the first regression coefficient set may be obtained by any one of the least squares method, gradient descent, Newton's method, locally weighted linear regression, ridge regression, or LASSO regression.
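As one illustration of the least squares option, the sketch below fits θ for two independent variables by solving the normal equations. It assumes a sum-of-squared-errors loss, which the source does not spell out, and all function names and data values are hypothetical. The targets are generated from known coefficients so the fit can be checked.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_least_squares(features: Sequence[Sequence[float]],
                      targets: Sequence[float]) -> List[float]:
    """Return [theta_0, theta_1, ...] minimizing the sum of squared errors."""
    rows = [[1.0, *f] for f in features]  # leading 1 carries the intercept theta_0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical samples (x1, x2) and probabilities from known coefficients.
features = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [0.1 + 0.05 * x1 + 0.02 * x2 for x1, x2 in features]

theta = fit_least_squares(features, targets)
print([round(t, 4) for t in theta])  # approximately [0.1, 0.05, 0.02]
```

The other listed methods, such as gradient descent or ridge regression, would minimize the same or a regularized loss and could be substituted for `fit_least_squares`.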

After the first regression coefficient set is obtained, it can be standardized so that differences in dimension (units) can be ignored when the influence of different data types is compared.

The first regression coefficient set is standardized as follows. The mean and standard deviation of each data type across the plurality of first sample data matrices are computed, and new data $x_i' = (x_i - \bar{x}_i)/S_i$ is obtained and taken as a new independent variable. New data is obtained in the same way for each of the data types and substituted into the regression model; the first regression coefficients obtained at this point are the standardized first regression coefficients. Here $\bar{x}_i$ is the mean of all data of the i-th data type in the plurality of first sample data matrices, and $S_i$ is the standard deviation of the data of the i-th data type in the plurality of first sample data matrices. Alternatively, the unstandardized first regression coefficients can be substituted into a conversion formula to obtain the standardized first regression coefficient set directly.
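Standardizing one data type across the sample matrices might look like the sketch below. It assumes the usual z-score form (subtract the mean, divide by the standard deviation) and uses the sample standard deviation; the source does not say whether the sample or population form is intended. The function name `standardize` is hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - m) / s for v in values]


# Deposit amounts taken from the three sample vectors in the worked examples.
deposits = [5000, 2000, 1000]
z = standardize(deposits)
print([round(v, 3) for v in z])
```

Refitting the regression on standardized columns puts every data type on a common scale, so the resulting coefficients can be compared directly.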

Illustratively, the first weight value is obtained from the first regression coefficient set by averaging all the standardized first regression coefficients contained in the standardized first regression coefficient set and taking the average as the first weight value.
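The averaging step can be sketched in a few lines. The coefficient values are hypothetical, and whether the intercept term is included in the average is not stated in the source; this sketch simply averages whatever coefficients it is given.

```python
from statistics import mean


def weight_from_coefficients(standardized_coefficients):
    """Average the standardized regression coefficients to get a weight value."""
    return mean(standardized_coefficients)


# Hypothetical standardized first regression coefficients.
first_weight = weight_from_coefficients([0.4, 0.2, 0.3])
print(first_weight)
```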

Correspondingly, the process of acquiring the second weight value comprises steps F1 to F2.

Step F1: take the plurality of second sample data matrices as independent variables and the actual marketing success probabilities respectively corresponding to them as dependent variables, and obtain a second regression coefficient set characterizing the degree of influence of the independent variables on the dependent variable, where the second regression coefficient set includes at least one second regression coefficient.

Step F2: obtain the second weight value based on the second regression coefficient set.

The process of obtaining the second weight value is the same as the process of obtaining the first weight value, and specifically, refer to the process of obtaining the first weight value, which is not described herein.

In an optional embodiment, in an actual application, after marketing is performed on a user to be tested, an actual result representing a probability of success of marketing for the user to be tested may be obtained, and based on the actual result, the first weight value and the second weight value may be adjusted, where the specific process includes steps G1 to G3.

Step G1: obtain an actual result representing the probability of marketing success for the user to be tested.

Step G2: if the actual result indicates that the number of successful marketing attempts for the user is zero and the number of marketing attempts for the user is less than or equal to the first preset value, determine that the data matrix to be tested is a first sample data matrix, and adjust the first weight value based on the data matrix to be tested and the actual result.

Step G3: if the actual result indicates that the number of successful marketing attempts for the user is not zero and the number of marketing attempts for the user is less than or equal to the first preset value, determine that the data matrix to be tested is a second sample data matrix, and adjust the second weight value based on the data matrix to be tested and the actual result.

Taking the adjustment of the first weight value as an example, a specific adjustment manner is described, and the adjustment of the second weight value is similar and will not be described herein.

After the actual result representing the marketing success probability for the user to be tested is obtained, the actual number of successful marketing attempts and the actual number of marketing attempts for the user to be tested can be obtained.

If the actual number of successful marketing attempts for the user to be tested is 0 and the actual number of marketing attempts for the user to be tested is less than or equal to the first preset value, the data matrix to be tested is used as a first sample data matrix, so that the first dictionary matrix gains one more first sample data matrix for learning.

Because the first dictionary matrix of the constructed first sparse matrix model now contains an additional first sample data matrix, the first sparse representation coefficient output by the first sparse matrix model after the dictionary learning algorithm can better approximate the data matrix to be tested of the next user to be tested. At the same time, the mean and standard deviation of each data type change accordingly; an updated first regression coefficient set can be obtained from the changed mean and standard deviation, and an updated first weight value can be obtained from the updated first regression coefficient set.
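The routing in steps G1 to G3 can be sketched as follows. The function name `route_feedback`, the list-of-vectors representation of each dictionary, and the index positions 6 and 7 are assumptions for illustration; recomputing the regression coefficients and weight value after the update is left out.

```python
def route_feedback(sample, first_dictionary, second_dictionary,
                   first_preset_value, marketed_idx=6, success_idx=7):
    """Append the observed sample to the matching dictionary and report which one."""
    if sample[marketed_idx] > first_preset_value:
        return None  # marketed too often: belongs to neither dictionary
    if sample[success_idx] == 0:
        first_dictionary.append(list(sample))   # step G2: no successes
        return "first"
    second_dictionary.append(list(sample))      # step G3: at least one success
    return "second"


first_dict, second_dict = [], []
# Hypothetical observed data matrices after marketing, with a first preset value of 5.
print(route_feedback([5000, 5, 1, 4000, 8000, 2, 3, 0], first_dict, second_dict, 5))
print(route_feedback([2000, 2, 0, 5000, 2000, 1, 2, 1], first_dict, second_dict, 5))
print(route_feedback([800, 0, 0, 0, 500, 0, 9, 0], first_dict, second_dict, 5))
```

After a sample is appended, the weight value of the affected model would be recomputed from the enlarged sample set, as the paragraph above describes.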

The method has been described in detail in the embodiments above. It can be implemented by various types of apparatus, so the present application also discloses an apparatus, specific embodiments of which are described in detail below.

Fig. 3 is a block diagram of an implementation of a data processing apparatus provided by an embodiment of the present application. The apparatus includes:

a first obtaining module 31, configured to obtain a data matrix to be tested corresponding to a user to be tested, where the data matrix to be tested includes at least one of: the deposit amount of the user to be tested, the number of credit cards held by the user to be tested, first data representing the repayment capability of the user to be tested for credit cards, second data representing the transaction flow of the user to be tested, the total loan number of the user to be tested, third data representing the loan type of the user to be tested, the number of marketing attempts for the user to be tested, and the number of successful marketing attempts for the user to be tested;

a first input module 32, configured to input the data matrix to be tested into a first sparse matrix model to obtain a first prediction result output by the first sparse matrix model; the first sparse matrix model forms a first dictionary matrix from a plurality of first sample data matrices, obtains a first sparse representation coefficient corresponding to the data matrix to be tested through a dictionary learning algorithm, and calculates the first prediction result based on the first dictionary matrix and the first sparse representation coefficient; the number of successful marketing attempts contained in each first sample data matrix is zero, and the number of marketing attempts contained in each first sample data matrix is less than or equal to a first preset value;

a second input module 33, configured to input the data matrix to be tested into a second sparse matrix model to obtain a second prediction result output by the second sparse matrix model; the second sparse matrix model forms a second dictionary matrix from a plurality of second sample data matrices, obtains a second sparse representation coefficient corresponding to the data matrix to be tested through a dictionary learning algorithm, and calculates the second prediction result based on the second dictionary matrix and the second sparse representation coefficient; the number of successful marketing attempts contained in each second sample data matrix is greater than zero, and the number of marketing attempts contained in each second sample data matrix is less than or equal to the first preset value;

a second obtaining module 34, configured to obtain a final result based on the first weight value corresponding to the first sparse matrix model, the second weight value corresponding to the second sparse matrix model, the first prediction result, and the second prediction result.

Optionally, the apparatus further includes:

a request module, configured to request establishment of a communication connection with the user to be tested if the final result is greater than or equal to a second preset value.

Optionally, the apparatus further includes:

a third obtaining module, configured to take the plurality of first sample data matrices as independent variables and the actual marketing success probabilities respectively corresponding to them as dependent variables, and obtain a first regression coefficient set characterizing the degree of influence of the independent variables on the dependent variable, where the first regression coefficient set includes at least one first regression coefficient;

a fourth obtaining module, configured to obtain the first weight value based on the first regression coefficient set.

Optionally, the apparatus further includes:

a fifth obtaining module, configured to obtain a second regression coefficient set that characterizes the influence degree of the independent variable on the dependent variable by using the plurality of second sample data matrices as independent variables and using actual marketing success probabilities corresponding to the plurality of second sample data matrices respectively as the dependent variable, where the second regression coefficient set includes at least one second regression coefficient;

and a sixth obtaining module, configured to obtain the second weight value based on the second regression coefficient set.

Optionally, the apparatus further includes:

a seventh obtaining module, configured to obtain an actual result that characterizes the probability of marketing success for the user to be tested;

a first adjustment module, configured to, if the actual result indicates that the number of successful marketing attempts for the user is zero and the number of marketing attempts for the user is less than or equal to a first preset value, determine that the data matrix to be tested is a first sample data matrix, and adjust the first weight value based on the data matrix to be tested and the actual result;

a second adjustment module, configured to, if the actual result indicates that the number of successful marketing attempts for the user is not zero and the number of marketing attempts for the user is less than or equal to a first preset value, determine that the data matrix to be tested is a second sample data matrix, and adjust the second weight value based on the data matrix to be tested and the actual result.

Fig. 4 is a block diagram of an implementation of an electronic device provided by an embodiment of the present application. The electronic device includes:

a memory 41 for storing a program;

a processor 42 for executing the program, the program being specifically for:

The processor 42 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC).

The first server may further comprise a communication interface 43 and a communication bus 44, wherein the memory 41, the processor 42 and the communication interface 43 perform communication with each other via the communication bus 44.

The embodiment of the present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps involved in the embodiment of the data processing method as described in any of the above.

The features described in the respective embodiments in this specification may be replaced with or combined with each other. The apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of data processing, comprising:

obtaining a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result;

the method for acquiring the first weight value comprises the following steps:

taking the plurality of first sample data matrixes as independent variables, taking actual marketing success probabilities respectively corresponding to the plurality of first sample data matrixes as dependent variables, and obtaining a first regression coefficient set for representing the influence degree of the independent variables on the dependent variables, wherein the first regression coefficient set comprises at least one first regression coefficient;

obtaining the first weight value based on the first regression coefficient set;

the method for acquiring the second weight value comprises the following steps:

taking the plurality of second sample data matrixes as independent variables, taking actual marketing success probabilities respectively corresponding to the plurality of second sample data matrixes as dependent variables, and obtaining a second regression coefficient set for representing the influence degree of the independent variables on the dependent variables, wherein the second regression coefficient set comprises at least one second regression coefficient;

and obtaining the second weight value based on the second regression coefficient set.

2. The data processing method according to claim 1, further comprising:

if the final result is greater than or equal to a second preset value, requesting establishment of a communication connection with the user to be tested.

3. The data processing method according to claim 1, further comprising:

acquiring an actual result representing the probability of marketing success for the user to be tested;

if the actual result indicates that the number of successful marketing attempts for the user is zero and the number of marketing attempts for the user is less than or equal to a first preset value, determining that the data matrix to be tested is a first sample data matrix, and adjusting the first weight value based on the data matrix to be tested and the actual result;

and if the actual result indicates that the number of successful marketing attempts for the user is not zero and the number of marketing attempts for the user is less than or equal to a first preset value, determining that the data matrix to be tested is a second sample data matrix, and adjusting the second weight value based on the data matrix to be tested and the actual result.

4. A data processing apparatus comprising:

The second obtaining module is used for obtaining a final result based on a first weight value corresponding to the first sparse matrix model, a second weight value corresponding to the second sparse matrix model, the first prediction result and the second prediction result;

5. The data processing apparatus of claim 4, further comprising:

6. An electronic device, comprising:

a memory for storing a program;

the method for acquiring the first weight value comprises the following steps:

Obtaining the first weight value based on the first regression coefficient set;

the method for acquiring the second weight value comprises the following steps:

7. A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data processing method according to any of claims 1 to 3.