CN114862506B

CN114862506B - Financial product recommendation method based on deep reinforcement learning

Info

Publication number: CN114862506B
Application number: CN202210434003.0A
Authority: CN
Inventors: 王瑜; 石宏飞; 谢晨; 李海英; 梁钥; 刘敏慧; 王文琳
Original assignee: Shenwan Hongyuan Securities Co ltd
Current assignee: Shenwan Hongyuan Securities Co ltd
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2024-06-14
Anticipated expiration: 2042-04-24
Also published as: CN114862506A

Abstract

The invention discloses a financial product recommendation method based on deep reinforcement learning, comprising the following steps: step 1, establishing a client interest preference model to obtain click preference scores, risk preference scores and asset preference scores; step 2, mining historical data to form an ideal product investment distribution, and establishing an asset balance model according to the customer group of the current customer and the customer's holdings to obtain a difference score between the current product distribution and the ideal product distribution; step 3, exploring and modeling the potential interests of clients; step 4, using a deep reinforcement learning method to adaptively learn the fusion parameters for the score factors obtained in the previous steps. After the scheme is implemented, the average click rate, purchase conversion rate and transaction amount of users are greatly improved, and a personalized recommendation service is provided on the financial mall page of the mobile APP; through a continuously deepening understanding of clients, client service satisfaction is improved, and strong technical support is provided for the company's financial management transformation.

Description

Financial product recommendation method based on deep reinforcement learning

Technical Field

The invention belongs to the technical field of artificial intelligence application, and particularly relates to a financial product recommendation method based on deep reinforcement learning.

Background

In recent years, China's financial management market has developed rapidly. By the end of 2020, personal financial assets in China reached 205 trillion and the Internet financial management market reached 8.2 trillion; meanwhile, the main financial management client group keeps getting younger, with the main Internet financial management clients aged 21 to 35. Precision marketing has become a very important growth point for securities companies: for example, the securities industry's net income from agency sales of financial products has increased year by year, reaching 134.38 billion in 2020, a year-on-year increase of 148.76%. The products mainly marketed by the financial management business include asset management products, public funds, private funds, trusts and other categories. Precision marketing can bring securities companies three capabilities. The first is traffic capability: securities companies are actively building technology platforms around the new financial management trend and improving service levels. The second is advisory and companionship capability: securities companies are moving their business fully online, greatly improving client stickiness by providing financial services and streaming content such as news and video, and providing clients with advisory and companionship services. The third is product and investment capability, which is also the core competitiveness of securities companies: the financial products of securities companies have formed a new system based on the distribution of public and private fund products and on brokerage asset management, with broad asset-class allocation, dedicated fund accounts and the like as business forms.

Precision marketing means starting from client needs and providing clients with professional, intelligent and personalized services. On the one hand, based on the rich client portrait data of securities companies, client preferences are effectively identified, and news, video and live-broadcast content is tailored to each individual client; the interactive feedback between clients and content in turn deepens the understanding of clients. On the other hand, securities companies have a rich pool of multi-category products, including public funds, OTC products, trusts and private funds. Considering the differences in financial interests, risk preferences and asset balance needs among clients, the problem for precision marketing is how to help clients maximize long-term benefit while comprehensively considering multidimensional factors, and whether the personalized financial plan that best meets a client's needs can be recommended by intelligent means. A financial product recommendation method based on deep reinforcement learning therefore needs to be developed to solve these problems.

Disclosure of Invention

The invention aims to provide a financial product recommendation method based on deep reinforcement learning, which aims to solve the problem that personalized financial schemes which best meet the requirements of customers cannot be intelligently recommended.

In order to achieve the above purpose, the present invention provides the following technical solutions: a financial product recommendation method based on deep reinforcement learning comprises the following steps:

Step 1, establishing a client interest preference model to obtain click preference scores, risk preference scores and asset preference scores;

Step 2, mining historical data to form an ideal product investment distribution, and establishing an asset balance model according to the customer group of the current customer and the customer's holdings to obtain a difference score between the current product distribution and the ideal product distribution;

Step 3, establishing a client potential interest exploration model, and using new product exploration modeling to mine the client's unknown interests so as to obtain product exploration scores;

Step 4, using a deep reinforcement learning method to adaptively learn the fusion parameters for the score factors obtained in the preceding steps, and then ranking and recommending.

Preferably, the client interest preference modeling step includes:

A step of building a tree model: for active clients who have many product click and purchase behaviors and many news click and reading behaviors, the click preferences, purchase preferences and risk preferences of the active clients are learned through GBDT modeling, with the clients' click and purchase probabilities as the learning targets.
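The tree-model step can be illustrated with a very small gradient-boosted model. The sketch below is not the model of the invention: it assumes depth-1 trees, a log-loss objective, two hypothetical features (`product_clicks`, `news_reads`) and a toy data set, purely to show how boosting turns behavior counts into a purchase probability. A production system would more likely use an established GBDT library.

```python
import math

LEARNING_RATE = 0.5  # assumed value, not taken from the source

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares depth-1 tree: pick the (feature, threshold) split that best fits the residuals."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback when no split is possible
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, threshold, lmean, rmean)
    return best[1:]  # (feature index, threshold, left value, right value)

def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value

def fit_gbdt(X, y, n_rounds=20):
    """Boost stumps on the negative gradient of the log-loss (y - predicted probability)."""
    raw = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(ri) for yi, ri in zip(y, raw)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        raw = [ri + LEARNING_RATE * stump_predict(stump, row) for ri, row in zip(raw, X)]
    return stumps

def predict_proba(stumps, row):
    return sigmoid(sum(LEARNING_RATE * stump_predict(s, row) for s in stumps))

# Hypothetical active clients: [product_clicks, news_reads]; label 1 means the client purchased.
X = [[1, 0], [2, 1], [3, 0], [8, 5], [9, 7], [10, 6]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(X, y)
print(predict_proba(model, [9, 6]), predict_proba(model, [1, 0]))
```

The predicted probability plays the role of a click or purchase preference score; separate models of the same shape could be fitted for each target.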

Preferably, the client interest preference modeling step further includes a distillation learning step: the preferences of new clients are learned from those of active clients by finding the active client most similar to the current new client and expressing the new client's preference by the preference of that similar active client; a Teacher Model is set up for active clients and trained on the similarity between active clients, and its calculated result is the similarity among all financial active clients; this result is input into a Student Model through distillation extraction, the Student Model at this point not using client purchasing behavior data as features; through distillation extraction, the old client most similar to the current new client is found.

Preferably, the step of modeling the customer asset balance by using the difference value between the current product distribution and the ideal product distribution includes: obtaining the difference value between the current product distribution and the ideal product distribution through a distribution difference formula; the distribution difference formula:

wherein p(c|g) represents the target distribution of a certain product type c in the customer group g to which the user u belongs; q(c|u) represents the holding and recommendation distribution of the user u over a certain product type c;

C_KL represents the difference value between the current product distribution and the ideal product distribution; u represents a user; g represents a customer group; c represents a product type;

Σ represents summation over the users in all groups;

p represents the target distribution and q represents the current distribution.
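Read as a Kullback-Leibler divergence, which the subscript KL suggests, the difference value for one user u in customer group g could take the form below. This is an assumed reading built only from the symbol definitions above, not necessarily the exact formula of the invention; summing the quantity over the users of all groups would give an aggregate value.

```latex
C_{KL}(p, q) = \sum_{c} p(c \mid g)\,\log\frac{p(c \mid g)}{q(c \mid u)}
```

Under this reading the value is zero when the user's current distribution matches the group's target distribution and grows as the two diverge.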

Preferably, obtaining the difference value between the current product distribution and the ideal product distribution further includes obtaining an optimal subset, with the following steps:

M items are selected from the original set Z = {1, …, N} to form a subset:

wherein Z represents the original product set, and N represents the number of products; M represents the number of products selected from the N products to form the optimal subset; an item is a product;

the optimal subset Y = argmax(det(L_Y));

wherein Y represents the optimal subset, and L_Y represents the determinant score corresponding to the optimal subset Y;

L is the constructed client relevance matrix, wherein q_i represents the relevance score of the i-th candidate content to the client, and D_ij represents the distance between candidate contents i and j;

the formula for adding candidate content to the set is Y ∪ {i}, that is, candidate content i is added to the optimal subset Y.
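The subset selection above resembles greedy maximization of a determinant, as used in determinantal point processes. The following sketch is one possible reading, not the method as claimed: the entries of L are not given in this text, so the kernel `L[i][j] = q[i] * q[j] * exp(-D[i][j])` is an assumption, as are the sample scores and distances. It shows how adding one candidate at a time (Y ∪ {i}) favors products that are both relevant and far from those already chosen.

```python
import math

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result

def build_kernel(q, D):
    """Assumed kernel: relevance of both items, damped by their distance."""
    n = len(q)
    return [[q[i] * q[j] * math.exp(-D[i][j]) for j in range(n)] for i in range(n)]

def submatrix(L, idx):
    return [[L[i][j] for j in idx] for i in idx]

def greedy_subset(L, M):
    """Grow Y one item at a time, each time adding the item that maximizes det(L_Y)."""
    Y = []
    for _ in range(M):
        candidates = [i for i in range(len(L)) if i not in Y]
        best = max(candidates, key=lambda i: det(submatrix(L, Y + [i])))
        Y.append(best)  # Y becomes Y ∪ {i}
    return Y

# Hypothetical data: items 0 and 1 are near-duplicates with high relevance.
q = [0.9, 0.85, 0.5, 0.6]
D = [[0.0, 0.05, 2.0, 2.0],
     [0.05, 0.0, 2.0, 2.0],
     [2.0, 2.0, 0.0, 1.5],
     [2.0, 2.0, 1.5, 0.0]]
L = build_kernel(q, D)
chosen = greedy_subset(L, 2)
print(chosen)
```

In this toy case item 1 is skipped even though it has the second-highest relevance, because it is almost identical to item 0; the greedy search is a common approximation and does not guarantee the exact argmax.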

Preferably, in step 3, the types of the new product exploration model include: type exploration, racetrack exploration and new development exploration;

The new product exploration model works as follows:

finding a pool of products with excellent performance; finding the seed customer portrait of each product through customers' click and purchase behaviors on the product; for the current customer, finding the product whose seed customer portrait is closest to the current customer portrait and performing interest exploration with it, that is, recommending to the customer, as an exploration product, an excellent product preferred by people similar to the customer, so as to explore the customer's potential interests;

after excellent products of potential interest to the client have been found, if the client's returns are negative, exploration is stopped;

if the client's financial returns reach the set value, the exploration intensity is increased.
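The seed-portrait matching described above can be sketched as follows. The details are assumptions made for illustration: a portrait is a short numeric vector, the seed portrait of a product is the mean portrait of its buyers, and closeness is cosine similarity; the customer and product names are hypothetical.

```python
import math

def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def seed_portraits(buyers_by_product, portraits):
    """Seed portrait of a product = average portrait of the customers who clicked or bought it."""
    return {product: mean_vector([portraits[c] for c in buyers])
            for product, buyers in buyers_by_product.items()}

def exploration_product(customer_portrait, seeds, held=()):
    """Pick the product, not already held, whose seed portrait is closest to the customer."""
    scores = {p: cosine(customer_portrait, s) for p, s in seeds.items() if p not in held}
    return max(scores, key=scores.get)

# Hypothetical portraits: [risk_tolerance, age_normalized, asset_level]
portraits = {
    "c1": [0.9, 0.2, 0.5], "c2": [0.8, 0.3, 0.6],
    "c3": [0.1, 0.8, 0.7], "c4": [0.2, 0.9, 0.6],
}
buyers_by_product = {"tech_fund": ["c1", "c2"], "bond_fund": ["c3", "c4"]}
seeds = seed_portraits(buyers_by_product, portraits)
pick = exploration_product([0.7, 0.3, 0.4], seeds)
print(pick)
```

Here the customer resembles the buyers of `tech_fund`, so that product becomes the exploration candidate.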

Preferably, the method further comprises step 5: performing exponential fusion by the formula a^t1 + b^t2 + …, optimizing the parameters t1 and t2, and continuously and adaptively learning with the aim of maximizing the long-term benefit, to obtain a long-term investment benefit maximization model; wherein a represents the customer preference modeling factor value, b represents the customer asset balance modeling factor value, and t1 and t2 are fusion parameters.
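A small sketch of exponential fusion follows, assuming each factor value lies in (0, 1] and that the fused score is the sum of each factor raised to its fusion parameter. The product names, factor values and parameter settings are hypothetical; the point is only that changing t1 and t2 changes the ranking.

```python
def fused_score(factors, params):
    """Exponential fusion: a**t1 + b**t2 + ... for one product."""
    return sum(f ** t for f, t in zip(factors, params))

def rank(products, params):
    """Sort product names by fused score, highest first."""
    return sorted(products, key=lambda name: fused_score(products[name], params), reverse=True)

# Hypothetical factor values per product: (preference score a, asset balance score b)
products = {
    "fund_A": (0.9, 0.2),
    "fund_B": (0.5, 0.8),
    "fund_C": (0.6, 0.6),
}
neutral = rank(products, (1.0, 1.0))          # both factors weighted plainly
preference_led = rank(products, (3.0, 0.2))   # sharpen preference, flatten balance
print(neutral, preference_led)
```

With equal parameters the balanced product ranks first; with a large t1 and a small t2 the product with the strongest preference score moves to the top, which is the kind of trade-off the learned parameters control.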

Preferably, the long-term benefit is obtained by learning a target formula Q(S, A), where S represents the scene attributes and current asset status of the current customer, A represents a combination of discrete values of multiple fusion factor parameters, and R represents the feedback, namely the short-term benefit after the customer purchases a product, or whether the customer purchases the product.

Preferably, the data of the product comprises risk level, yield and holding amount, and a product portrait is obtained by analyzing the product data, wherein the product portrait contains at least 500 labels covering investment varieties and income analysis.

Preferably, the data of the client interest preference model, the asset balance model and the client potential interest exploration model are all used under the premise of client consent, and unauthorized client data are not collected and used.

The technical effects and advantages of the invention are as follows: reinforcement learning takes a long-term view and focuses on the long-term return of decisions, whereas supervised learning generally considers one-off problems, focuses on short-term benefit and considers the immediate return; the reinforcement learning approach therefore matches well a rational investor's focus on the long-term return of investment targets. Reinforcement learning solves decision optimization problems over sequences of actions, training continuously on data obtained from the environment so as to respond accurately to it, and the continuous adjustment of a client's investment portfolio strategy is exactly such a sequential decision problem. Model-free deep reinforcement learning does not need to model the environment or require large amounts of labeled data, and learns by itself through trial and error over actions;

customer service satisfaction is improved. An intelligent recommendation system is built on a big data platform, with customer portraits, product portraits and behavior databases; using an artificial intelligence platform, customer interest preference modeling, customer asset balance modeling and customer potential interest exploration modeling are carried out across the recall, ranking and fusion layers of the system. Based on the modeling results and on customers' historical purchasing behavior and income data accumulated over the past 5 years, reinforcement learning samples are constructed with the maximization of 6-month income as the final modeling target, followed by continuous application, exploration, feedback and reinforcement. The results are shown to clients through the APP front-end display system: compared with the previous manual pushing, the product click rate is improved by more than 20%, the purchase conversion rate is improved by about 3 times, and the client conversion rate is improved by more than 30%;

After the scheme is implemented, the average click rate, purchase conversion rate and transaction amount of users are greatly improved. Besides the financial mall of the mobile APP, the C-end service can further extend the recommendation results to modules such as the PC financial mall, the APP home page and APP news recommendation; the B-end service makes retention product recommendations for clients flagged by churn early warning, and corresponding product recommendations for clients with abnormal asset changes. Through the application of intelligent recommendation technology, stock investment clients are converted into financial management clients, the new technology attracts more new clients from younger groups to open accounts, and at the same time, through a continuously deepening understanding of clients, client service satisfaction is improved.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of the tree model of the present invention;

FIG. 3 is a schematic diagram of the distillation learning method according to the present invention;

FIG. 4 is a schematic diagram of a client preference model according to the present invention;

FIG. 5 is a schematic view of a structural framework of the present invention;

FIG. 6 is a schematic diagram of a new product exploration model structure of the present invention;

FIG. 7 is a schematic diagram of an intelligent recommendation system architecture according to the present invention;

FIG. 8 is a schematic diagram of the application of the smart product of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention provides a financial product recommendation method based on deep reinforcement learning as shown in fig. 1 and 5, which comprises the following steps:

Step 1, establishing a client interest preference model to obtain click preference scores, risk preference scores and asset preference scores; the client interest preference modeling step, as shown in FIG. 2, FIG. 3 and FIG. 4, includes:

A step of building a tree model: for active clients who have many product click and purchase behaviors and many news click and reading behaviors, the click preferences, purchase preferences and risk preferences of the active clients are learned through GBDT modeling, with the clients' click and purchase probabilities as the learning targets;

A distillation learning step: the preferences of new clients are learned from those of active clients by finding the active client most similar to the current new client and expressing the new client's preference by the preference of that similar active client; a Teacher Model is set up for active clients and trained on the similarity between active clients, and its calculated result is the similarity among all financial active clients; this result is input into a Student Model through distillation extraction, the Student Model at this point not using client purchasing behavior data as features; through distillation extraction, the old client most similar to the current new client is found;

In this embodiment, different customers prefer different products, and traditional marketing along the product dimension cannot meet customers' personalized needs, so how to accurately identify customer interests becomes the key problem. The approach is to distinguish old customers from new customers. First, for active customers who have many product click and purchase behaviors and many news click and reading behaviors, the data are sufficient to support GBDT modeling, which learns the active customers' click preferences, purchase preferences and risk preferences. For new customers, namely non-financial customers, behavior and purchase information is very scarce, so a distillation learning method is needed. In the Teacher Model in the middle of the left side in FIG. 6, the similarity between active customers is trained; the data used are all bottom-level data features, so training can achieve higher learning precision, and the calculated result is the similarity among all old financial customers. This result is then input, through distillation extraction, into the Student Model in the middle, where the left side is the new customer and the right side is the active customer; the Student Model does not use data such as customer purchase behavior as features. Distillation extraction finds, for the current new customer, the most similar old customer, and the new customer's preference is indirectly represented by that old customer, which improves recommendation precision for new customers;
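The teacher and student relationship can be illustrated with a deliberately simplified sketch. Everything specific here is an assumption: the teacher is a fixed cosine similarity over the full features (basic profile plus purchase behavior) rather than a trained model, the student is a one-variable linear fit on the distance between basic profiles only, and the customers are hypothetical. The student imitates the teacher's pairwise similarities and is then usable for a new client who has no purchase history.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def teacher_similarity(a, b):
    """Teacher: similarity on all features, including purchase behaviour."""
    return cosine(a["basic"] + a["behaviour"], b["basic"] + b["behaviour"])

def fit_student(active):
    """Student: fit similarity = w0 + w1 * distance(basic features) to the teacher's outputs."""
    xs, ys = [], []
    for i, j in combinations(active, 2):
        xs.append(math.dist(active[i]["basic"], active[j]["basic"]))
        ys.append(teacher_similarity(active[i], active[j]))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - w1 * mx, w1

def most_similar_active(new_basic, active, student):
    """Score every active client with the student and return the best match."""
    w0, w1 = student
    return max(active, key=lambda name: w0 + w1 * math.dist(new_basic, active[name]["basic"]))

# Hypothetical active clients: basic = [age_normalized, asset_level],
# behaviour = [share of equity purchases, share of bond purchases]
active = {
    "a1": {"basic": [0.20, 0.30], "behaviour": [0.9, 0.1]},
    "a2": {"basic": [0.25, 0.35], "behaviour": [0.8, 0.2]},
    "a3": {"basic": [0.80, 0.70], "behaviour": [0.1, 0.9]},
    "a4": {"basic": [0.75, 0.80], "behaviour": [0.2, 0.8]},
}
student = fit_student(active)
match = most_similar_active([0.22, 0.30], active, student)
print(student, match)
```

The new client's preference would then be represented by the preferences already learned for the matched active client.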

Step 2, mining historical data to form an ideal product investment distribution, and establishing an asset balance model according to the customer group of the current customer and the customer's holdings to obtain a difference score between the current product distribution and the ideal product distribution; the step of modeling the customer asset balance by using the difference value between the current product distribution and the ideal product distribution comprises: obtaining the difference value between the current product distribution and the ideal product distribution through a distribution difference formula; the distribution difference formula:

Σ represents summation over the users in all groups;

p represents the target distribution and q represents the current distribution;

obtaining the difference value between the current product distribution and the ideal product distribution further comprises obtaining an optimal subset, with the following steps:

M items are selected from the original set Z = {1, …, N} to form a subset:

the optimal subset Y = argmax(det(L_Y));

L is the constructed client relevance matrix;

the formula for adding candidate content to the set is Y ∪ {i}, that is, candidate content i is added to the optimal subset Y;

In this embodiment, the data of the product comprises risk level, yield and holding amount, and a product portrait is obtained by analyzing the product data, wherein the product portrait contains at least 500 labels covering investment varieties and income analysis;

In this embodiment, from the perspective of customer asset balance modeling and according to asset portfolio balance theory, investors allocate their wealth among the available assets according to their own investment preferences and the principles of risk and return, forming an optimal asset portfolio. By mining historical data, it is determined which product combinations perform well within a customer group, forming the ideal product investment distribution. Suitable products are then recommended based on the customer's group and holdings, so that the customer's product mix after purchase is closer to the ideal distribution. In addition, when recommending products, on the premise of satisfying the customer's interest preferences, more products that differ substantially from the customer's existing holdings are presented to promote asset balance;
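The idea of recommending products that move a customer's holdings toward the ideal distribution can be sketched as follows. The sketch assumes the difference value is a standard KL divergence between the group's target distribution and the customer's holding distribution, and it scores each candidate by how much a hypothetical purchase would reduce that divergence; product types, amounts and the target mix are invented for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """Assumed C_KL: sum over product types of p(c) * log(p(c) / q(c)), with q floored at eps."""
    return sum(pc * math.log(pc / max(q.get(c, 0.0), eps)) for c, pc in p.items() if pc > 0)

def holding_distribution(holdings):
    """Turn holding amounts per product type into a distribution q(c|u)."""
    total = sum(holdings.values())
    if total == 0:
        return {}
    return {c: amount / total for c, amount in holdings.items()}

def balance_scores(target, holdings, candidates, amount):
    """Score = reduction in divergence if the customer bought `amount` of the candidate type."""
    before = kl_divergence(target, holding_distribution(holdings))
    scores = {}
    for c in candidates:
        after = dict(holdings)
        after[c] = after.get(c, 0.0) + amount
        scores[c] = before - kl_divergence(target, holding_distribution(after))
    return scores

# Hypothetical target mix p(c|g) for the customer's group and the customer's current holdings.
target = {"equity_fund": 0.4, "bond_fund": 0.4, "money_fund": 0.2}
holdings = {"equity_fund": 80_000.0, "bond_fund": 10_000.0, "money_fund": 10_000.0}
scores = balance_scores(target, holdings, list(target), amount=20_000.0)
best = max(scores, key=scores.get)
print(best, scores)
```

For this equity-heavy customer, buying more equity makes the score negative, while the bond fund gives the largest improvement, matching the intent of presenting products that differ from current holdings.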

Step 3, establishing a client potential interest exploration model, and using new product exploration modeling to mine the client's unknown interests; the types of the new product exploration model include: type exploration, racetrack exploration and new development exploration; as shown in FIG. 6,

if the client's financial returns reach the set value, the exploration intensity is increased;

In this embodiment, the client interest preference and asset balance modeling above are based on the client's existing data; a recommendation system also needs an exploration function that explores the client's unknown interests, so as to prevent an information cocoon. New product exploration modeling in the financial scenario follows the same idea as exploration modeling on the Internet: excellent financial products are shown to the client heuristically. Excellent financial products are used because, even if the client is not interested, the client will still regard them as good products and will not feel strong aversion. First, a pool of products with excellent performance is found, and the seed customer portrait of each product is found with a Look-alike algorithm from customers' click and purchase behaviors on the product. For a given customer, the product whose seed customer portrait is closest to the current customer portrait is found and used for interest exploration; that is, an excellent product preferred by people similar to the customer is recommended as that customer's exploration product, so as to explore the customer's potential interests. Even after excellent products of potential interest to the customer have been found, exploration is not carried out at all times: when a customer faces a large loss, the customer is more inclined to invest in new products in familiar areas, and exploration is not appropriate; when a customer's recent financial returns are very high, the customer is more receptive to new products, and the exploration intensity can be increased. The client potential interest exploration intensity model on the right side of FIG. 6 is introduced to help judge when to increase the exploration intensity for a client. Client interest exploration is therefore exploration of different intensities at different times and in different scenarios;
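The timing rule for exploration can be written as a small function. The thresholds and slot fractions below are assumed values, not figures from the source; the sketch only encodes the stated behavior: no exploration when recent returns are negative, more exploration once returns reach a set value, and a modest default otherwise.

```python
def exploration_intensity(recent_return, target_return=0.05, base=0.1, boosted=0.3):
    """Fraction of recommendation slots given to exploration products (all values assumed)."""
    if recent_return < 0:
        return 0.0          # client is losing money: stop exploring
    if recent_return >= target_return:
        return boosted      # returns reached the set value: explore more
    return base

def build_list(ranked, exploration_pool, recent_return, size=10):
    """Fill the tail of the recommendation list with exploration products."""
    k = round(exploration_intensity(recent_return) * size)
    explore = [p for p in exploration_pool if p not in ranked[: size - k]][:k]
    return ranked[: size - len(explore)] + explore

ranked = ["P%d" % i for i in range(10)]      # hypothetical preference-ranked products
pool = ["E1", "E2", "E3", "E4"]              # hypothetical exploration products
print(build_list(ranked, pool, -0.10))       # no exploration slots
print(build_list(ranked, pool, 0.08))        # three exploration slots
```

In a fuller system the intensity would come from a learned model rather than fixed thresholds.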

Step 4, using a deep reinforcement learning method to adaptively learn the fusion parameters for the score factors obtained in the preceding steps and then ranking and recommending: exponential fusion is performed on the score factors by the formula a^t1 + b^t2 + …, the parameters t1 and t2 are optimized, and adaptive learning continues with the aim of maximizing the long-term benefit, to obtain a long-term investment benefit maximization model; wherein a represents the customer preference modeling factor value, b represents the customer asset balance modeling factor value, and t1 and t2 are fusion parameters. Once the client interest preference, asset balance and interest exploration modeling are completed, multidimensional scores of a client for different products are available; in this embodiment, these multidimensional scores are fused accurately. The long-term benefit is obtained by learning a target formula Q(S, A), wherein S represents the scene attributes and asset status of the current client, A represents a combination of discrete values of multiple fusion factor parameters, and R represents the feedback, namely the short-term benefit after the client purchases a product, or whether the client purchases the product;
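To make the roles of S, A and R concrete, the sketch below uses tabular Q-learning as a stand-in for the deep network. It is a toy under stated assumptions: two coarse states (`gain`, `loss`), actions that are pairs of discrete fusion parameters (t1, t2), a made-up reward function with a hidden best action per state, and a state that simply alternates. None of these values come from the source.

```python
import itertools
import random

T_VALUES = (0.5, 1.0, 2.0)
ACTIONS = list(itertools.product(T_VALUES, repeat=2))   # A: combinations of (t1, t2)
STATES = ("gain", "loss")                               # S: coarse asset status (assumed)
HIDDEN_BEST = {"gain": (2.0, 0.5), "loss": (0.5, 2.0)}  # toy environment, not from the source

def reward(state, action):
    """R: simulated short-term feedback, highest at the hidden best action for the state."""
    b1, b2 = HIDDEN_BEST[state]
    return 1.0 - 0.2 * (abs(action[0] - b1) + abs(action[1] - b2))

def train(steps=20000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    state = "gain"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)                 # explore
        else:
            action = max(q[state], key=q[state].get)     # exploit current estimate
        r = reward(state, action)
        next_state = "loss" if state == "gain" else "gain"
        target = r + gamma * max(q[next_state].values())  # long-term value of the step
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q_table = train()
policy = {s: max(q_table[s], key=q_table[s].get) for s in STATES}
print(policy)
```

The learned policy picks a different parameter combination for each state; a deep Q-network would replace the table when S has many continuous attributes.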

FIG. 7 shows the recommendation application framework. The bottom layer is a traditional data warehouse and big data platform; above it, an artificial intelligence platform is connected to provide distributed computing power and algorithms; on top of that is the intelligent recommendation system. The framework of the recommendation system is basically the same as that of Internet recommendation systems: it is divided into three layers of recall, ranking and fusion, and each layer supports configuration by business operations. Above this is the basic API service interface, and the traditional batch data push mode is also provided. The terminal systems connected at the top include APP, PC, WeChat and other terminals; the users served also include investment advisors at branch offices and business operators at headquarters. This is a point of difference from Internet recommendation systems: recommendation results are also output to marketers, because many branch offices assist in product marketing. As shown in FIG. 8, the approach combines online with offline and manual operation with intelligent algorithms; the online Internet scenario, relying mainly on intelligent algorithms, is primary, and the offline scenario is auxiliary. Marketers can configure the products to be recommended to clients through a marketing platform; the recommended product list seen by a client is generated by combining intelligently recommended products with operations-configured products, and comprehensive analysis produces a marketing analysis report. After viewing and analyzing the data, marketers can deepen their understanding of clients and then reach them offline by telephone or WeChat;
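How the client-facing list combines algorithmic and operations-configured products is not specified in detail. One simple possibility, assumed here purely for illustration, is to place the operations-configured products first, append the algorithm's ranked products, remove duplicates and truncate to the list size.

```python
def merge_recommendations(algorithm_ranked, operation_configured, size):
    """Combine operations-configured and algorithm-ranked products (pinned-first is an assumption)."""
    merged = []
    for product in list(operation_configured) + list(algorithm_ranked):
        if product not in merged:       # keep the first occurrence only
            merged.append(product)
    return merged[:size]

# Hypothetical product identifiers
final_list = merge_recommendations(["A", "B", "C", "D"], ["C", "X"], size=4)
print(final_list)
```

Other merge policies, such as reserving fixed slots for operations products, would fit the same description.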

The data of the client interest preference model, the asset balance model and the client potential interest exploration model are all used on the premise of client consent, and unauthorized client data are not collected and used.

Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.

Claims

1. A financial product recommendation method based on deep reinforcement learning is characterized in that: the method comprises the following steps:

step 3, establishing a potential interest exploration model of the client, and using the new product exploration model to realize the mining of unknown interests of the client;

step 4, performing fusion parameter self-adaptive learning on the score factors obtained in the previous step by using a deep reinforcement learning method, and then ranking and recommending; the step of modeling the customer asset balance by using the difference value between the current product distribution and the ideal product distribution comprises the following steps: obtaining the difference value between the current product distribution and the ideal product distribution through a distribution difference formula; the distribution difference formula:

wherein p(c|g) represents the target distribution of a certain product type c in the customer group g in which the user u is located; q(c|u) represents the holding and recommendation distribution of the user u based on a certain product type c;

C_KL represents the difference value between the current product distribution and the ideal product distribution; u represents a user; g represents a customer group; c represents a product type;

Σ represents the sum over the users in all groups.

2. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: the client interest preference model modeling step comprises the following steps:

building a tree model: click preferences, purchase preferences, risk preferences of active customers are learned by GBDT modeling.

3. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: the client interest preference model modeling step further comprises a distillation learning step: the preferences of new clients are learned from those of active clients by finding the active client most similar to the current new client and expressing the new client's preference by the preference of that similar active client; a Teacher Model is set up for active clients and trained on the similarity between active clients, and its calculated result is the similarity among all financial active clients; this result is input into a Student Model through distillation extraction, the Student Model at this point not using client purchasing behavior data as features; through distillation extraction, the old client most similar to the current new client is found.

4. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: obtaining the difference value between the current product distribution and the ideal product distribution further comprises obtaining an optimal subset, with the following steps:

Selecting M products from an original product set Z = {1, …, N} to form an optimal subset Y;

the optimal subset Y = argmax(det(L_Y));

wherein Y represents the optimal subset, and L_Y represents the determinant score corresponding to the optimal subset Y;

L is the constructed client relevance matrix, wherein q_i represents the relevance score of the i-th candidate content to the client, and D_ij represents the distance between candidate contents i and j;

the formula for adding candidate content to the set is Y ∪ {i}, that is, candidate content i is added to the optimal subset Y.

5. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: in step 3, the types of the new product exploration model include: type exploration, racetrack exploration and new development exploration.

6. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: the method further comprises step 5: performing exponential fusion by the formula a^t1 + b^t2 + …, optimizing the parameters t1 and t2, and continuously and adaptively learning with the aim of maximizing the long-term benefit, to obtain a long-term investment benefit maximization model; wherein a represents the customer preference modeling factor value, b represents the customer asset balance modeling factor value, and t1 and t2 are fusion parameters.

7. The method for recommending financial products based on deep reinforcement learning of claim 6, wherein: the long-term benefit is derived by learning a target formula Q(S, A), where Q represents the expectation of the maximum investment benefit, S represents the current customer's scene attributes and asset status, and A represents a combination of discrete values of multiple fusion factor parameters.

8. The financial product recommendation method based on deep reinforcement learning of claim 4, wherein: the data of the product comprises risk level, yield and holding amount, and a product portrait is obtained by analyzing the product data, wherein the product portrait contains at least 500 labels covering investment varieties and income analysis.

9. The financial product recommendation method based on deep reinforcement learning of claim 1, wherein: the data of the client interest preference model, the asset balance model and the client potential interest exploration model are all used on the premise of client consent, and unauthorized client data are not collected and used.