CN110598120A - Behavior data based financing recommendation method, device and equipment - Google Patents

Behavior data based financing recommendation method, device and equipment

Info

Publication number
CN110598120A
CN110598120A
Authority
CN
China
Prior art keywords
attribute information
learning model
reinforcement learning
dimensional attribute
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910983508.0A
Other languages
Chinese (zh)
Inventor
魏爽
林路
郏维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUNYARD SYSTEM ENGINEERING Co Ltd
Original Assignee
SUNYARD SYSTEM ENGINEERING Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUNYARD SYSTEM ENGINEERING Co Ltd filed Critical SUNYARD SYSTEM ENGINEERING Co Ltd
Priority to CN201910983508.0A priority Critical patent/CN110598120A/en
Publication of CN110598120A publication Critical patent/CN110598120A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)

Abstract

The embodiment of the invention discloses a financial recommendation method, device and equipment based on behavior data. The method comprises the following steps: acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of the users corresponding to it; preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing; inputting the preprocessed multi-dimensional attribute information into a constructed reinforcement learning model network for training to obtain recommendation knowledge; and recommending financial products to a target user according to the recommendation knowledge. By adopting the method and the device, the reinforcement learning model captures the sequence information of the user's historical browsing behavior, so the financial recommendation results are more accurate and the user's click rate and purchase rate are greatly improved.

Description

Behavior data based financing recommendation method, device and equipment
Technical Field
The invention relates to the technical field of intelligent financial recommendation, and in particular to a financial recommendation method, device and equipment based on behavior data.
Background
As financial services become ever more widely adopted, the intelligent financing recommendation market matures day by day: financing users are enormous in number, and their behavioral characteristics and financing-product preferences are rich and diverse. A recommendation system is therefore required to formulate targeted product-ranking recommendation strategies for users with different characteristics, thereby promoting the purchase rate of financial products. Most current recommendation systems design ranking strategies for financial products based on static indexes such as fixed rules, learning over commodity dimensions, or the similarity between users and financial products, but they do not consider that a user's purchase of financial products is a continuous process, and that the different stages of this continuous process are not isolated but closely related. Current recommendation strategies therefore have the following disadvantages:
1. In practice, the purchase rate of the financial products ultimately recommended is far from satisfactory.
2. The user portrait is not enriched with the dynamic information of the user's historical browsing behavior.
3. User preferences change over time; a traditional recommendation system can only obtain the maximum current benefit and cannot obtain long-term benefit by tracking and modeling the dynamic changes of user interests and behaviors.
Disclosure of Invention
The embodiment of the invention provides a financial recommendation method, device and equipment based on behavior data, which capture the sequence information of the user's historical browsing behavior with a reinforcement learning model, making the financial recommendation results more accurate and greatly improving the user's click rate and purchase rate.
The first aspect of the embodiments of the present invention provides a financial recommendation method based on behavior data, which may include:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
A second aspect of an embodiment of the present invention provides a financial recommendation apparatus based on behavior data, which may include:
the data acquisition unit is used for acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
the data preprocessing unit is used for preprocessing the multi-dimensional attribute information and the historical behavior data, and the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
the model training unit is used for inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and the product recommending unit is used for recommending financial products to the target user according to the recommending knowledge.
A third aspect of embodiments of the present invention provides a computer device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, code set, or instruction set, and the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the behavior data based financial recommendation method of the above aspect.
A fourth aspect of the embodiments of the present invention provides a computer storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the behavior data based financial recommendation method according to the above aspect.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a financial recommendation method based on behavior data according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a reinforcement learning model network construction provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a financial recommendation device based on behavior data according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a reinforcement learning model network building apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a module definition unit according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a function design unit according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The terms "including" and "having," and any variations thereof, in the description, claims and drawings of the present invention are intended to cover non-exclusive inclusion, and the terms "first" and "second" are used only to distinguish designations and do not denote any order or quantity. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may alternatively include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that the financial recommendation method based on behavior data provided by the application can be applied to the application scenario of intelligently recommending financial products for new users.
In the embodiment of the present invention, the financial recommendation method based on behavior data may be applied to a Computer device, where the Computer device may be a terminal such as a smart phone, a tablet Computer, a PC (Personal Computer), or other electronic devices with computing processing capability.
As shown in fig. 1, the financial recommendation method based on behavior data may at least include the following steps:
and S101, acquiring multi-dimensional attribute information and historical behavior data.
It can be understood that the multi-dimensional attribute information may include multi-dimensional attribute information of financial products and multi-dimensional attribute information of the users corresponding to it. The multi-dimensional attribute information of a user may include attributes such as gender, age and city, and the multi-dimensional attribute information of a financial product may include information such as category, label and selling point. The historical behavior data may be the user's historical click and purchase behavior on financial products, and may comprise the time series of the user's historical clicks and purchases on each financial product.
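For illustration, the acquired data can be represented with simple structures. The following minimal Python sketch mirrors the examples above (gender, age, city; category, label, selling point; click/purchase time series); the field names are illustrative assumptions, not identifiers defined by the invention:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class UserAttributes:
        gender: str   # e.g. "F" or "M"
        age: int
        city: str

    @dataclass
    class ProductAttributes:
        category: str
        labels: List[str]
        selling_point: str

    @dataclass
    class BehaviorRecord:
        # one historical event in the user's click/purchase time series
        timestamp: float
        product_id: str
        action: str   # "click" or "purchase"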
In an optional embodiment, the device may normalize the multi-dimensional attribute information to obtain quantized data conforming to a preset format; preferably, boolean normalization may be performed.
S102, preprocessing the multi-dimensional attribute information and the historical behavior data.
In a specific implementation, the device may perform preprocessing on the multidimensional attribute information and the historical behavior data, specifically including one or more of screening, cleaning, missing value processing, and singular value processing.
For example, null data may be filled in and interpolated to smooth the data and keep it consistent. Singular values are processed as follows: if a data point is an abnormally high or low point, it can be rejected.
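A hedged sketch of this preprocessing in Python with pandas follows; the column names and the 3-sigma rejection threshold are assumptions, since the invention does not fix them:

    import pandas as pd

    def preprocess(df: pd.DataFrame, value_cols: list) -> pd.DataFrame:
        df = df.drop_duplicates()                      # screening / cleaning
        df[value_cols] = df[value_cols].interpolate()  # fill null data by interpolation
        df[value_cols] = df[value_cols].fillna(df[value_cols].mean())  # remaining gaps
        for col in value_cols:                         # singular-value processing
            mu, sigma = df[col].mean(), df[col].std()
            df = df[(df[col] - mu).abs() <= 3 * sigma]  # reject abnormal high/low points
        return df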
S103, inputting the preprocessed multi-dimensional attribute information into the constructed reinforcement learning model network for training to obtain recommendation knowledge.
It can be understood that the device needs to construct the reinforcement learning model network first, and the specific construction process is as follows:
Firstly, the device can define the state module, action module and reward module in the reinforcement learning model; it then carries out algorithm optimization design on the strategy function, strategy gradient and value function modules in the reinforcement learning model, and constructs the reinforcement learning model network according to the designed algorithm.
Furthermore, the device can input the preprocessed multi-dimensional attribute information into the constructed reinforcement learning model network for training, and the financing recommendation system finally acquires the recommendation knowledge.
S104, recommending financial products to the target user according to the recommendation knowledge.
It will be appreciated that when a new user sample enters the system, the system automatically gives the financial products that the user is most likely to click on and purchase, and recommends them to the target user corresponding to the new user sample.
Optionally, the device may recommend a financial product to the target user by means of a short message and/or a telephone, and obtain feedback information of the user.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
In a specific implementation manner of the embodiment of the present invention, a process of constructing a reinforcement learning model network by a device may be as shown in fig. 2, and includes the following steps:
S201, defining a state module in the reinforcement learning model.
In specific implementation, the device can extract state features based on historical behavior data, take multi-dimensional attribute information of the financial product corresponding to the historical behavior data in a preset time period as the state of the current model, and construct and define a state module in the reinforcement learning model based on the state features and the state.
In the embodiment of the application, the user is regarded as the environment that responds to the actions of the recommendation system, and the recommendation system needs to perceive the state of this environment to make decisions. Based on the assumption that a user tends to click on the products in a financial-product sequence that interest him and rarely clicks on those that do not, the user's historical click behavior is taken as the data source for extracting state features. Before each recommendation, the features of the financial products clicked by the user in the most recent period (including interest rate, conversion rate, sales volume and the like) are taken as the state of the current recommendation system; in addition, to distinguish users of different groups, the user's long-term features are added to the state. The final state s is defined as:

s = (rate_1, cvr_1, sale_1, …, rate_n, cvr_n, sale_n, power, item)

where n is the number of historically clicked financial products and is a tunable parameter; rate_i, cvr_i and sale_i denote, respectively, the interest rate, conversion rate and sales volume of financial product i; and power and item denote the user's purchasing power and the label of the user's preferred product. In a specific implementation, because the state features have different dimensions, the feature value of each dimension is normalized to the interval [0, 1] before further processing.
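A minimal sketch of assembling this state vector follows; it assumes the per-dimension minimum and maximum used for [0, 1] normalization are precomputed over the training data, which the patent leaves unspecified:

    import numpy as np

    def build_state(clicked_products, power, item, dim_min, dim_max):
        # clicked_products: list of (rate_i, cvr_i, sale_i) for the n most recent clicks
        raw = np.concatenate([np.ravel(clicked_products), [power, item]])
        # normalize each dimension to [0, 1] using precomputed training statistics
        return (raw - dim_min) / (dim_max - dim_min + 1e-8)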
S202, defining an action module in the reinforcement learning model.
Specifically, the device may construct a ranking vector to define the action module in the reinforcement learning model. The ranking weight vector is μ = (μ_1, μ_2, …, μ_m), and the rank order of each financial product is determined by the inner product of its feature-score vector and the ranking weight vector μ.
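A minimal sketch of this action: each product's ranking score is the inner product of its feature-score vector with μ, and products are ordered by descending score (the matrix layout is an illustrative assumption):

    import numpy as np

    def rank_products(feature_scores: np.ndarray, mu: np.ndarray) -> np.ndarray:
        # feature_scores: (num_products, m) matrix; mu: (m,) ranking weight vector
        scores = feature_scores @ mu     # inner product per product
        return np.argsort(-scores)       # product indices, best first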
S203, defining a reward module in the reinforcement learning model.
Specifically, the device can rank the financial products by combining the multi-dimensional attribute information with the system's ranking strategy, introduce prior knowledge into the reward function in the reinforcement learning model, and define the reward module in the reinforcement learning model based on the reward function with the introduced prior knowledge.
In the embodiment of the application, given the financial-product ranking produced by the recommendation system, user actions such as clicking and purchasing can be regarded as direct feedback on the recommendation system's ranking strategy. The reward rules are defined as follows:
(1) If only click actions on products occur in the recommendation sequence, the reward value is the number of products the user clicked.
(2) If a purchase of a financial product occurs in the recommendation sequence, the reward value is the purchase amount of that product.
(3) In all other cases, the reward value is 0.
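These rules translate directly into code. The sketch below assumes the click count and purchase amount of one recommendation round have already been aggregated, which the patent leaves unspecified:

    def base_reward(num_clicked: int, purchase_amount: float) -> float:
        if purchase_amount > 0:      # rule (2): a purchase occurred
            return purchase_amount
        if num_clicked > 0:          # rule (1): only clicks occurred
            return float(num_clicked)
        return 0.0                   # rule (3): no click and no purchase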
To improve the ability of the feedback signal to discriminate between different ranking strategies, some prior knowledge can be introduced into the original reward function to accelerate convergence of the reinforcement learning model. The reward value for "selecting action a in state s and transferring to state s'" is defined as:

R(s, a, s') = R_0(s, a, s') + Φ(s)

where R_0(s, a, s') is the originally defined reward function and Φ(s) is a function containing the prior knowledge, which incorporates the information of the recommended financing-product list corresponding to each state into the definition of the reward:

Φ(s) = Σ_{i=1}^{K} ML(i | μ_θ(s))

where K is the number of products in the recommended financing-product list corresponding to state s, i denotes the i-th product, μ_θ(s) is the action the recommendation system performs in state s, and ML(i | μ_θ(s)) is the log-likelihood that financial product i is clicked or purchased in time under ranking strategy μ_θ(s). Let x_i be the feature vector of financial product i (that is, features such as interest rate, sales volume, popularity score and real-time grade); then score_i = μ_θ(s)^T x_i is the ranking score of financial product i in state s. Let y_i ∈ {0, 1} be the label of whether financing product i is actually clicked or traded, and assume the actual click/deal probability p_i of financing product i and its ranking score score_i satisfy:

p_i = 1 / (1 + e^(−score_i))

The likelihood probability of financial product i is then:

P(y_i) = p_i^(y_i) · (1 − p_i)^(1 − y_i)

Taking logarithms and summing the log-likelihood over all financial products gives:

Φ(s) = Σ_{i=1}^{K} [ y_i · log p_i + (1 − y_i) · log(1 − p_i) ]

Both click and deal effects are taken into account. For a financial-product recommendation list in which only clicks occur, the corresponding form is:

Φ_clk(s) = Σ_{i=1}^{K} [ y_i^clk · log p_i + (1 − y_i^clk) · log(1 − p_i) ]

where y_i^clk is the label of whether financial product i is clicked. For samples in which a deal occurs, the product price factor is added, giving:

Φ_buy(s) = Σ_{i=1}^{K} Price_i · [ y_i^buy · log p_i + (1 − y_i^buy) · log(1 − p_i) ]

where y_i^buy and Price_i are, respectively, the label of whether financial product i is purchased and its price.
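Putting the shaping together, the following sketch computes R = R_0 + Φ(s) from the ranking scores and labels of the K listed products. The sigmoid link between score and click/deal probability follows the reconstruction above and should be read as an assumption:

    import numpy as np

    def phi(scores, labels, prices=None):
        # scores: ranking scores of the K listed products; labels: 0/1 click or deal labels
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=float)
        p = 1.0 / (1.0 + np.exp(-scores))                 # assumed sigmoid link
        ll = labels * np.log(p + 1e-12) + (1.0 - labels) * np.log(1.0 - p + 1e-12)
        if prices is not None:                            # deal samples: weight by price
            ll = ll * np.asarray(prices, dtype=float)
        return ll.sum()

    def shaped_reward(r0, scores, labels, prices=None):
        return r0 + phi(scores, labels, prices)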
S204, performing algorithm optimization design on the strategy function, strategy gradient and value function modules in the reinforcement learning model.
In a specific implementation, the device may express the strategy with a parameterized function and complete the learning of the strategy function by optimizing the parameters. Preferably, the device adopts a strategy-approximation method, that is, a parameterized function expresses the strategy and learning is completed by optimizing its parameters, and a deterministic strategy gradient algorithm is used for real-time adjustment and optimization of the ranking. Taking the state features as input and the finally effective ranking weights as output, the action output for any state s is:

a = μ_θ(s) = (μ_θ^1(s), μ_θ^2(s), …, μ_θ^m(s))

where θ = (θ_1, θ_2, …, θ_m) is the action parameter vector, and the ranking weight of the i-th dimension is:

μ_θ^i(s) = θ_i^T φ(s) + C_i

where φ(s) is the feature vector of state s and C_i is a constant for the weight distribution of the i-th ranking dimension.
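Under this linear form, the strategy is a single matrix-vector product. The sketch below treats θ as an m×d matrix whose i-th row is θ_i, a layout assumption for illustration:

    import numpy as np

    def mu_theta(theta: np.ndarray, phi_s: np.ndarray, C: np.ndarray) -> np.ndarray:
        # theta: (m, d) parameters; phi_s: (d,) state features; C: (m,) constants
        return theta @ phi_s + C   # m-dimensional ranking-weight action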
Further, the device may obtain an objective function over all states based on the determined strategy and update it by gradient-based optimization, where the objective function is the sum of the expected long-term cumulative rewards. It should be noted that the goal of the reinforcement learning model is to maximize the long-term cumulative reward, that is, the sum of the expected long-term cumulative rewards the recommendation system can obtain over all states under the actions of the deterministic strategy μ_θ:

J(μ_θ) = E_{s∼ρ^μ}[ Q^μ(s, μ_θ(s)) ]

The gradient of the objective function J(μ_θ) with respect to the parameter θ is sought so that J(μ_θ) is maximized, and θ is updated along the gradient direction. According to the strategy gradient theorem, the gradient is:

∇_θ J(μ_θ) = E_s[ ∇_θ μ_θ(s) · ∇_a Q^μ(s, a) |_{a=μ_θ(s)} ]

where Q^μ(s, a) is the long-term cumulative reward corresponding to the state-action pair (s, a) under strategy μ_θ. The parameter θ is therefore updated by:

θ_{t+1} = θ_t + α_θ · ∇_θ μ_θ(s_t) · ∇_a Q^μ(s_t, a) |_{a=μ_θ(s_t)}

where α_θ is the learning rate and ∇_θ μ_θ(s) is a Jacobian matrix. Q^μ(s, a) is computed approximately by value-function estimation; a linear function approximator expresses the Q function with a parameter vector w:

Q^μ(s, a) ≈ Q^w(s, a) = φ(s, a)^T w

where φ(s, a) is the feature vector of the state-action pair (s, a). Choosing the compatible features φ(s, a) = ∇_θ μ_θ(s) · (a − μ_θ(s)) gives ∇_a Q^w(s, a) = ∇_θ μ_θ(s)^T w, so the update formula for the parameter vector of the strategy function becomes:

θ_{t+1} = θ_t + α_θ · ∇_θ μ_θ(s_t) · (∇_θ μ_θ(s_t)^T · w_t)
further, the device may introduce a merit function, based on which a value function in the reinforcement learning model is designed. It will be appreciated that the value function QwThe parameter vector w of (a) also needs to be updated, which can be referred to as the Q-learning algorithm, for the sample(s)t,at,rt,st+1) Comprises the following steps:
wherein s ist,at,rt,st+1Respectively the state perceived by the recommender system at time t, the action taken, the reward feedback derived therefrom and the state perceived at time t +1, δt+1Referred to as differential error, alphawIs the learning rate of w. Introducing an advantage function, expressing the Q function as the sum of a state value function V(s) and an advantage function A (s, a), estimating the value of the state s from a global perspective with V(s), estimating the advantage of the action a in the state s relative to other actions from a local perspective with A (s, a):
wherein w and V are the parameter vectors of A and V, respectively. Finally, the updating mode of all parameters is as follows:
vt+1=vtvδt+1φ(st)
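These update rules match the form of the compatible deterministic policy gradient with a Q-learning critic (COPDAC-Q). The sketch below performs one such update with linear features; the shapes, learning rates and discount factor are illustrative assumptions:

    import numpy as np

    def copdac_q_step(theta, w, v, phi_s, phi_s_next, a, mu_s, grad_mu, r,
                      gamma=0.99, a_theta=1e-4, a_w=1e-3, a_v=1e-3):
        # grad_mu: Jacobian of mu_theta(s) w.r.t. theta, shape (p, m); mu_s = mu_theta(s)
        phi_sa = grad_mu @ (a - mu_s)          # compatible features phi(s, a), shape (p,)
        q_sa = phi_sa @ w + phi_s @ v          # Q(s, a) = A(s, a) + V(s)
        q_next = phi_s_next @ v                # on-policy next action has zero advantage
        delta = r + gamma * q_next - q_sa      # differential (TD) error
        theta = theta + a_theta * grad_mu @ (grad_mu.T @ w)   # strategy (actor) update
        w = w + a_w * delta * phi_sa           # advantage-function weights
        v = v + a_v * delta * phi_s            # state-value weights
        return theta, w, v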
s205, constructing a reinforcement learning model network according to a designed algorithm.
In the embodiment, the accuracy of personalized recommendation is further improved by constructing the reinforcement learning model network.
The following describes in detail a financial recommendation device and a reinforcement learning model network construction device based on behavior data according to an embodiment of the present invention with reference to fig. 3 to 6. It should be noted that the devices shown in fig. 3-6 are used to execute the methods of the embodiments shown in fig. 1 and 2 of the present invention; for convenience of description, only the parts related to the embodiment of the present invention are shown, and for specific technical details not disclosed here, please refer to the embodiments shown in fig. 1 and 2 of the present invention.
Referring to fig. 3, a schematic structural diagram of a financial recommendation apparatus based on behavior data is provided in an embodiment of the present invention. As shown in fig. 3, the financial recommendation apparatus 10 according to an embodiment of the present invention may include: a data acquisition unit 101, a data preprocessing unit 102, a model training unit 103, a product recommendation unit 104, and a data normalization unit 105. As shown in fig. 4, the network construction apparatus 20 may include a module definition unit 201, a function design unit 202, and a model construction unit 203. As shown in fig. 5, the module defining unit 201 includes a feature extracting sub-unit 2011, a state determining sub-unit 2012, a state defining sub-unit 2013, an action defining sub-unit 2014, a product sorting sub-unit 2015, a knowledge introducing sub-unit 2016 and a reward defining sub-unit 2017, and as shown in fig. 6, the function designing unit 202 includes a policy function designing sub-unit 2021, a policy gradient designing sub-unit 2022 and a value function designing sub-unit 2023.
The data acquisition unit 101 is configured to acquire multi-dimensional attribute information and historical behavior data, where the multi-dimensional attribute information includes multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information.
Optionally, the data normalizing unit 105 is configured to perform normalization processing on the multi-dimensional attribute information to obtain quantized data conforming to a preset format.
And the data preprocessing unit 102 is used for preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing.
And the model training unit 103 is configured to input the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge.
And the product recommending unit 104 is used for recommending financial products to the target user according to the recommending knowledge.
In another embodiment:
the module definition unit 201 is used for defining a state module, an action module and a reward module in the reinforcement learning model.
In an alternative embodiment, the module definition unit 201 includes:
a feature extraction subunit 2011, configured to extract the status feature based on the historical behavior data.
The state determining subunit 2012 is configured to use the multi-dimensional attribute information of the financial product corresponding to the historical behavior data in the preset time period as the state of the current model.
And the state definition subunit 2013 is used for constructing and defining a state module in the reinforcement learning model based on the state features and the states.
And the action definition subunit 2014 is used for constructing a sequencing vector, so that the sequencing vector defines an action module in the reinforcement learning model.
And the product sorting sub-unit 2015 is used for sorting the financing products by combining the multi-dimensional attribute information and the system sorting strategy.
A knowledge introduction subunit 2016 configured to introduce a priori knowledge for the reward function in the reinforcement learning model.
A reward definition subunit 2017, configured to define a reward module in the reinforcement learning model based on a reward function introducing prior knowledge.
And the function design unit 202 is configured to perform algorithm optimization design on the policy function, the policy gradient and the value function module in the reinforcement learning model.
In an alternative embodiment, the function design unit 202 includes:
the strategy function design subunit 2021 is configured to express a strategy by using a parameterized function, and complete the learning of the strategy function by optimizing the parameter.
A strategy gradient design subunit 2022, configured to obtain an objective function in all states based on the determined strategy, and optimally update the objective function according to the gradient strategy, where the objective function is the sum of long-term accumulated reward expectations.
A value function designing subunit 2023, configured to introduce an advantage function and design the value function in the reinforcement learning model based on the advantage function.
And the model construction unit 203 is used for constructing the reinforcement learning model network according to the designed algorithm.
It should be noted that, for the detailed execution process of each unit and sub-unit in this embodiment, reference may be made to the description in the foregoing method embodiment, and details are not described here again.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments shown in fig. 1 and fig. 2, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 1 and fig. 2, which are not described herein again.
The embodiment of the application also provides a computer device. As shown in fig. 7, the computer device 30 may include: at least one processor 301 (e.g., a CPU), at least one network interface 304, a user interface 303, a memory 305, at least one communication bus 302, and optionally a display screen 306. The communication bus 302 is used to realize connection and communication among these components. The user interface 303 may include a touch screen, a keyboard or a mouse. The network interface 304 may optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface), and a communication connection with a server can be established through the network interface 304. The memory 305 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention, the memory 305 includes a flash memory. The memory 305 may optionally also be at least one storage system located remotely from the processor 301. As shown in fig. 7, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module and program instructions.
It should be noted that the network interface 304 may be connected to a receiver, a transmitter or other communication module, and the other communication module may include, but is not limited to, a WiFi module, a bluetooth module, etc., and it is understood that the computer device in the embodiment of the present invention may also include a receiver, a transmitter, other communication module, etc.
Processor 301 may be configured to call program instructions stored in memory 305 and cause computer device 30 to:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
In some embodiments, the apparatus 30 is further configured to:
defining a state module, an action module and a reward module in the reinforcement learning model;
performing algorithm optimization design on a strategy function, a strategy gradient and a value function module in the reinforcement learning model;
and constructing a reinforcement learning model network according to a designed algorithm.
In some embodiments, the apparatus 30 is further configured to:
and carrying out standardization processing on the multi-dimensional attribute information to obtain quantitative data conforming to a preset format.
In some embodiments, the normalization process is a boolean normalization process.
In some embodiments, the apparatus 30, when defining the state module in the reinforcement learning model, is specifically configured to:
extracting state features based on historical behavior data;
taking multi-dimensional attribute information of the financial product corresponding to historical behavior data in a preset time period as the state of the current model;
and constructing and defining a state module in the reinforcement learning model based on the state characteristics and the state.
In some embodiments, the apparatus 30, when defining the action module in the reinforcement learning model, is specifically configured to:
and constructing a sequencing vector, and defining an action module in the reinforcement learning model by using the sequencing vector.
In some embodiments, the device 30, when defining the reward module in the reinforcement learning model, is specifically configured to:
sorting the financial products by combining the multi-dimensional attribute information and a system sorting strategy;
introducing prior knowledge into a reward function in the reinforcement learning model;
a reward module in the reinforcement learning model is defined based on a reward function that introduces a priori knowledge.
In some embodiments, the apparatus 30 is specifically configured to, when performing algorithm optimization design on the policy function, the policy gradient, and the value function module in the reinforcement learning model:
expressing the strategy by adopting a parameterized function, and finishing the learning of the strategy function by optimizing the parameter;
obtaining an objective function on all states based on the determined strategy, and optimizing and updating the objective function according to a gradient strategy, wherein the objective function is the sum of long-term accumulated reward expectation;
and introducing a merit function, and designing a value function in the reinforcement learning model based on the merit function.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only of preferred embodiments of the present invention and certainly cannot be taken to limit the scope of rights of the invention; equivalent changes made according to the claims of the present invention therefore still remain within the scope covered by the invention.

Claims (10)

1. A financial recommendation method based on behavior data is characterized by comprising the following steps:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
2. The method of claim 1, further comprising:
defining a state module, an action module and a reward module in the reinforcement learning model;
performing algorithm optimization design on a strategy function, a strategy gradient and a value function module in the reinforcement learning model;
and constructing a reinforcement learning model network according to a designed algorithm.
3. The method of claim 1, further comprising:
and carrying out standardization processing on the multidimensional attribute information to obtain quantitative data which accords with a preset format.
4. The method of claim 3, wherein the normalization process is a Boolean normalization process.
5. The method of claim 2, wherein defining a state module in the reinforcement learning model comprises:
extracting state features based on the historical behavior data;
taking the multi-dimensional attribute information of the financial product corresponding to the historical behavior data in a preset time period as the state of the current model;
and constructing a state module in the defined reinforcement learning model based on the state characteristics and the state.
6. The method of claim 2, wherein defining an action module in a reinforcement learning model comprises:
and constructing a sequencing vector, and defining an action module in the reinforcement learning model by using the sequencing vector.
7. The method of claim 2, wherein defining a reward module in a reinforcement learning model comprises:
sorting the financial products by combining the multi-dimensional attribute information and a system sorting strategy;
introducing prior knowledge into a reward function in the reinforcement learning model;
a reward module in the reinforcement learning model is defined based on a reward function that introduces a priori knowledge.
8. The method of claim 2, wherein the performing an algorithm optimization design on the strategy function, the strategy gradient and the value function module in the reinforcement learning model comprises:
expressing the strategy by adopting a parameterized function, and finishing the learning of the strategy function by optimizing the parameter;
obtaining an objective function on all states based on the determined strategy, and optimizing and updating the objective function according to a gradient strategy, wherein the objective function is the sum of long-term accumulated reward expectation;
and introducing an advantage function, and designing a value function in the reinforcement learning model based on the advantage function.
9. A financial recommendation device based on behavioral data, comprising:
the data acquisition unit is used for acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
the data preprocessing unit is used for preprocessing the multi-dimensional attribute information and the historical behavior data, and the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
the model training unit is used for inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and the product recommending unit is used for recommending financial products to the target user according to the recommending knowledge.
10. A computer device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the behavior data based financial recommendation method according to any one of claims 1 to 8.
CN201910983508.0A 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment Pending CN110598120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910983508.0A CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910983508.0A CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Publications (1)

Publication Number Publication Date
CN110598120A true CN110598120A (en) 2019-12-20

Family

ID=68867586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910983508.0A Pending CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Country Status (1)

Country Link
CN (1) CN110598120A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737579A (en) * 2020-06-28 2020-10-02 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance
CN112837116A (en) * 2021-01-13 2021-05-25 中国农业银行股份有限公司 Product recommendation method and device
CN112948700A (en) * 2021-04-14 2021-06-11 刘蒙 Fund recommendation method
CN113129108A (en) * 2021-04-26 2021-07-16 山东大学 Product recommendation method and device based on Double DQN algorithm
CN114297511A (en) * 2022-01-27 2022-04-08 中国农业银行股份有限公司 Financing recommendation method, device, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737579A (en) * 2020-06-28 2020-10-02 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance
CN112837116A (en) * 2021-01-13 2021-05-25 中国农业银行股份有限公司 Product recommendation method and device
CN112948700A (en) * 2021-04-14 2021-06-11 刘蒙 Fund recommendation method
CN113129108A (en) * 2021-04-26 2021-07-16 山东大学 Product recommendation method and device based on Double DQN algorithm
CN114297511A (en) * 2022-01-27 2022-04-08 中国农业银行股份有限公司 Financing recommendation method, device, system and storage medium

Similar Documents

Publication Publication Date Title
US10958748B2 (en) Resource push method and apparatus
CN110598120A (en) Behavior data based financing recommendation method, device and equipment
CN106651542B (en) Article recommendation method and device
CN108230058B (en) Product recommendation method and system
CN112632385A (en) Course recommendation method and device, computer equipment and medium
CN111784455A (en) Article recommendation method and recommendation equipment
WO2022016522A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN109903103B (en) Method and device for recommending articles
CN107423308B (en) Theme recommendation method and device
CN110008397B (en) Recommendation model training method and device
CN110135951B (en) Game commodity recommendation method and device and readable storage medium
US20230162005A1 (en) Neural network distillation method and apparatus
CN114117216A (en) Recommendation probability prediction method and device, computer storage medium and electronic equipment
CN112380449B (en) Information recommendation method, model training method and related device
CN111754278A (en) Article recommendation method and device, computer storage medium and electronic equipment
CN110688565A (en) Next item recommendation method based on multidimensional Hox process and attention mechanism
CN111582932A (en) Inter-scene information pushing method and device, computer equipment and storage medium
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
CN112528164A (en) User collaborative filtering recall method and device
CN115631012A (en) Target recommendation method and device
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN116823410A (en) Data processing method, object processing method, recommending method and computing device
CN113553501A (en) Method and device for user portrait prediction based on artificial intelligence
CN115700550A (en) Label classification model training and object screening method, device and storage medium
CN118043802A (en) Recommendation model training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wei Shuang

Inventor after: Lin Lu

Inventor after: Lin Xiaozhong

Inventor after: Jia Weiqiang

Inventor before: Wei Shuang

Inventor before: Lin Lu

Inventor before: Jia Weiqiang

CB03 Change of inventor or designer information
CB02 Change of applicant information

Address after: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310000

Applicant after: Sinyada Technology Co.,Ltd.

Address before: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310000

Applicant before: SUNYARD SYSTEM ENGINEERING Co.,Ltd.

CB02 Change of applicant information
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220

RJ01 Rejection of invention patent application after publication