CN110598120A - Behavior data based financing recommendation method, device and equipment - Google Patents

Behavior data based financing recommendation method, device and equipment

Info

Publication number
CN110598120A
CN110598120A
Authority
CN
China
Prior art keywords
attribute information
learning model
reinforcement learning
dimensional attribute
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910983508.0A
Other languages
Chinese (zh)
Inventor
魏爽
林路
郏维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUNYARD SYSTEM ENGINEERING Co Ltd
Original Assignee
SUNYARD SYSTEM ENGINEERING Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUNYARD SYSTEM ENGINEERING Co Ltd filed Critical SUNYARD SYSTEM ENGINEERING Co Ltd
Priority to CN201910983508.0A priority Critical patent/CN110598120A/en
Publication of CN110598120A publication Critical patent/CN110598120A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)

Abstract

The embodiment of the invention discloses a financial recommendation method, device and equipment based on behavior data. The method comprises the following steps: acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of the users corresponding to it; preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing; inputting the preprocessed multi-dimensional attribute information into a constructed reinforcement learning model network for training to obtain recommendation knowledge; and recommending financial products to a target user according to the recommendation knowledge. By adopting the method and the device, the reinforcement learning model captures the sequence information of the user's historical browsing behavior, so the financial recommendation results are more accurate and the user's click rate and purchase rate are greatly improved.

Description

Behavior data based financing recommendation method, device and equipment
Technical Field
The invention relates to the technical field of intelligent financial recommendation, and in particular to a financial recommendation method, device and equipment based on behavior data.
Background
As financial services become ever more widely adopted, the intelligent financing recommendation market matures day by day: financing users are enormous in number, and their behavioral characteristics and financing-product preferences are rich and diverse. A recommendation system is therefore required to formulate targeted product-ranking recommendation strategies for users with different characteristics, thereby promoting the purchase rate of financial products. Most current recommendation systems design ranking strategies for financial products based on static indexes such as fixed rules, learning over commodity dimensions, or the similarity between users and financial products, but they do not consider that a user's purchase of financial products is a continuous process, and that the different stages of this continuous process are not isolated but closely related. Current recommendation strategies therefore have the following disadvantages:
1. In practice, the purchase rate of the financial products ultimately recommended is far from satisfactory.
2. The user portrait is not enriched with the dynamic information of the user's historical browsing behavior.
3. User preferences change over time; a traditional recommendation system can only obtain the maximum current benefit and cannot obtain long-term benefit by tracking and modeling the dynamic changes of user interests and behaviors.
Disclosure of Invention
The embodiment of the invention provides a financial recommendation method, device and equipment based on behavior data, which capture the sequence information of the user's historical browsing behavior with a reinforcement learning model, making the financial recommendation results more accurate and greatly improving the user's click rate and purchase rate.
The first aspect of the embodiments of the present invention provides a financial recommendation method based on behavior data, which may include:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
A second aspect of an embodiment of the present invention provides a financial recommendation apparatus based on behavior data, which may include:
the data acquisition unit is used for acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
the data preprocessing unit is used for preprocessing the multi-dimensional attribute information and the historical behavior data, and the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
the model training unit is used for inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and the product recommending unit is used for recommending financial products to the target user according to the recommending knowledge.
A third aspect of embodiments of the present invention provides a computer device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, code set, or instruction set, and the at least one instruction, at least one program, code set, or instruction set is loaded and executed by the processor to implement the behavior data based financial recommendation method of the above aspect.
A fourth aspect of the embodiments of the present invention provides a computer storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the behavior data based financial recommendation method according to the above aspect.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a financial recommendation method based on behavior data according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a reinforcement learning model network construction provided by an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a financial recommendation device based on behavior data according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a reinforcement learning model network building apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a module definition unit according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a function design unit according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
The terms "including" and "having," and any variations thereof, in the description, claims and drawings of the present invention are intended to cover non-exclusive inclusion, and the terms "first" and "second" are used only to distinguish designations and do not denote any order or quantity. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may alternatively include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that the financial recommendation method based on behavior data provided by the application can be applied to the application scenario of intelligently recommending financial products for new users.
In the embodiment of the present invention, the financial recommendation method based on behavior data may be applied to a Computer device, where the Computer device may be a terminal such as a smart phone, a tablet Computer, a PC (Personal Computer), or other electronic devices with computing processing capability.
As shown in fig. 1, the financial recommendation method based on behavior data may at least include the following steps:
and S101, acquiring multi-dimensional attribute information and historical behavior data.
It can be understood that the multi-dimensional attribute information may include multi-dimensional attribute information of financial products and multi-dimensional attribute information of the users corresponding to it. The multi-dimensional attribute information of a user may include attributes such as gender, age and city, and the multi-dimensional attribute information of a financial product may include information such as category, label and selling point. The historical behavior data may be the user's historical click and purchase behavior on financial products, and may comprise the time series of the user's historical clicks and purchases on each financial product.
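For illustration, the acquired data can be represented with simple structures. The following minimal Python sketch mirrors the examples above (gender, age, city; category, label, selling point; click/purchase time series); the field names are illustrative assumptions, not identifiers defined by the invention:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class UserAttributes:
        gender: str   # e.g. "F" or "M"
        age: int
        city: str

    @dataclass
    class ProductAttributes:
        category: str
        labels: List[str]
        selling_point: str

    @dataclass
    class BehaviorRecord:
        # one historical event in the user's click/purchase time series
        timestamp: float
        product_id: str
        action: str   # "click" or "purchase"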
In an optional embodiment, the device may normalize the multi-dimensional attribute information to obtain quantized data conforming to a preset format; preferably, boolean normalization may be performed.
S102, preprocessing the multi-dimensional attribute information and the historical behavior data.
In a specific implementation, the device may perform preprocessing on the multidimensional attribute information and the historical behavior data, specifically including one or more of screening, cleaning, missing value processing, and singular value processing.
For example, null data may be filled in and interpolated to smooth the data and keep it consistent. Singular values are processed as follows: if a data point is an abnormally high or low point, it can be rejected.
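A hedged sketch of this preprocessing in Python with pandas follows; the column names and the 3-sigma rejection threshold are assumptions, since the invention does not fix them:

    import pandas as pd

    def preprocess(df: pd.DataFrame, value_cols: list) -> pd.DataFrame:
        df = df.drop_duplicates()                      # screening / cleaning
        df[value_cols] = df[value_cols].interpolate()  # fill null data by interpolation
        df[value_cols] = df[value_cols].fillna(df[value_cols].mean())  # remaining gaps
        for col in value_cols:                         # singular-value processing
            mu, sigma = df[col].mean(), df[col].std()
            df = df[(df[col] - mu).abs() <= 3 * sigma]  # reject abnormal high/low points
        return df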
S103, inputting the preprocessed multi-dimensional attribute information into the constructed reinforcement learning model network for training to obtain recommendation knowledge.
It can be understood that the device needs to construct the reinforcement learning model network first, and the specific construction process is as follows:
Firstly, the device can define the state module, action module and reward module in the reinforcement learning model; it then carries out algorithm optimization design on the strategy function, strategy gradient and value function modules in the reinforcement learning model, and constructs the reinforcement learning model network according to the designed algorithm.
Furthermore, the device can input the preprocessed multi-dimensional attribute information into the constructed reinforcement learning model network for training, and the financing recommendation system finally acquires the recommendation knowledge.
S104, recommending financial products to the target user according to the recommendation knowledge.
It will be appreciated that when a new user sample enters the system, the system automatically gives the financial products that the user is most likely to click on and purchase, and recommends them to the target user corresponding to the new user sample.
Optionally, the device may recommend a financial product to the target user by means of a short message and/or a telephone, and obtain feedback information of the user.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
In a specific implementation manner of the embodiment of the present invention, a process of constructing a reinforcement learning model network by a device may be as shown in fig. 2, and includes the following steps:
S201, defining a state module in the reinforcement learning model.
In specific implementation, the device can extract state features based on historical behavior data, take multi-dimensional attribute information of the financial product corresponding to the historical behavior data in a preset time period as the state of the current model, and construct and define a state module in the reinforcement learning model based on the state features and the state.
In the embodiment of the application, the user is regarded as the environment that responds to the actions of the recommendation system, and the recommendation system needs to perceive the state of this environment to make decisions. Based on the assumption that a user tends to click on the products in a financial-product sequence that interest him and rarely clicks on those that do not, the user's historical click behavior is taken as the data source for extracting state features. Before each recommendation, the features of the financial products clicked by the user in the most recent period (including interest rate, conversion rate, sales volume and the like) are taken as the state of the current recommendation system; in addition, to distinguish users of different groups, the user's long-term features are added to the state. The final state s is defined as:

s = (rate_1, cvr_1, sale_1, …, rate_n, cvr_n, sale_n, power, item)

where n is the number of historically clicked financial products and is a tunable parameter; rate_i, cvr_i and sale_i denote, respectively, the interest rate, conversion rate and sales volume of financial product i; and power and item denote the user's purchasing power and the label of the user's preferred product. In a specific implementation, because the state features have different dimensions, the feature value of each dimension is normalized to the interval [0, 1] before further processing.
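A minimal sketch of assembling this state vector follows; it assumes the per-dimension minimum and maximum used for [0, 1] normalization are precomputed over the training data, which the patent leaves unspecified:

    import numpy as np

    def build_state(clicked_products, power, item, dim_min, dim_max):
        # clicked_products: list of (rate_i, cvr_i, sale_i) for the n most recent clicks
        raw = np.concatenate([np.ravel(clicked_products), [power, item]])
        # normalize each dimension to [0, 1] using precomputed training statistics
        return (raw - dim_min) / (dim_max - dim_min + 1e-8)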
S202, defining an action module in the reinforcement learning model.
Specifically, the device may construct a ranking vector to define the action module in the reinforcement learning model. The ranking weight vector is μ = (μ_1, μ_2, …, μ_m), and the rank order of each financial product is determined by the inner product of its feature-score vector and the ranking weight vector μ.
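A minimal sketch of this action: each product's ranking score is the inner product of its feature-score vector with μ, and products are ordered by descending score (the matrix layout is an illustrative assumption):

    import numpy as np

    def rank_products(feature_scores: np.ndarray, mu: np.ndarray) -> np.ndarray:
        # feature_scores: (num_products, m) matrix; mu: (m,) ranking weight vector
        scores = feature_scores @ mu     # inner product per product
        return np.argsort(-scores)       # product indices, best first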
S203, defining a reward module in the reinforcement learning model.
Specifically, the device can rank the financial products by combining the multi-dimensional attribute information with the system's ranking strategy, introduce prior knowledge into the reward function in the reinforcement learning model, and define the reward module in the reinforcement learning model based on the reward function with the introduced prior knowledge.
In the embodiment of the application, given the financial-product ranking produced by the recommendation system, user actions such as clicking and purchasing can be regarded as direct feedback on the recommendation system's ranking strategy. The reward rules are defined as follows:
(1) If only click actions on products occur in the recommendation sequence, the reward value is the number of products the user clicked.
(2) If a purchase of a financial product occurs in the recommendation sequence, the reward value is the purchase amount of that product.
(3) In all other cases, the reward value is 0.
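These rules translate directly into code. The sketch below assumes the click count and purchase amount of one recommendation round have already been aggregated, which the patent leaves unspecified:

    def base_reward(num_clicked: int, purchase_amount: float) -> float:
        if purchase_amount > 0:      # rule (2): a purchase occurred
            return purchase_amount
        if num_clicked > 0:          # rule (1): only clicks occurred
            return float(num_clicked)
        return 0.0                   # rule (3): no click and no purchase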
To improve the ability of the feedback signal to discriminate between different ranking strategies, some prior knowledge can be introduced into the original reward function to accelerate convergence of the reinforcement learning model. The reward value for "selecting action a in state s and transferring to state s'" is defined as:

R(s, a, s') = R_0(s, a, s') + Φ(s)

where R_0(s, a, s') is the originally defined reward function and Φ(s) is a function containing the prior knowledge, which incorporates the information of the recommended financing-product list corresponding to each state into the definition of the reward:

Φ(s) = Σ_{i=1}^{K} ML(i | μ_θ(s))

where K is the number of products in the recommended financing-product list corresponding to state s, i denotes the i-th product, μ_θ(s) is the action the recommendation system performs in state s, and ML(i | μ_θ(s)) is the log-likelihood that financial product i is clicked or purchased in time under ranking strategy μ_θ(s). Let x_i be the feature vector of financial product i (that is, features such as interest rate, sales volume, popularity score and real-time grade); then score_i = μ_θ(s)^T x_i is the ranking score of financial product i in state s. Let y_i ∈ {0, 1} be the label of whether financing product i is actually clicked or traded, and assume the actual click/deal probability p_i of financing product i and its ranking score score_i satisfy:

p_i = 1 / (1 + e^(−score_i))

The likelihood probability of financial product i is then:

P(y_i) = p_i^(y_i) · (1 − p_i)^(1 − y_i)

Taking logarithms and summing the log-likelihood over all financial products gives:

Φ(s) = Σ_{i=1}^{K} [ y_i · log p_i + (1 − y_i) · log(1 − p_i) ]

Both click and deal effects are taken into account. For a financial-product recommendation list in which only clicks occur, the corresponding form is:

Φ_clk(s) = Σ_{i=1}^{K} [ y_i^clk · log p_i + (1 − y_i^clk) · log(1 − p_i) ]

where y_i^clk is the label of whether financial product i is clicked. For samples in which a deal occurs, the product price factor is added, giving:

Φ_buy(s) = Σ_{i=1}^{K} Price_i · [ y_i^buy · log p_i + (1 − y_i^buy) · log(1 − p_i) ]

where y_i^buy and Price_i are, respectively, the label of whether financial product i is purchased and its price.
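Putting the shaping together, the following sketch computes R = R_0 + Φ(s) from the ranking scores and labels of the K listed products. The sigmoid link between score and click/deal probability follows the reconstruction above and should be read as an assumption:

    import numpy as np

    def phi(scores, labels, prices=None):
        # scores: ranking scores of the K listed products; labels: 0/1 click or deal labels
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=float)
        p = 1.0 / (1.0 + np.exp(-scores))                 # assumed sigmoid link
        ll = labels * np.log(p + 1e-12) + (1.0 - labels) * np.log(1.0 - p + 1e-12)
        if prices is not None:                            # deal samples: weight by price
            ll = ll * np.asarray(prices, dtype=float)
        return ll.sum()

    def shaped_reward(r0, scores, labels, prices=None):
        return r0 + phi(scores, labels, prices)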
S204, performing algorithm optimization design on the strategy function, strategy gradient and value function modules in the reinforcement learning model.
In a specific implementation, the device may express the strategy with a parameterized function and complete the learning of the strategy function by optimizing the parameters. Preferably, the device adopts a strategy-approximation method, that is, a parameterized function expresses the strategy and learning is completed by optimizing its parameters, and a deterministic strategy gradient algorithm is used for real-time adjustment and optimization of the ranking. Taking the state features as input and the finally effective ranking weights as output, the action output for any state s is:

a = μ_θ(s) = (μ_θ^1(s), μ_θ^2(s), …, μ_θ^m(s))

where θ = (θ_1, θ_2, …, θ_m) is the action parameter vector, and the ranking weight of the i-th dimension is:

μ_θ^i(s) = θ_i^T φ(s) + C_i

where φ(s) is the feature vector of state s and C_i is a constant for the weight distribution of the i-th ranking dimension.
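Under this linear form, the strategy is a single matrix-vector product. The sketch below treats θ as an m×d matrix whose i-th row is θ_i, a layout assumption for illustration:

    import numpy as np

    def mu_theta(theta: np.ndarray, phi_s: np.ndarray, C: np.ndarray) -> np.ndarray:
        # theta: (m, d) parameters; phi_s: (d,) state features; C: (m,) constants
        return theta @ phi_s + C   # m-dimensional ranking-weight action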
Further, the device may obtain an objective function over all states based on the determined strategy and update it by gradient-based optimization, where the objective function is the sum of the expected long-term cumulative rewards. It should be noted that the goal of the reinforcement learning model is to maximize the long-term cumulative reward, that is, the sum of the expected long-term cumulative rewards the recommendation system can obtain over all states under the actions of the deterministic strategy μ_θ:

J(μ_θ) = E_{s∼ρ^μ}[ Q^μ(s, μ_θ(s)) ]

The gradient of the objective function J(μ_θ) with respect to the parameter θ is sought so that J(μ_θ) is maximized, and θ is updated along the gradient direction. According to the strategy gradient theorem, the gradient is:

∇_θ J(μ_θ) = E_s[ ∇_θ μ_θ(s) · ∇_a Q^μ(s, a) |_{a=μ_θ(s)} ]

where Q^μ(s, a) is the long-term cumulative reward corresponding to the state-action pair (s, a) under strategy μ_θ. The parameter θ is therefore updated by:

θ_{t+1} = θ_t + α_θ · ∇_θ μ_θ(s_t) · ∇_a Q^μ(s_t, a) |_{a=μ_θ(s_t)}

where α_θ is the learning rate and ∇_θ μ_θ(s) is a Jacobian matrix. Q^μ(s, a) is computed approximately by value-function estimation; a linear function approximator expresses the Q function with a parameter vector w:

Q^μ(s, a) ≈ Q^w(s, a) = φ(s, a)^T w

where φ(s, a) is the feature vector of the state-action pair (s, a). Choosing the compatible features φ(s, a) = ∇_θ μ_θ(s) · (a − μ_θ(s)) gives ∇_a Q^w(s, a) = ∇_θ μ_θ(s)^T w, so the update formula for the parameter vector of the strategy function becomes:

θ_{t+1} = θ_t + α_θ · ∇_θ μ_θ(s_t) · (∇_θ μ_θ(s_t)^T · w_t)
further, the device may introduce a merit function, based on which a value function in the reinforcement learning model is designed. It will be appreciated that the value function QwThe parameter vector w of (a) also needs to be updated, which can be referred to as the Q-learning algorithm, for the sample(s)t,at,rt,st+1) Comprises the following steps:
wherein s ist,at,rt,st+1Respectively the state perceived by the recommender system at time t, the action taken, the reward feedback derived therefrom and the state perceived at time t +1, δt+1Referred to as differential error, alphawIs the learning rate of w. Introducing an advantage function, expressing the Q function as the sum of a state value function V(s) and an advantage function A (s, a), estimating the value of the state s from a global perspective with V(s), estimating the advantage of the action a in the state s relative to other actions from a local perspective with A (s, a):
wherein w and V are the parameter vectors of A and V, respectively. Finally, the updating mode of all parameters is as follows:
vt+1=vtvδt+1φ(st)
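These update rules match the form of the compatible deterministic policy gradient with a Q-learning critic (COPDAC-Q). The sketch below performs one such update with linear features; the shapes, learning rates and discount factor are illustrative assumptions:

    import numpy as np

    def copdac_q_step(theta, w, v, phi_s, phi_s_next, a, mu_s, grad_mu, r,
                      gamma=0.99, a_theta=1e-4, a_w=1e-3, a_v=1e-3):
        # grad_mu: Jacobian of mu_theta(s) w.r.t. theta, shape (p, m); mu_s = mu_theta(s)
        phi_sa = grad_mu @ (a - mu_s)          # compatible features phi(s, a), shape (p,)
        q_sa = phi_sa @ w + phi_s @ v          # Q(s, a) = A(s, a) + V(s)
        q_next = phi_s_next @ v                # on-policy next action has zero advantage
        delta = r + gamma * q_next - q_sa      # differential (TD) error
        theta = theta + a_theta * grad_mu @ (grad_mu.T @ w)   # strategy (actor) update
        w = w + a_w * delta * phi_sa           # advantage-function weights
        v = v + a_v * delta * phi_s            # state-value weights
        return theta, w, v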
s205, constructing a reinforcement learning model network according to a designed algorithm.
In the embodiment, the accuracy of personalized recommendation is further improved by constructing the reinforcement learning model network.
The following describes in detail a financial recommendation device and a reinforcement learning model network construction device based on behavior data according to an embodiment of the present invention with reference to fig. 3 to 6. It should be noted that the devices shown in fig. 3-6 are used to execute the methods of the embodiments shown in fig. 1 and 2 of the present invention; for convenience of description, only the parts related to the embodiment of the present invention are shown, and for specific technical details not disclosed here, please refer to the embodiments shown in fig. 1 and 2 of the present invention.
Referring to fig. 3, a schematic structural diagram of a financial recommendation apparatus based on behavior data is provided in an embodiment of the present invention. As shown in fig. 3, the financial recommendation apparatus 10 according to an embodiment of the present invention may include: a data acquisition unit 101, a data preprocessing unit 102, a model training unit 103, a product recommendation unit 104, and a data normalization unit 105. As shown in fig. 4, the network construction apparatus 20 may include a module definition unit 201, a function design unit 202, and a model construction unit 203. As shown in fig. 5, the module defining unit 201 includes a feature extracting sub-unit 2011, a state determining sub-unit 2012, a state defining sub-unit 2013, an action defining sub-unit 2014, a product sorting sub-unit 2015, a knowledge introducing sub-unit 2016 and a reward defining sub-unit 2017, and as shown in fig. 6, the function designing unit 202 includes a policy function designing sub-unit 2021, a policy gradient designing sub-unit 2022 and a value function designing sub-unit 2023.
The data acquisition unit 101 is configured to acquire multi-dimensional attribute information and historical behavior data, where the multi-dimensional attribute information includes multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information.
Optionally, the data normalizing unit 105 is configured to perform normalization processing on the multi-dimensional attribute information to obtain quantized data conforming to a preset format.
And the data preprocessing unit 102 is used for preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing.
And the model training unit 103 is configured to input the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge.
And the product recommending unit 104 is used for recommending financial products to the target user according to the recommending knowledge.
In another embodiment:
the module definition unit 201 is used for defining a state module, an action module and a reward module in the reinforcement learning model.
In an alternative embodiment, the module definition unit 201 includes:
a feature extraction subunit 2011, configured to extract the status feature based on the historical behavior data.
The state determining subunit 2012 is configured to use the multi-dimensional attribute information of the financial product corresponding to the historical behavior data in the preset time period as the state of the current model.
And the state definition subunit 2013 is used for constructing and defining a state module in the reinforcement learning model based on the state features and the states.
And the action definition subunit 2014 is used for constructing a sequencing vector, so that the sequencing vector defines an action module in the reinforcement learning model.
And the product sorting sub-unit 2015 is used for sorting the financing products by combining the multi-dimensional attribute information and the system sorting strategy.
A knowledge introduction subunit 2016 configured to introduce a priori knowledge for the reward function in the reinforcement learning model.
A reward definition subunit 2017, configured to define a reward module in the reinforcement learning model based on a reward function introducing prior knowledge.
And the function design unit 202 is configured to perform algorithm optimization design on the policy function, the policy gradient and the value function module in the reinforcement learning model.
In an alternative embodiment, the function design unit 202 includes:
the strategy function design subunit 2021 is configured to express a strategy by using a parameterized function, and complete the learning of the strategy function by optimizing the parameter.
A strategy gradient design subunit 2022, configured to obtain an objective function in all states based on the determined strategy, and optimally update the objective function according to the gradient strategy, where the objective function is the sum of long-term accumulated reward expectations.
A value function designing subunit 2023, configured to introduce an advantage function and design the value function in the reinforcement learning model based on the advantage function.
And the model construction unit 203 is used for constructing the reinforcement learning model network according to the designed algorithm.
It should be noted that, for the detailed execution process of each unit and sub-unit in this embodiment, reference may be made to the description in the foregoing method embodiment, and details are not described here again.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments shown in fig. 1 and fig. 2, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 1 and fig. 2, which are not described herein again.
The embodiment of the application also provides a computer device. As shown in fig. 7, the computer device 30 may include: at least one processor 301 (e.g., a CPU), at least one network interface 304, a user interface 303, a memory 305, at least one communication bus 302, and optionally a display screen 306. The communication bus 302 is used to realize connection and communication among these components. The user interface 303 may include a touch screen, a keyboard or a mouse. The network interface 304 may optionally include a standard wired interface or a wireless interface (e.g., a WI-FI interface), and a communication connection with a server can be established through the network interface 304. The memory 305 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention, the memory 305 includes a flash memory. The memory 305 may optionally also be at least one storage system located remotely from the processor 301. As shown in fig. 7, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module and program instructions.
It should be noted that the network interface 304 may be connected to a receiver, a transmitter or other communication module, and the other communication module may include, but is not limited to, a WiFi module, a bluetooth module, etc., and it is understood that the computer device in the embodiment of the present invention may also include a receiver, a transmitter, other communication module, etc.
Processor 301 may be configured to call program instructions stored in memory 305 and cause computer device 30 to:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
In some embodiments, the apparatus 30 is further configured to:
defining a state module, an action module and a reward module in the reinforcement learning model;
performing algorithm optimization design on a strategy function, a strategy gradient and a value function module in the reinforcement learning model;
and constructing a reinforcement learning model network according to a designed algorithm.
In some embodiments, the apparatus 30 is further configured to:
and carrying out standardization processing on the multi-dimensional attribute information to obtain quantitative data conforming to a preset format.
In some embodiments, the normalization process is a boolean normalization process.
In some embodiments, the apparatus 30, when defining the state module in the reinforcement learning model, is specifically configured to:
extracting state features based on historical behavior data;
taking multi-dimensional attribute information of the financial product corresponding to historical behavior data in a preset time period as the state of the current model;
and constructing and defining a state module in the reinforcement learning model based on the state characteristics and the state.
In some embodiments, the apparatus 30, when defining the action module in the reinforcement learning model, is specifically configured to:
and constructing a sequencing vector, and defining an action module in the reinforcement learning model by using the sequencing vector.
In some embodiments, the device 30, when defining the reward module in the reinforcement learning model, is specifically configured to:
sorting the financial products by combining the multi-dimensional attribute information and a system sorting strategy;
introducing prior knowledge into a reward function in the reinforcement learning model;
a reward module in the reinforcement learning model is defined based on a reward function that introduces a priori knowledge.
In some embodiments, the apparatus 30 is specifically configured to, when performing algorithm optimization design on the policy function, the policy gradient, and the value function module in the reinforcement learning model:
expressing the strategy by adopting a parameterized function, and finishing the learning of the strategy function by optimizing the parameter;
obtaining an objective function on all states based on the determined strategy, and optimizing and updating the objective function according to a gradient strategy, wherein the objective function is the sum of long-term accumulated reward expectation;
and introducing a merit function, and designing a value function in the reinforcement learning model based on the merit function.
In the embodiment of the invention, the user's behavior sequence information is considered and a reinforcement learning model is adopted, so that the recommendation system mines the relationship between the user's historical browsing information and the information of financial products, realizing accurate personalized recommendation and improving the accuracy and conversion rate of financial-product recommendation. Moreover, the recommendation system can capture, track and model the dynamic changes of the user's interests and behaviors, thereby making recommendations more dynamic and obtaining long-term benefit.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only of preferred embodiments of the present invention and certainly cannot be taken to limit the scope of rights of the invention; equivalent changes made according to the claims of the present invention therefore still remain within the scope covered by the invention.

Claims (10)

1. A financial recommendation method based on behavior data is characterized by comprising the following steps:
acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
preprocessing the multi-dimensional attribute information and the historical behavior data, wherein the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and recommending financial products to the target user according to the recommendation knowledge.
2. The method of claim 1, further comprising:
defining a state module, an action module and a reward module in the reinforcement learning model;
performing algorithm optimization design on a strategy function, a strategy gradient and a value function module in the reinforcement learning model;
and constructing a reinforcement learning model network according to a designed algorithm.
3. The method of claim 1, further comprising:
and carrying out standardization processing on the multidimensional attribute information to obtain quantitative data which accords with a preset format.
4. The method of claim 3, wherein the normalization process is a Boolean normalization process.
5. The method of claim 2, wherein defining a state module in the reinforcement learning model comprises:
extracting state features based on the historical behavior data;
taking the multi-dimensional attribute information of the financial product corresponding to the historical behavior data in a preset time period as the state of the current model;
and constructing a state module in the defined reinforcement learning model based on the state characteristics and the state.
6. The method of claim 2, wherein defining an action module in a reinforcement learning model comprises:
and constructing a sequencing vector, and defining an action module in the reinforcement learning model by using the sequencing vector.
7. The method of claim 2, wherein defining a reward module in a reinforcement learning model comprises:
sorting the financial products by combining the multi-dimensional attribute information and a system sorting strategy;
introducing prior knowledge into a reward function in the reinforcement learning model;
a reward module in the reinforcement learning model is defined based on a reward function that introduces a priori knowledge.
8. The method of claim 2, wherein the performing an algorithm optimization design on the strategy function, the strategy gradient and the value function module in the reinforcement learning model comprises:
expressing the strategy by adopting a parameterized function, and finishing the learning of the strategy function by optimizing the parameter;
obtaining an objective function on all states based on the determined strategy, and optimizing and updating the objective function according to a gradient strategy, wherein the objective function is the sum of long-term accumulated reward expectation;
and introducing an advantage function, and designing a value function in the reinforcement learning model based on the advantage function.
9. A financial recommendation device based on behavioral data, comprising:
the data acquisition unit is used for acquiring multi-dimensional attribute information and historical behavior data, wherein the multi-dimensional attribute information comprises multi-dimensional attribute information of financial products and multi-dimensional attribute information of users corresponding to the multi-dimensional attribute information;
the data preprocessing unit is used for preprocessing the multi-dimensional attribute information and the historical behavior data, and the preprocessing comprises one or more of screening, cleaning, missing value processing and singular value processing;
the model training unit is used for inputting the preprocessed multidimensional attribute information into the constructed reinforcement learning model network for training to obtain recommended knowledge;
and the product recommending unit is used for recommending financial products to the target user according to the recommending knowledge.
10. A computer device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the behavior data based financial recommendation method according to any one of claims 1 to 8.
CN201910983508.0A 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment Pending CN110598120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910983508.0A CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910983508.0A CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Publications (1)

Publication Number Publication Date
CN110598120A true CN110598120A (en) 2019-12-20

Family

ID=68867586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910983508.0A Pending CN110598120A (en) 2019-10-16 2019-10-16 Behavior data based financing recommendation method, device and equipment

Country Status (1)

Country Link
CN (1) CN110598120A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737579A (en) * 2020-06-28 2020-10-02 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance
CN112837116A (en) * 2021-01-13 2021-05-25 中国农业银行股份有限公司 Product recommendation method and device
CN112948700A (en) * 2021-04-14 2021-06-11 刘蒙 Fund recommendation method
CN113129108A (en) * 2021-04-26 2021-07-16 山东大学 Product recommendation method and device based on Double DQN algorithm
CN114297511A (en) * 2022-01-27 2022-04-08 中国农业银行股份有限公司 Financing recommendation method, device, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108776922A (en) * 2018-06-04 2018-11-09 北京至信普林科技有限公司 Finance product based on big data recommends method and device
CN110008409A (en) * 2019-04-12 2019-07-12 苏州市职业大学 Based on the sequence of recommendation method, device and equipment from attention mechanism

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737579A (en) * 2020-06-28 2020-10-02 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN112507209A (en) * 2020-11-10 2021-03-16 中国科学院深圳先进技术研究院 Sequence recommendation method for knowledge distillation based on land moving distance
CN112837116A (en) * 2021-01-13 2021-05-25 中国农业银行股份有限公司 Product recommendation method and device
CN112948700A (en) * 2021-04-14 2021-06-11 刘蒙 Fund recommendation method
CN113129108A (en) * 2021-04-26 2021-07-16 山东大学 Product recommendation method and device based on Double DQN algorithm
CN114297511A (en) * 2022-01-27 2022-04-08 中国农业银行股份有限公司 Financing recommendation method, device, system and storage medium

Similar Documents

Publication Publication Date Title
US10958748B2 (en) Resource push method and apparatus
CN110598120A (en) Behavior data based financing recommendation method, device and equipment
CN106651542B (en) Article recommendation method and device
CN108230058B (en) Product recommendation method and system
CN112632385A (en) Course recommendation method and device, computer equipment and medium
CN111784455A (en) Article recommendation method and recommendation equipment
WO2022016522A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN109903103B (en) Method and device for recommending articles
CN107423308B (en) Theme recommendation method and device
CN110008397B (en) Recommendation model training method and device
CN110135951B (en) Game commodity recommendation method and device and readable storage medium
US20230162005A1 (en) Neural network distillation method and apparatus
CN114117216A (en) Recommendation probability prediction method and device, computer storage medium and electronic equipment
CN112380449B (en) Information recommendation method, model training method and related device
CN111754278A (en) Article recommendation method and device, computer storage medium and electronic equipment
CN110688565A (en) Next item recommendation method based on multidimensional Hox process and attention mechanism
CN111582932A (en) Inter-scene information pushing method and device, computer equipment and storage medium
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
CN112528164A (en) User collaborative filtering recall method and device
CN115631012A (en) Target recommendation method and device
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN116823410A (en) Data processing method, object processing method, recommending method and computing device
CN113553501A (en) Method and device for user portrait prediction based on artificial intelligence
CN115700550A (en) Label classification model training and object screening method, device and storage medium
CN118043802A (en) Recommendation model training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wei Shuang

Inventor after: Lin Lu

Inventor after: Lin Xiaozhong

Inventor after: Jia Weiqiang

Inventor before: Wei Shuang

Inventor before: Lin Lu

Inventor before: Jia Weiqiang

CB03 Change of inventor or designer information
CB02 Change of applicant information

Address after: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310000

Applicant after: Sinyada Technology Co.,Ltd.

Address before: Xinyada technology building, 3888 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province 310000

Applicant before: SUNYARD SYSTEM ENGINEERING Co.,Ltd.

CB02 Change of applicant information
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220

RJ01 Rejection of invention patent application after publication