Investment product combination recommendation method and system
Technical field
The present invention relates to the field of computer technology, and in particular to an investment product combination recommendation method and system.
Background
With social progress and economic growth, investment and wealth management have become an increasingly common need. The means of investment and wealth management currently on the market are varied and investment products are numerous, so choosing a suitable financial product has become a difficulty for many people. No mature financial product recommendation system is currently available on the market, and the recommendation of financial product combination schemes is even less developed.
Without a sound system for recommending financial product combination schemes, ordinary consumers tend to choose investment products blindly. Their choices are usually too aggressive or too conservative, returns and risks are not balanced, and investment efficiency is low.
Summary of the invention
In view of the problems in the prior art, the present invention provides an investment product combination recommendation method and system that can recommend to a user a high-quality investment product combination suited to that user.
The technical solution proposed by the present invention to address the above technical problem is as follows:
In one aspect, the present invention provides an investment product combination recommendation method, comprising:
collecting user information and investment product information;
establishing and training a reinforcement learning network model;
obtaining a user risk preference according to the user information and based on the trained reinforcement learning network model;
obtaining an investment product combination to be recommended to the user according to the user risk preference and the investment product information; and
recording actual return and risk information after the user adopts the investment product combination, and optimizing the reinforcement learning network model according to the actual return and risk information.
Further, establishing and training the reinforcement learning network model specifically comprises:
obtaining historical user information, historical investment product information, and historical return and risk information of investment products;
establishing the reinforcement learning network model, inputting the historical user information into the reinforcement learning network model, and outputting an initial risk preference;
obtaining a pre-recommended investment product combination according to the initial risk preference and the historical investment product information; and
returning the historical return and risk information of the pre-recommended investment product combination to the reinforcement learning network model to adjust the parameters of the reinforcement learning network model, until the state of the reinforcement learning network model reaches an optimum.
Further, obtaining the investment product combination to be recommended to the user according to the user risk preference and the investment product information specifically comprises:
combining different investment products according to the investment product information to generate a list of investment product combinations with different risk factors; and
performing cosine similarity matching between the user risk preference and the investment product combinations in the list, obtaining the several investment product combinations with the highest similarity, and taking the combination with the largest return among them as the investment product combination recommended to the user.
Further, the reinforcement learning network model comprises an Actor network (the executor);
obtaining the user risk preference according to the user information and based on the trained reinforcement learning network model specifically comprises:
inputting the user information into the trained reinforcement learning network model, the Actor network outputting the user risk preference.
Further, the reinforcement learning network model also comprises a Critic network (the evaluator);
optimizing the reinforcement learning network model according to the actual return and risk information specifically comprises:
inputting the actual return and risk information into the reinforcement learning network model, the Critic network calculating and outputting a reward value or penalty value for the investment product combination recommended to the user; and
inputting the reward value or penalty value into the Actor network and updating the parameters of the Actor network, so as to optimize the reinforcement learning network model.
Further, the Critic network calculating and outputting the reward value or penalty value for the investment product combination recommended to the user specifically comprises:
detecting, by the Critic network, whether the actual return and risk information matches the user's satisfaction;
if it matches, calculating and outputting a reward value for the recommended investment product combination; and
if it does not match, calculating and outputting a penalty value for the recommended investment product combination.
Further, before establishing and training the reinforcement learning network model, the method further comprises:
normalizing the collected data, converting the collected data into structured data, and storing the structured data in a database.
In another aspect, the present invention provides an investment product combination recommendation system, comprising:
an information collection module, configured to collect user information and investment product information;
a model training module, configured to establish and train a reinforcement learning network model;
a risk preference acquisition module, configured to obtain a user risk preference according to the user information and based on the trained reinforcement learning network model;
a recommendation module, configured to obtain an investment product combination to be recommended to the user according to the user risk preference and the investment product information; and
a model optimization module, configured to record actual return and risk information after the user adopts the investment product combination, and to optimize the reinforcement learning network model according to the actual return and risk information.
Further, the recommendation module specifically comprises:
an investment product matching unit, configured to combine different investment products according to the investment product information and generate a list of investment product combinations with different risk factors; and
an investment product combination recommendation unit, configured to perform cosine similarity matching between the user risk preference and the investment product combinations in the list, obtain the several investment product combinations with the highest similarity, and take the combination with the largest return among them as the investment product combination recommended to the user.
Further, the reinforcement learning network model comprises an Actor network (the executor) and a Critic network (the evaluator);
the risk preference acquisition module is specifically configured to:
input the user information into the trained reinforcement learning network model, the Actor network outputting the user risk preference;
the model optimization module specifically comprises:
a calculation and output unit, configured to input the actual return and risk information into the reinforcement learning network model, the Critic network calculating and outputting a reward value or penalty value for the recommended investment product combination; and
a parameter update unit, configured to input the reward value or penalty value into the Actor network and update the parameters of the Actor network, so as to optimize the reinforcement learning network model.
The beneficial effects of the technical solution provided by the embodiments of the present invention are as follows:
A reinforcement learning network model is established. The user risk preference is obtained from the collected user information and matched against the user's needs, so that a high-quality investment product combination suited to the user is recommended. After the user adopts the investment product combination, its actual return and risk information is fed back to the reinforcement learning network model, which is thereby continually optimized. This improves the matching precision of the model and lets it adapt flexibly to its environment, so that investors can keep track of dynamic market risk at any time and maximize returns.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the investment product combination recommendation method provided by Embodiment one of the present invention;
Fig. 2 is a schematic diagram of the recommendation principle in the investment product combination recommendation method provided by Embodiment one of the present invention;
Fig. 3 is a schematic structural diagram of the investment product combination recommendation system provided by Embodiment two of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
Embodiment one
An embodiment of the present invention provides an investment product combination recommendation method. Referring to Fig. 1, the method comprises:
S1: collecting user information and investment product information;
S2: establishing and training a reinforcement learning network model;
S3: obtaining a user risk preference according to the user information and based on the trained reinforcement learning network model;
S4: obtaining an investment product combination to be recommended to the user according to the user risk preference and the investment product information;
S5: recording actual return and risk information after the user adopts the investment product combination, and optimizing the reinforcement learning network model according to the actual return and risk information.
It should be noted that, in step S1, the collected user information includes the user's personal information and personal preference information, that is, the investor's occupation, income, hobbies, savings, location, social circle, whether the investor owns a car, has loans, has medical coverage, or has insurance, social security information, personal credit record, and the like. The investment product information includes investment product attribute information and economic environment information. The investment product attribute information includes the product's term, maximum risk, expected return, liquidation speed, stability, and the like; the economic environment information includes stock index points, currency exchange rates, crude oil prices, and the like.
Further, before establishing and training the reinforcement learning network model, the method further comprises:
normalizing the collected data, converting the collected data into structured data, and storing the structured data in a database.
It should be noted that, after enough data has been collected, the data needs to be preprocessed, that is, normalized. This includes: unifying the data to the same dimensionality, for example using a word2vec model to represent the occupation "lawyer" as (0, 0.6, 0.5) and "student" as (0.7, 0.1, 0.3); unifying all information to the same period and unit, for example expressing all income per year and in RMB; and structuring the data, for example representing the user's personal information as {sex, occupation, income, ...} and the investment product information as {average annual return, maximum annual return, maximum annual loss range, whether it can be liquidated at any time, assessed risk factor, ...}.
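The preprocessing described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the field names, the monthly-to-yearly conversion, and the small lookup table standing in for a trained word2vec model (it reuses the two example vectors given in the text) are all hypothetical.

```python
# Hypothetical occupation embeddings. A real system would query a trained
# word2vec model; the two vectors below are the examples given in the text.
OCCUPATION_VECTORS = {
    "lawyer": (0.0, 0.6, 0.5),
    "student": (0.7, 0.1, 0.3),
}


def normalize_user(raw):
    """Convert one raw user record into a structured, unit-consistent record."""
    # Unify the period: express income per year (assumed input fields).
    if raw["income_period"] == "month":
        yearly_income = raw["income"] * 12
    else:
        yearly_income = raw["income"]
    # Structure the data as {sex, occupation, income}.
    return {
        "sex": raw["sex"],
        "occupation": OCCUPATION_VECTORS[raw["occupation"]],
        "income_rmb_per_year": yearly_income,
    }


record = normalize_user(
    {"sex": "F", "occupation": "lawyer", "income": 20000, "income_period": "month"}
)
```

The resulting dictionary could then be written to a database row; the storage layer is left out of the sketch.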
Further, in step S2, establishing and training the reinforcement learning network model specifically comprises:
obtaining historical user information, historical investment product information, and historical return and risk information of investment products;
establishing the reinforcement learning network model, inputting the historical user information into the reinforcement learning network model, and outputting an initial risk preference;
obtaining a pre-recommended investment product combination according to the initial risk preference and the historical investment product information; and
returning the historical return and risk information of the pre-recommended investment product combination to the reinforcement learning network model to adjust the parameters of the reinforcement learning network model, until the state of the reinforcement learning network model reaches an optimum.
It should be noted that, after the historical return and risk information of the pre-recommended investment product combination is returned to the reinforcement learning network model, the model calculates and outputs a reward value or penalty value. That value is then fed back into the reinforcement learning network model to adjust its parameters. User information is then input into the adjusted model and training continues, until the state of the reinforcement learning network model reaches an optimum.
The reinforcement learning network model comprises an Actor network (the executor) and a Critic network (the evaluator). The Actor network is a fully connected neural network. The structure of its input layer matches the structure of the structured user information, so that user information can be input directly. Its output layer consists of several classification outputs with different risk factors, so that the network outputs the user risk preference corresponding to its analysis of the user information.
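As a rough illustration of such an Actor network, the sketch below runs a forward pass through a small fully connected network whose outputs are scores over risk categories. The layer sizes, the ReLU hidden activation, the softmax output, and the sample input vector are assumptions chosen for the example; the text specifies only a fully connected network with classification outputs.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x):
    """Apply one dense layer: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into positive values that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def actor_forward(layers, user_vector):
    """Map a structured user vector to scores over risk categories."""
    h = user_vector
    for layer in layers[:-1]:
        h = [max(0.0, v) for v in dense(layer, h)]  # ReLU hidden layer (assumed)
    return softmax(dense(layers[-1], h))


rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_RISK_LEVELS = 5, 8, 3  # assumed sizes
actor = [
    make_layer(N_FEATURES, N_HIDDEN, rng),
    make_layer(N_HIDDEN, N_RISK_LEVELS, rng),
]
risk_scores = actor_forward(actor, [0.0, 0.6, 0.5, 0.24, 1.0])
```

Here `risk_scores` holds one score per risk category; the category with the largest score could be read as the user risk preference.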
After the reinforcement learning network model is established, the Actor network $f(s \mid \theta^{\pi})$ with parameters $\theta^{\pi}$ and the Critic network $Q(s, a \mid \theta^{Q})$ with parameters $\theta^{Q}$ are initialized, the Actor target network parameters $\theta^{\pi'} \leftarrow \theta^{\pi}$ and the Critic target network parameters $\theta^{Q'} \leftarrow \theta^{Q}$ are initialized, the policy model g is initialized using the Actor network, and the history storage container B is initialized.
Then, in an outer loop of M iterations:
initialize all selectable action spaces (that is, obtain the user risk preference options that can be selected);
receive state information from the environment (that is, obtain the user information).
In an inner loop of T iterations:
obtain an action according to the acquired information and the Actor network;
store the action currently taken and the corresponding information obtained for that action in the storage container B;
sample a batch of records from the storage container B;
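update the Critic network parameters by minimizing the loss function $L(\theta^{Q})$. The update formula, written in the standard form consistent with the symbols defined below, is:
$$y_i = r_i + \gamma\, Q'\big(s_{i+1},\, f'(s_{i+1} \mid \theta^{\pi'}) \mid \theta^{Q'}\big)$$
$$L(\theta^{Q}) = \frac{1}{N} \sum_{i} \big(y_i - Q(s_i, a_i \mid \theta^{Q})\big)^2$$
where $y_i$ denotes the target output, $r_i$ the reward value, $\gamma$ the reward discount coefficient, $f'(s_{i+1} \mid \theta^{\pi'})$ the policy function by which the Actor target network selects the action $a_{i+1}$ to execute in state $s_{i+1}$, $Q'(s_{i+1}, a_{i+1} \mid \theta^{Q'})$ the maximum reward value obtainable by taking action $a_{i+1}$ in state $s_{i+1}$, $\theta^{Q'}$ the Critic target network parameters, $\theta^{Q}$ the Critic network parameters, and $N$ the number of sampled records.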
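update the Actor network parameters using the sampled gradient. The update formula, written in the standard form consistent with the symbols defined below, is:
$$\nabla_{\theta^{\pi}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\Big|_{s = s_i,\, a = f(s_i \mid \theta^{\pi})}\; \nabla_{\theta^{\pi}} f(s \mid \theta^{\pi})\Big|_{s = s_i}$$
where $\theta^{\pi}$ denotes the Actor network parameters, $f(s \mid \theta^{\pi})$ the mapping function of the Actor network onto the action space in state $s$, $\nabla_{\theta^{\pi}}$ differentiation of the expression that follows with respect to the weights $\theta^{\pi}$, and $J$ the objective being maximized.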
In addition, the Critic target network parameters and the Actor target network parameters are updated as follows:
$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1 - \tau)\,\theta^{Q'}$$
$$\theta^{\pi'} \leftarrow \tau\,\theta^{\pi} + (1 - \tau)\,\theta^{\pi'}$$
where $\tau$ denotes the update coefficient, $\theta^{Q}$ and $\theta^{\pi}$ denote the Critic and Actor network parameters respectively, and $\theta^{Q'}$ and $\theta^{\pi'}$ denote the Critic and Actor target network parameters respectively.
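The target network update above is a weighted blend of the current parameters and the previous target parameters. A minimal sketch, assuming the parameters are held in flat Python lists and using made-up values for the parameters and for τ:

```python
def soft_update(target_params, params, tau):
    """Blend current parameters into target parameters:
    target <- tau * params + (1 - tau) * target, element by element."""
    return [tau * p + (1.0 - tau) * t for p, t in zip(params, target_params)]


theta_q = [1.0, 2.0]         # Critic network parameters (made-up values)
theta_q_target = [0.0, 0.0]  # Critic target network parameters (made-up values)
theta_q_target = soft_update(theta_q_target, theta_q, tau=0.1)
```

The same function applies unchanged to the Actor parameters and the Actor target parameters. With a small τ the target network follows the trained network slowly.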
After the parameters of the reinforcement learning network model are updated, the user's new state is input into the model and the above steps are performed again. The parameters continue to be updated iteratively until the state of the reinforcement learning network model reaches an optimum.
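The bookkeeping in the training loop above (storing records in container B, sampling a batch, and forming the Critic's training signal) can be sketched as follows. It is a sketch under assumptions: the record layout (state, action, reward, next state), the constant stand-in values for the target Critic and Critic outputs, and the temporal-difference form of the target (reward plus discounted next-state estimate, with a mean squared error loss) are illustrative choices rather than details confirmed by the text.

```python
import random
from collections import deque


class HistoryBuffer:
    """Storage container B: keeps (state, action, reward, next_state) records."""

    def __init__(self, capacity):
        self.records = deque(maxlen=capacity)  # oldest records drop out first

    def store(self, state, action, reward, next_state):
        self.records.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        """Draw a random batch without replacement."""
        k = min(batch_size, len(self.records))
        return rng.sample(list(self.records), k)


def td_target(reward, next_q, gamma):
    """Target output: reward plus discounted estimate for the next state."""
    return reward + gamma * next_q


def critic_loss(targets, predictions):
    """Mean squared error between target outputs and Critic outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


rng = random.Random(0)
buffer = HistoryBuffer(capacity=100)
for step in range(10):
    # Made-up transitions: the reward alternates between +1 and -1.
    reward = 1.0 if step % 2 == 0 else -1.0
    buffer.store(state=step, action=step % 3, reward=reward, next_state=step + 1)

batch = buffer.sample(4, rng)
# Constant stand-ins for the target Critic (0.5) and the Critic (0.0).
targets = [td_target(r, next_q=0.5, gamma=0.9) for (_, _, r, _) in batch]
loss = critic_loss(targets, predictions=[0.0] * len(batch))
```

In a full implementation the stand-in constants would be replaced by the outputs of the target Critic and Critic networks, and `loss` would drive a gradient step on the Critic parameters.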
Further, in step S3, obtaining the user risk preference according to the user information and based on the trained reinforcement learning network model specifically comprises:
inputting the user information into the trained reinforcement learning network model, the Actor network outputting the user risk preference.
Further, in step S4, obtaining the investment product combination to be recommended to the user according to the user risk preference and the investment product information specifically comprises:
combining different investment products according to the investment product information to generate a list of investment product combinations with different risk factors; and
performing cosine similarity matching between the user risk preference and the investment product combinations in the list, obtaining the several investment product combinations with the highest similarity, and taking the combination with the largest return among them as the investment product combination recommended to the user.
It should be noted that the collected investment product information is expressed in terms of rate of return and maximum risk over the same period to form a product list. The products are then combined, according to capital management principles, into combinations with a number of risk factors, finally producing a list of investment product combinations with different risk factors for matching against the user risk preference.
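One way to picture the generation of this list is the sketch below, which pairs products at a few fixed weightings. The product names and figures are invented, and estimating a combination's return and risk as weighted averages of its components is a simplifying assumption (it ignores correlation between products); the text itself refers only to "capital management principles".

```python
from itertools import combinations

# Hypothetical products: (name, expected yearly return, maximum risk).
PRODUCTS = [
    ("deposit", 0.02, 0.00),
    ("bond_fund", 0.04, 0.05),
    ("stock_fund", 0.10, 0.30),
]


def build_combinations(products, weight_steps=(0.25, 0.5, 0.75)):
    """Pair up products at several weightings and estimate return and risk."""
    result = []
    for (name_a, ret_a, risk_a), (name_b, ret_b, risk_b) in combinations(products, 2):
        for w in weight_steps:
            result.append({
                "weights": {name_a: w, name_b: 1.0 - w},
                # Weighted averages: a simplification assumed for this sketch.
                "return": w * ret_a + (1.0 - w) * ret_b,
                "risk": w * risk_a + (1.0 - w) * risk_b,
            })
    # Order the list from lowest to highest risk factor.
    return sorted(result, key=lambda c: c["risk"])


combination_list = build_combinations(PRODUCTS)
```

Each entry carries its own risk factor, so the list can be matched against a user risk preference in a later step.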
The user risk preference is given in the form of a risk factor. According to the user's needs (such as flexible deposit and withdrawal, holding period, and the like), the user risk preference is converted into a vector representation so that cosine similarity matching can be performed against the investment product combinations in the investment product combination list.
The cosine similarity is calculated as follows:
$$\cos\theta = \frac{a \cdot b}{\lVert a \rVert\, \lVert b \rVert}$$
where $\cos\theta$ is the cosine similarity, $a$ is the user risk preference vector, and $b$ is the investment product combination vector.
After matching, the k (k ≥ 1) investment product combinations with the highest cosine similarity are obtained, and the combination with the highest return among these k is taken as the investment product combination recommended to the user.
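The two-stage selection just described (take the k most similar combinations, then pick the one with the highest return) can be sketched as follows. The candidate names, vectors, and return figures are invented for illustration, and how a preference or a combination is encoded as a vector is left as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def recommend(preference, candidates, k):
    """Keep the k most similar combinations, then return the highest-return one."""
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(preference, c["vector"]),
        reverse=True,
    )
    return max(ranked[:k], key=lambda c: c["return"])


# Hypothetical candidate combinations.
candidates = [
    {"name": "conservative", "vector": [0.1, 0.9, 0.8], "return": 0.03},
    {"name": "balanced", "vector": [0.5, 0.5, 0.5], "return": 0.05},
    {"name": "aggressive", "vector": [0.9, 0.1, 0.2], "return": 0.09},
]
best = recommend([0.4, 0.6, 0.5], candidates, k=2)
```

With k = 2 the "aggressive" candidate is filtered out by similarity despite its higher return, and "balanced" is chosen from the remaining two. A larger k gives more weight to return and less to fit with the preference.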
Further, in step S5, optimizing the reinforcement learning network model according to the actual return and risk information specifically comprises:
inputting the actual return and risk information into the reinforcement learning network model, the Critic network calculating and outputting a reward value or penalty value for the investment product combination recommended to the user; and
inputting the reward value or penalty value into the Actor network and updating the parameters of the Actor network, so as to optimize the reinforcement learning network model.
Further, the Critic network calculating and outputting the reward value or penalty value for the investment product combination recommended to the user specifically comprises:
detecting, by the Critic network, whether the actual return and risk information matches the user's satisfaction;
if it matches, calculating and outputting a reward value for the recommended investment product combination; and
if it does not match, calculating and outputting a penalty value for the recommended investment product combination.
It should be noted that, after an investment product combination is recommended to the user, the combination is added to the historical behavior record, and its actual profit and risk status are subsequently tracked. The actual profit and risk information of the combination is periodically input into the reinforcement learning network model for calculation. If the actual profit and risk of the recommended combination match the user's ability to cope with them, that is, match the user's satisfaction, a reward value is output. If they deviate from the user's ability to cope with them, that is, do not match the user's satisfaction, a penalty value is output. The reward value or penalty value is fed back to the Actor network to update the parameters of the Actor network. Each optimization is fed back recursively in the form of the Bellman equation, and the network is updated continually until each recommended investment product combination reaches peak efficiency.
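The reward-or-penalty decision can be illustrated with a simple rule. This is a stand-in, not the Critic network itself: the text does not say how satisfaction is measured, so the two thresholds (`expected_return`, `risk_tolerance`) and the values +1.0 and -1.0 are assumptions made for the sketch.

```python
def feedback_value(actual_return, actual_risk, expected_return, risk_tolerance):
    """Return a reward (+1.0) when the outcome matches the user's satisfaction,
    otherwise a penalty (-1.0). The matching rule is a hypothetical stand-in."""
    matches = actual_return >= expected_return and actual_risk <= risk_tolerance
    return 1.0 if matches else -1.0


# Outcome within the user's expectations: reward.
reward = feedback_value(0.06, 0.10, expected_return=0.05, risk_tolerance=0.15)
# Risk exceeded what the user can tolerate: penalty.
penalty = feedback_value(0.06, 0.20, expected_return=0.05, risk_tolerance=0.15)
```

The resulting value would then be fed back to the Actor network as the training signal for its parameter update.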
Fig. 2 is a schematic diagram of the investment product combination recommendation method provided by the embodiment of the present invention. Data is first collected and preprocessed to obtain the user information, which is input into the Actor network to output the user risk preference. The cosine similarity between the user risk preference and the investment product combination data is then calculated, and the investment product combination with the highest similarity and the largest return is taken as the investment product combination recommended to the user. The actual return and risk information of the recommended investment product combination is input into the Critic network, which calculates a reward value or penalty value and feeds it back to the Actor network to update the Actor network parameters, thereby continually optimizing the reinforcement learning network model.
The embodiment of the present invention establishes a reinforcement learning network model, obtains the user risk preference from the collected user information, and matches it against the user's needs, so that a high-quality investment product combination suited to the user is recommended. After the user adopts the investment product combination, its actual return and risk information is fed back to the reinforcement learning network model, which is thereby continually optimized. This improves the matching precision of the model and lets it adapt flexibly to its environment, so that investors can keep track of dynamic market risk at any time and maximize returns.
Embodiment two
An embodiment of the present invention provides an investment product combination recommendation system that can carry out all the steps of the investment product combination recommendation method described above. Referring to Fig. 3, the investment product combination recommendation system comprises:
an information collection module 1, configured to collect user information and investment product information;
a model training module 2, configured to establish and train a reinforcement learning network model;
a risk preference acquisition module 3, configured to obtain a user risk preference according to the user information and based on the trained reinforcement learning network model;
a recommendation module 4, configured to obtain an investment product combination to be recommended to the user according to the user risk preference and the investment product information; and
a model optimization module 5, configured to record actual return and risk information after the user adopts the investment product combination, and to optimize the reinforcement learning network model according to the actual return and risk information.
Further, the recommendation module specifically comprises:
an investment product matching unit, configured to combine different investment products according to the investment product information and generate a list of investment product combinations with different risk factors; and
an investment product combination recommendation unit, configured to perform cosine similarity matching between the user risk preference and the investment product combinations in the list, obtain the several investment product combinations with the highest similarity, and take the combination with the largest return among them as the investment product combination recommended to the user.
Further, the reinforcement learning network model comprises an Actor network (the executor) and a Critic network (the evaluator);
the risk preference acquisition module is specifically configured to:
input the user information into the trained reinforcement learning network model, the Actor network outputting the user risk preference;
the model optimization module specifically comprises:
a calculation and output unit, configured to input the actual return and risk information into the reinforcement learning network model, the Critic network calculating and outputting a reward value or penalty value for the recommended investment product combination; and
a parameter update unit, configured to input the reward value or penalty value into the Actor network and update the parameters of the Actor network, so as to optimize the reinforcement learning network model.
The embodiment of the present invention establishes a reinforcement learning network model, obtains the user risk preference from the collected user information, and matches it against the user's needs, so that a high-quality investment product combination suited to the user is recommended. After the user adopts the investment product combination, its actual return and risk information is fed back to the reinforcement learning network model, which is thereby continually optimized. This improves the matching precision of the model and lets it adapt flexibly to its environment, so that investors can keep track of dynamic market risk at any time and maximize returns.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.