CN110111152A

CN110111152A - Content recommendation method, device, and server

Info

Publication number: CN110111152A
Application number: CN201910390223.6A
Authority: CN
Inventors: 孔蓓蓓
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Yayue Technology Co ltd
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2019-08-09

Abstract

Embodiments of the present invention provide a content recommendation method, device, and server. The method includes: displaying first recommended content to a target user through the browsing interface of a user terminal; obtaining feedback information of the target user on the first recommended content during browsing; and adjusting the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content. With these embodiments, the number of recommendation messages included in recommended content can be adjusted dynamically and accurately according to factors such as user features, recommended-content features, and real-time feedback, which maximizes the platform's revenue without degrading the user experience.

Description

Content recommendation method, device, and server

Technical field

The present invention relates to the field of Internet technologies, and in particular to a content recommendation method, device, and server.

Background art

When content (such as advertisements) is recommended to users, each user's tolerance is different. Taking advertisement recommendation as an example, how to accurately determine the clutter (that is, how many feed items are shown to a user for each inserted advertisement), so that advertisement inventory is preserved while user activity does not decay significantly, is a problem in urgent need of a solution.

Current solutions mainly fall into two categories. The first relies on manual experience: a decision maker subjectively sets a fixed clutter value based on experience or, once user profiles are available, divides users into groups and sets a clutter value for each group for targeted delivery, and then adjusts the values through A/B test experiments. This approach consumes a great deal of manpower and time, depends too heavily on human experience, and cannot accurately balance user experience against platform revenue. The second is based on regression analysis: user features are regressed against clutter to obtain a clutter value. However, this approach mainly learns a prediction model from user behavior over a past period and applies it to future advertisement delivery strategies, so its real-time performance is poor and it cannot optimize long-term cumulative revenue.

Summary of the invention

Embodiments of the present invention provide a content recommendation method, device, and server that can dynamically and accurately adjust the number of recommendation messages included in recommended content, maximizing the platform's revenue without degrading the user experience.

In one aspect, an embodiment of the present invention provides a content recommendation method, comprising:

displaying first recommended content to a target user through a browsing interface of a user terminal, the first recommended content including at least one recommendation message;

obtaining feedback information of the target user on the first recommended content during browsing; and

adjusting the number of recommendation messages included in the first recommended content according to the feedback information, feature information of the target user, and feature information of the first recommended content.

In another aspect, an embodiment of the present invention further provides a content recommendation device, comprising:

a display module, configured to display first recommended content to a target user through a browsing interface of a user terminal, the first recommended content including at least one recommendation message;

an obtaining module, configured to obtain feedback information of the target user on the first recommended content during browsing; and

a processing module, configured to adjust the number of recommendation messages included in the first recommended content according to the feedback information, feature information of the target user, and feature information of the first recommended content.

In yet another aspect, an embodiment of the present invention further provides a server, including a processor, a network interface, and a storage device that are connected to one another. The network interface is controlled by the processor to send and receive data, the storage device is used to store a computer program that includes program instructions, and the processor is configured to invoke the program instructions to execute the content recommendation method described above.

In yet another aspect, an embodiment of the present invention further provides a computer storage medium storing program instructions which, when executed, implement the content recommendation method described above.

In the embodiments of the present invention, first recommended content is displayed to a target user through the browsing interface of a user terminal, feedback information of the target user on the first recommended content during browsing is obtained, and the number of recommendation messages included in the first recommended content is adjusted according to the feedback information, the feature information of the target user, and the feature information of the first recommended content. The number of recommendation messages included in recommended content can thus be adjusted dynamically and accurately according to factors such as user features, recommended-content features, and real-time feedback, so that different users receive different numbers of recommendation messages and the platform's revenue is maximized without degrading the user experience.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a content recommendation method provided by an embodiment of the present invention;

Fig. 2a is a schematic structural diagram of a DQN network provided by an embodiment of the present invention;

Fig. 2b is a schematic structural diagram of another DQN network provided by an embodiment of the present invention;

Fig. 2c is a schematic diagram of optimizing a DQN network provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a content recommendation device provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Reinforcement learning refers to an agent (Agent) learning by trial and error: rewards obtained by interacting with the environment guide its subsequent behavior. The goal is for the agent to obtain the maximum reward, dynamically adjusting its parameters so as to maximize the reinforcement signal.

The server described in the embodiments of the present invention may be a server that provides information browsing, such as a social networking server or a news server, or may be a dedicated advertisement delivery server.

Referring to Fig. 1, which is a schematic flowchart of a content recommendation method provided by an embodiment of the present invention, the content recommendation method described in this embodiment includes:

101. The server displays first recommended content to a target user through the browsing interface of a user terminal, the first recommended content including at least one recommendation message.

Specifically, the server may send the content to be recommended (denoted the first recommended content) to the user terminal. While the corresponding user (denoted the target user) is browsing, the user terminal displays the multiple recommendation messages included in the first recommended content to the target user, in order, through the browsing interface.

Taking advertisement recommendation as an example, the first recommended content includes multiple advertisements. Assuming the target user is browsing news through the user terminal, the user terminal displays one advertisement each time the target user has browsed a certain number of news items. The more advertisements the first recommended content includes, the more frequently advertisements appear during the target user's browsing.
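The relationship between the number of advertisements and how often they appear can be illustrated with a small sketch. The function name `interleave_ads` and the parameter `items_per_ad` are hypothetical and do not come from the source; the sketch simply assumes that one advertisement is placed after every fixed number of news items until the advertisements run out.

```python
from typing import List


def interleave_ads(news: List[str], ads: List[str], items_per_ad: int) -> List[str]:
    """Place one ad after every `items_per_ad` news items until the ads run out.

    `items_per_ad` is an assumed stand-in for the clutter value: the smaller it
    is, the more often an ad appears in the feed.
    """
    if items_per_ad < 1:
        raise ValueError("items_per_ad must be at least 1")
    feed: List[str] = []
    remaining_ads = iter(ads)
    for position, item in enumerate(news, start=1):
        feed.append(item)
        if position % items_per_ad == 0:
            ad = next(remaining_ads, None)  # None once all ads are used
            if ad is not None:
                feed.append(ad)
    return feed


sparse = interleave_ads(["n1", "n2", "n3", "n4"], ["ad1", "ad2"], items_per_ad=2)
dense = interleave_ads(["n1", "n2", "n3", "n4"], ["ad1", "ad2"], items_per_ad=1)
print(sparse)
print(dense)
```

With the same four news items, lowering `items_per_ad` moves the advertisements earlier in the feed, which is the frequency effect described above.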

102. The server obtains feedback information of the target user on the first recommended content during browsing.

After seeing the displayed first recommended content, the target user may take some action, and that action can be regarded as the target user's feedback information on the first recommended content: for example, clicking to view it, closing the recommended content, or choosing to reduce the frequency of content recommendation. If the target user clicks to view the recommended content, the feedback information may further include access depth, access duration, and so on. In addition, if the target user is reasonably satisfied with the frequency at which recommended content currently appears, or at least has not developed negative feelings such as dislike or disgust, the target user is likely to keep refreshing the browsing interface to load new content; the number of refreshes can therefore also be regarded as feedback information of the target user on the first recommended content.

It should be noted that the feedback information here may specifically include the target user's feedback information on each recommendation message included in the first recommended content.
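One possible way to hold this feedback in memory is sketched below. The class and field names (`MessageFeedback`, `BrowsingFeedback`, `access_seconds`, and so on) are illustrative assumptions; the source names only the kinds of signal (clicks, closing, access depth, access duration, refresh count), not a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MessageFeedback:
    """Feedback on a single recommendation message (field names are assumed)."""
    clicks: int = 0
    access_depth: int = 0
    access_seconds: float = 0.0
    closed: bool = False


@dataclass
class BrowsingFeedback:
    """Feedback on the whole first recommended content for one browsing session."""
    refresh_count: int = 0
    per_message: Dict[str, MessageFeedback] = field(default_factory=dict)

    def record_click(self, message_id: str, depth: int, seconds: float) -> None:
        fb = self.per_message.setdefault(message_id, MessageFeedback())
        fb.clicks += 1
        fb.access_depth = max(fb.access_depth, depth)  # keep the deepest visit
        fb.access_seconds += seconds                   # accumulate time spent

    def record_close(self, message_id: str) -> None:
        self.per_message.setdefault(message_id, MessageFeedback()).closed = True

    def record_refresh(self) -> None:
        self.refresh_count += 1


feedback = BrowsingFeedback()
feedback.record_click("ad-1", depth=2, seconds=12.5)
feedback.record_close("ad-2")
feedback.record_refresh()
```

Keeping a per-message record alongside the session-level refresh count mirrors the note above that feedback may be collected for each recommendation message.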

103. The server adjusts the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content.

The feature information of the target user may include age, gender, region, income, hobbies, and so on, and the feature information of the first recommended content may include content type, text length, and so on.

Specifically, the server may adjust the number of recommendation messages included in the first recommended content according to the feedback information of the target user, the feature information of the target user, and the feature information of the first recommended content. Because the feedback information reflects the target user's real-time attitude towards the recommended content, including whether the user is willing to accept it or dislikes it, using the target user's latest feedback information together with the feature information of the target user and of the first recommended content allows the number of recommendation messages included in the first recommended content to be adjusted promptly, dynamically, and accurately.

In the embodiments of the present invention, the server displays first recommended content to a target user through the browsing interface of a user terminal, obtains feedback information of the target user on the first recommended content during browsing, and adjusts the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content. The number of recommendation messages included in recommended content can thus be adjusted quickly and accurately according to factors such as user features, recommended-content features, and real-time feedback, so that different users receive different numbers of recommendation messages and the platform's revenue is maximized without degrading the user experience.

In some possible embodiments, Fig. 2a is a schematic structural diagram of a deep reinforcement learning (Deep Q-Learning, DQN) network provided by the present invention. The agent (Agent) applies an action (Action) to the environment (Environment), and the environment gives the agent a corresponding reward (Reward). The reward includes the user's clicks on advertisements, that is, on recommended content (AD click), and the user's activeness (User active). The state (State) of the environment changes accordingly, and the agent, using both exploitation (Exploitation) and exploration (Exploration), generates a new action according to the change in the environment state and the reward received and applies it to the environment.

In some possible embodiments, before displaying the first recommended content to the target user through the browsing interface of the user terminal, the server may obtain the feature information of the target user, determine second recommended content using a first DQN network and the feature information, determine third recommended content using a second DQN network and the feature information, and then determine the first recommended content of the target user according to the second recommended content and the third recommended content.

In some possible embodiments, the server obtains the feature information of the target user and determines the first recommended content only when it detects a refresh request submitted by the target user for the browsing interface. In other words, the server pushes the latest recommended content to the target user whenever the target user refreshes the browsing interface.

The present invention uses a double DQN (Double-DQN) architecture. The first DQN network serves as the exploitation network, evaluating the user based on the user's historical experience (Exploitation); the second DQN network serves as the prediction network, exploring (Exploration) to predict how the user may change. The first recommended content sent to the target user therefore includes both recommended content determined from the user's historical behavior and recommended content obtained by predicting the user, so that the system can both perform supervised learning on historical data and handle dynamic changes in the user.

In some possible embodiments, the specific manner in which the server adjusts the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content may be as follows. The server obtains context information corresponding to the feedback information; the context information may include, for example, the time and place at which the target user clicked the recommended content. The server inputs the feedback information, the feature information of the target user, the feature information of the first recommended content, and the context information into the first DQN network and the second DQN network respectively to obtain a predicted reward value for the first recommended content, and then adjusts the number of recommendation messages included in the first recommended content according to the predicted reward value.
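The source does not state the rule that maps predicted reward values to a message count. Purely as an illustration, the sketch below assumes one hypothetical policy: count the candidate messages whose predicted reward reaches a threshold, then clamp the result to a permitted range. The function name, the threshold, and the bounds are all assumptions.

```python
from typing import Sequence


def adjust_message_count(
    predicted_rewards: Sequence[float],
    threshold: float,
    min_count: int = 1,
    max_count: int = 5,
) -> int:
    """Return how many recommendation messages to include next time.

    Hypothetical policy, not taken from the source: keep one slot for each
    candidate whose predicted reward is at least `threshold`, bounded by
    `min_count` and `max_count`.
    """
    if min_count > max_count:
        raise ValueError("min_count must not exceed max_count")
    worthwhile = sum(1 for reward in predicted_rewards if reward >= threshold)
    return max(min_count, min(max_count, worthwhile))


count = adjust_message_count([0.9, 0.2, 0.6, 0.1], threshold=0.5)
print(count)
```

The lower bound of one reflects the statement that the first recommended content includes at least one recommendation message; any other policy that is monotone in the predicted reward would fit the description equally well.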

Specifically, Fig. 2b is a schematic structural diagram of another DQN network provided by the present invention. In terms of feature design, each input to the DQN network has the following four groups of features: the advertisement features AD feature (corresponding to the feature information of the first recommended content above), the user features User (corresponding to the feature information of the target user above), the interaction features between the advertisement and the user User*AD (corresponding to the feedback information above), and the context features Context (corresponding to the context information above). In terms of Fig. 2a, among these four groups, the user features and the context features represent the current State, while the advertisement features and the advertisement-user interaction features represent an Action taken in the current State. After processing by the double DQN network model (V(s) and A(s, a)), the model outputs the predicted Q value Q(s, a) (that is, the predicted reward value) of taking this Action in the current State. The true value of Q consists of two parts, the reward obtained immediately and the discounted reward obtained in the future:
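The split of the four feature groups into State and Action, and the combination of V(s) and A(s, a) into a predicted Q value, can be sketched as follows. This is a toy stand-in under stated assumptions: plain linear functions replace the neural networks, and the names `predict_q`, `w_value`, and `w_advantage` are hypothetical. It also assumes that V(s) sees only the state features while A(s, a) sees both state and action features.

```python
from typing import List, Sequence


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def predict_q(
    user: Sequence[float],
    context: Sequence[float],
    ad: Sequence[float],
    user_ad: Sequence[float],
    w_value: Sequence[float],
    w_advantage: Sequence[float],
) -> float:
    """Q(s, a) = V(s) + A(s, a), with linear stand-ins for both functions."""
    state: List[float] = list(user) + list(context)   # User + Context
    action: List[float] = list(ad) + list(user_ad)    # AD feature + User*AD
    value = dot(w_value, state)                       # V(s): state only
    advantage = dot(w_advantage, state + action)      # A(s, a): state and action
    return value + advantage


q = predict_q(
    user=[1.0], context=[2.0], ad=[3.0], user_ad=[4.0],
    w_value=[0.5, 0.5], w_advantage=[0.1, 0.1, 0.1, 0.1],
)
print(q)
```

Separating the two terms means that the part of the score that depends only on the user and context is shared across all candidate advertisements, while the advantage term distinguishes between them.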

$$y_{s,a} = Q(s, a) = r_{\text{immediate}} + \gamma\, r_{\text{future}}$$

The immediate reward may consist of two parts: the user's click reward and the user activeness reward. Because the Double-DQN structure is adopted, the calculation of the true Q value becomes:
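The formula referred to here is not reproduced in this text. For orientation only, the standard Double-DQN target has the form below, written with the symbols used for the true Q value above. The symbols $s'$ (next state), $a'$ (candidate next action), $W$ (parameters of the network that selects the action), and $W'$ (parameters of the network that evaluates it) are assumptions introduced for this sketch and may differ from the notation of the original formula.

```latex
y_{s,a} = r_{\text{immediate}}
        + \gamma \, Q\bigl(s',\ \arg\max_{a'} Q(s', a';\, W);\ W'\bigr)
```

In this form, one set of parameters chooses the best next action and the other scores it, which is the usual way Double-DQN reduces the over-estimation that arises when a single network does both.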

The present invention combines user activeness with the user's clicks on advertisements as a new form of feedback. User activeness can be understood in terms of refresh count and click count; good recommendation results increase how often the user uses the service, so activeness can serve as a feedback indicator. If a user shows no click behavior for a period of time, activeness declines; once a click occurs, activeness rises again. Taking both clicks and activeness into account, the immediate reward mentioned above becomes the sum of a click reward and an activeness reward:

$$r_{\text{total}} = r_{\text{click}} + \beta\, r_{\text{active}}$$
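A short worked example may help to show how the two reward formulas combine. The numeric values and the function names `immediate_reward` and `q_target` are illustrative assumptions; the source does not give values for β or γ, nor does it say how the click and activeness rewards are measured.

```python
def immediate_reward(r_click: float, r_active: float, beta: float) -> float:
    """r_total = r_click + beta * r_active."""
    return r_click + beta * r_active


def q_target(
    r_click: float, r_active: float, r_future: float, beta: float, gamma: float
) -> float:
    """y = r_total + gamma * r_future, with r_total as the immediate reward."""
    return immediate_reward(r_click, r_active, beta) + gamma * r_future


# Assumed example values: one click, a modest activeness gain, and an
# estimated future reward of 2.0.
r_total = immediate_reward(r_click=1.0, r_active=0.4, beta=0.5)
y = q_target(r_click=1.0, r_active=0.4, r_future=2.0, beta=0.5, gamma=0.9)
print(r_total, y)
```

Here β weights activeness against clicks inside the immediate reward, while γ discounts whatever reward is expected later.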

In some possible embodiments, the specific manner in which the server obtains the feedback information of the target user on the first recommended content during browsing may be as follows. The server obtains first feedback information of the target user, during browsing, on the recommendation messages in the first recommended content that come from the second recommended content, and second feedback information on the recommendation messages in the first recommended content that come from the third recommended content. The server takes the first feedback information and the second feedback information together as the feedback information of the target user on the first recommended content during browsing, and can thereby determine how well the target user accepts the recommended content provided by each of the two DQN network models.

In some possible embodiments, the server determines the recommendation effect of the first DQN network and of the second DQN network according to the first feedback information and the second feedback information. The recommendation effect may be evaluated along dimensions such as click count, access depth, and access duration. If the recommendation effect of the second DQN network, which explores and predicts, is better than that of the first DQN network, which exploits historical experience, the server adjusts the parameters of the first DQN network according to the parameters of the second DQN network. This optimizes the first DQN network, so that the recommended content it subsequently provides is more accurate and more readily accepted by the user, which in turn keeps the adjustment of the number of recommendation messages accurate.

In some possible embodiments, the specific manner in which the server determines the first recommended content of the target user according to the second recommended content and the third recommended content may be as follows. The server selects a first recommendation message from the recommendation messages included in the second recommended content, selects a second recommendation message from the recommendation messages included in the third recommended content, and takes the first recommendation message and the second recommendation message as the first recommended content of the target user. The first recommended content sent to the target user therefore includes both recommended content determined from the user's historical behavior and recommended content obtained by predicting the user, so that the system can both perform supervised learning on historical data and handle dynamic changes in the user.
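The selection step above can be sketched as a simplified random interleave of two ranked lists. This is an assumption-laden illustration rather than the source's procedure: at each slot it picks one of the two networks at random and takes that network's highest-ranked message not yet used, remembering which network contributed it so that feedback can later be attributed. The function name `interleave` and the labels `"Q"` and `"Q~"` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def interleave(
    list_q: List[str], list_q_tilde: List[str], k: int, rng: random.Random
) -> List[Tuple[str, str]]:
    """Build up to k (message, source) pairs, where source is 'Q' or 'Q~'."""
    sources: Dict[str, List[str]] = {"Q": list_q, "Q~": list_q_tilde}
    merged: List[Tuple[str, str]] = []
    taken = set()
    while len(merged) < k:
        # For each network, find its highest-ranked message not yet taken.
        heads: Dict[str, str] = {}
        for name, ranked in sources.items():
            remaining = [m for m in ranked if m not in taken]
            if remaining:
                heads[name] = remaining[0]
        if not heads:
            break  # both lists are exhausted
        name = rng.choice(sorted(heads))  # pick a contributing network at random
        merged.append((heads[name], name))
        taken.add(heads[name])
    return merged


merged = interleave(["A", "B", "C"], ["C", "D", "B"], k=3, rng=random.Random(7))
print(merged)
```

Recording the source of each message is what makes it possible to split the user's feedback into the first and second feedback information described above.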

Specifically, Fig. 2c is a schematic diagram of optimizing a DQN network provided by the present invention. The recommendation list List L (containing recommendation messages A, B, and C) generated by the exploitation DQN network model Current Network Q (that is, Exploitation Network Q) and the recommendation list List L~ (containing recommendation messages C, D, and B) generated by the exploration DQN network model Explore Network Q~ (also written Exploration Network Q~) are merged by probabilistic interleaving (Probabilistic Interleave) to obtain the final recommendation list List L∧ (containing recommendation messages A, C, and D). After this list is pushed to the user (Push to user), the user's feedback is recorded (Collect feedback), and the model is optimized according to that feedback (Model choice). If Explore Network Q~ recommends better (for example, the user feedback shows that the user clicked to view only recommendation message D), the parameters of Current Network Q are updated in the direction of the parameters of Explore Network Q~ (Step towards Q~); otherwise, the parameters of Current Network Q remain unchanged (Keep Q). The specific calculation is as follows:

The network parameters of Explore Network Q~ are generated by adding a certain amount of noise to the network parameters w of Current Network Q, calculated as follows:

$$\Delta w = \alpha \cdot \mathrm{rand}(-1, 1) \cdot w$$
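The perturbation and the subsequent "Step towards Q~ / Keep Q" decision can be sketched together. Several points here are assumptions: the random factor is drawn independently for each weight, the exploration parameters are taken to be w + Δw, "better" is judged only by click counts, and the step rule is a simple interpolation controlled by a hypothetical step size `eta`, since the source's own update formula is not reproduced in this text.

```python
import random
from typing import List, Sequence, Set, Tuple


def explore_weights(w: Sequence[float], alpha: float, rng: random.Random) -> List[float]:
    """Return w + dw with dw = alpha * rand(-1, 1) * w, drawn per weight."""
    return [wi + alpha * rng.uniform(-1.0, 1.0) * wi for wi in w]


def explore_won(merged: Sequence[Tuple[str, str]], clicked: Set[str]) -> bool:
    """True if messages contributed by Q~ drew more clicks than those from Q."""
    q_clicks = sum(1 for m, src in merged if src == "Q" and m in clicked)
    q_tilde_clicks = sum(1 for m, src in merged if src == "Q~" and m in clicked)
    return q_tilde_clicks > q_clicks


def update_current(
    w: Sequence[float],
    w_tilde: Sequence[float],
    merged: Sequence[Tuple[str, str]],
    clicked: Set[str],
    eta: float,
) -> List[float]:
    """Step w towards w_tilde if Q~ won; otherwise keep w unchanged."""
    if not explore_won(merged, clicked):
        return list(w)  # Keep Q
    return [wi + eta * (wt - wi) for wi, wt in zip(w, w_tilde)]  # Step towards Q~


w = [1.0, -2.0]
w_tilde = explore_weights(w, alpha=0.1, rng=random.Random(0))
pushed = [("A", "Q"), ("C", "Q~"), ("D", "Q~")]
w_next = update_current(w, w_tilde, pushed, clicked={"D"}, eta=0.5)
print(w_tilde, w_next)
```

Because the noise is proportional to each weight, larger weights are perturbed more in absolute terms, and α bounds the relative change at each exploration step.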

In some possible embodiments, the present invention uses a DQN network to model the dynamically changing attributes of content recommendation effectively; DQN can effectively model both short-term and long-term returns. The model is divided into an online part and an offline part. The key steps of the online part, taking advertisement recommendation as an example, are as follows. At each moment, when the user sends an interface refresh request, the Agent generates k advertisements according to the current State and recommends them to the user; this recommendation result combines exploitation (Exploitation) and exploration (Exploration). Feedback is obtained from the user's click and browsing behavior on the recommended advertisements. Based on the user's information, the recommended advertisements, and the feedback obtained, the Agent evaluates the performance of the exploitation network model Current Network Q and the exploration network model Explore Network Q~. If Current Network Q performs better, it is left unchanged; if Explore Network Q~ performs better, the parameters of Current Network Q are moved towards those of Explore Network Q~. In addition, after a period of time (for example, 5 or 10 minutes), the model parameters of Current Network Q may be updated according to the historical experience stored in the experience pool.
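The experience pool and the periodic update mentioned at the end of this paragraph might be organized as sketched below. The class `ExperiencePool`, the function `major_update_due`, the capacity, and the layout of a stored experience are all assumptions made for illustration; the source states only that historical experience is stored and used to update Current Network Q after an interval such as 5 or 10 minutes.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Assumed layout of one stored experience: (state, action, reward, next_state).
Experience = Tuple[tuple, str, float, tuple]


class ExperiencePool:
    """Fixed-capacity store of past experiences; the oldest are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Experience] = deque(maxlen=capacity)

    def add(self, experience: Experience) -> None:
        self._items.append(experience)

    def sample(self, n: int, rng: random.Random) -> List[Experience]:
        """Draw up to n distinct experiences uniformly at random."""
        return rng.sample(list(self._items), min(n, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)


def major_update_due(last_update_s: float, now_s: float, interval_s: float = 300.0) -> bool:
    """True once at least `interval_s` seconds have passed since the last update."""
    return now_s - last_update_s >= interval_s


pool = ExperiencePool(capacity=2)
for step in range(3):
    pool.add(((float(step),), f"ad-{step}", 1.0, (float(step + 1),)))
batch = pool.sample(5, random.Random(0))
print(len(pool), len(batch), major_update_due(0.0, 300.0))
```

A bounded pool keeps the periodic update focused on recent behavior, which complements the per-request adjustment made after each round of feedback.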

In some possible embodiments, the scheme disclosed in the present invention, which uses deep reinforcement learning to adapt the number of recommendation messages dynamically to different users and different environments, can be extended to the parameter optimization problems of other similar deep reinforcement learning model variants, such as DN, DDQN, DDQN+U, and DDQN+U+EG; the embodiments of the present invention are not limited in this respect.

Referring to Fig. 3, which is a schematic structural diagram of a content recommendation device provided by an embodiment of the present invention, the content recommendation device described in this embodiment includes:

a display module 301, configured to display first recommended content to a target user through a browsing interface of a user terminal, the first recommended content including at least one recommendation message;

an obtaining module 302, configured to obtain feedback information of the target user on the first recommended content during browsing; and

a processing module 303, configured to adjust the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content.

Optionally, the device further includes a determining module 304, wherein:

the obtaining module 302 is further configured to obtain the feature information of the target user;

the determining module 304 is configured to determine second recommended content using a first deep reinforcement learning (DQN) network and the feature information;

the determining module 304 is further configured to determine third recommended content using a second DQN network and the feature information; and

the determining module 304 is further configured to determine the first recommended content of the target user according to the second recommended content and the third recommended content.

Optionally, the processing module 303 is specifically configured to:

obtain context information corresponding to the feedback information;

input the feedback information, the feature information of the target user, the feature information of the first recommended content, and the context information into the first DQN network and the second DQN network respectively, to obtain a predicted reward value of the first recommended content; and

adjust the number of recommendation messages included in the first recommended content according to the predicted reward value.

Optionally, the obtaining module 302 is specifically configured to:

obtain first feedback information of the target user, during browsing, on the recommendation messages in the first recommended content that come from the second recommended content, and second feedback information on the recommendation messages in the first recommended content that come from the third recommended content; and

take the first feedback information and the second feedback information as the feedback information of the target user on the first recommended content during browsing.

Optionally, the determining module 304 is further configured to determine the recommendation effect of the first DQN network and the recommendation effect of the second DQN network according to the first feedback information and the second feedback information; and

the processing module 303 is further configured to adjust the parameters of the first DQN network according to the parameters of the second DQN network when the recommendation effect of the second DQN network is better than the recommendation effect of the first DQN network.

Optionally, the determining module 304 is specifically configured to:

select a first recommendation message from the recommendation messages included in the second recommended content;

select a second recommendation message from the recommendation messages included in the third recommended content; and

take the first recommendation message and the second recommendation message as the first recommended content of the target user.

Optionally, the feedback information includes one or more of click count, access depth, access duration, and refresh count.

It can be understood that the functions of the functional modules of the content recommendation device in this embodiment may be implemented according to the method in the foregoing method embodiment; for the specific implementation process, refer to the relevant description of the method embodiment, which is not repeated here.

In the embodiments of the present invention, the display module 301 displays first recommended content to a target user through the browsing interface of a user terminal, the obtaining module 302 obtains feedback information of the target user on the first recommended content during browsing, and the processing module 303 adjusts the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content. The number of recommendation messages included in recommended content can thus be adjusted dynamically, accurately, and adaptively according to factors such as user features, recommended-content features, and real-time feedback, so that different users receive different numbers of recommendation messages and the platform's revenue is maximized without degrading the user experience.

Referring to Fig. 4, which is a schematic structural diagram of a server provided by an embodiment of the present invention, the server described in this embodiment includes a processor 401, a network interface 402, and a memory 403. The processor 401, the network interface 402, and the memory 403 may be connected by a bus or in other ways; the embodiments of the present invention take connection by a bus as an example.

The processor 401 (or central processing unit (Central Processing Unit, CPU)) is the computing core and control core of the server. The network interface 402 may optionally include standard wired and wireless interfaces (such as Wi-Fi or a mobile communication interface) and is controlled by the processor 401 to send and receive data. The memory 403 (Memory) is the storage device of the server and is used to store programs and data. It can be understood that the memory 403 here may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory; optionally, it may also be at least one storage device located remotely from the processor 401. The memory 403 provides storage space that stores the operating system and executable program code of the server. The operating system may include, but is not limited to, a Windows system or a Linux system; the present invention is not limited in this respect.

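In the embodiments of the present invention, the processor 401 performs the following operations by running the executable program code in the memory 403:

displaying first recommended content to a target user through a browsing interface of a user terminal, the first recommended content including at least one recommendation message;

obtaining feedback information of the target user on the first recommended content during browsing; and

adjusting the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content.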

Optionally, before displaying the first recommended content to the target user through the browsing interface of the user terminal, the processor 401 is further configured to:

obtain the feature information of the target user;

determine second recommended content using a first deep reinforcement learning (DQN) network and the feature information;

determine third recommended content using a second DQN network and the feature information; and

determine the first recommended content of the target user according to the second recommended content and the third recommended content.

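Optionally, the specific manner in which the processor 401 adjusts the number of recommendation messages included in the first recommended content according to the feedback information, the feature information of the target user, and the feature information of the first recommended content is:

obtaining context information corresponding to the feedback information;

inputting the feedback information, the feature information of the target user, the feature information of the first recommended content, and the context information into the first DQN network and the second DQN network respectively, to obtain a predicted reward value of the first recommended content; and

adjusting the number of recommendation messages included in the first recommended content according to the predicted reward value.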

Optionally, the specific manner in which the processor 401 obtains the feedback information of the target user for the first recommendation during browsing is:

Optionally, the processor 401 is further configured to:

Determine the recommendation effect of the first DQN network and the recommendation effect of the second DQN network according to the first feedback information and the second feedback information;

In the case where the recommendation effect of the second DQN network is better than the recommendation effect of the first DQN network, adjust the parameters of the first DQN network according to the parameters of the second DQN network.
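
The comparison and parameter adjustment above can be sketched as follows. The embodiment states only that the first network's parameters are adjusted according to the second network's parameters; the click-rate effect metric, the linear interpolation rule, the `rate` parameter, and all function and key names here are hypothetical choices made for illustration:

```python
def recommendation_effect(feedback):
    """Score a network as clicks per displayed recommendation message.

    The click-rate metric is an assumption; the source does not define
    how the recommendation effect is measured.
    """
    shown = feedback["shown"]
    return feedback["clicks"] / shown if shown else 0.0


def adjust_first_network(first_params, second_params,
                         first_feedback, second_feedback, rate=0.5):
    """Return new parameters for the first network.

    If the second network has the better effect, move each parameter of the
    first network toward the second network's value by `rate` (assumed rule);
    otherwise return an unchanged copy.
    """
    first_effect = recommendation_effect(first_feedback)
    second_effect = recommendation_effect(second_feedback)
    if second_effect <= first_effect:
        return dict(first_params)
    return {
        name: (1.0 - rate) * value + rate * second_params[name]
        for name, value in first_params.items()
    }


first_params = {"w": 0.0, "b": 1.0}
second_params = {"w": 1.0, "b": 3.0}
first_feedback = {"clicks": 1, "shown": 10}    # first feedback information
second_feedback = {"clicks": 3, "shown": 10}   # second feedback information
updated = adjust_first_network(first_params, second_params,
                               first_feedback, second_feedback)
print(updated)
```

Setting `rate=1.0` in this sketch would copy the second network's parameters outright, while smaller values give a gradual adjustment.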

Optionally, the specific manner in which the processor 401 determines the first recommendation for the target user according to the second recommendation and the third recommendation is:

In a specific implementation, the processor 401, network interface 402, and memory 403 described in the embodiment of the present invention can execute the implementation described in the flow of the content recommendation method provided in the embodiment of the present invention, and can also execute the implementation described in the content recommendation device provided in the embodiment of the present invention; details are not repeated here.

In the embodiment of the present invention, processor 401 displays the first recommendation to the target user through the browser interface of the user terminal, obtains the feedback information of the target user for the first recommendation during browsing, and can adjust the quantity of recommendation messages included in the first recommendation according to the feedback information, the characteristic information of the target user, and the characteristic information of the first recommendation. The quantity of recommendation messages included in a recommendation can thus be adjusted dynamically, accurately, and adaptively according to factors such as user characteristics, recommendation characteristics, and real-time feedback, so that different users receive different quantities of recommendation messages and the revenue of the platform can be maximized without affecting user experience.
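
To illustrate the per-user adjustment of the quantity, the following sketch assumes that a predicted reward score in the range 0 to 1 is already available (for example, from the DQN networks) and applies a simple threshold rule. The thresholds, the bounds, the step of one message, and the function name `adjust_quantity` are all hypothetical; the embodiment does not prescribe this rule:

```python
def adjust_quantity(current, predicted_reward, low=0.2, high=0.8,
                    minimum=1, maximum=5):
    """Return the new number of recommendation messages.

    predicted_reward is assumed to be a score in [0, 1] computed elsewhere;
    the thresholds and bounds are illustrative assumptions.
    """
    if predicted_reward >= high:
        current += 1   # user appears tolerant: show one more message
    elif predicted_reward <= low:
        current -= 1   # user appears to be disengaging: show one fewer
    # Keep at least `minimum` message(s) and never exceed `maximum`.
    return max(minimum, min(maximum, current))


print(adjust_quantity(3, 0.9))
print(adjust_quantity(3, 0.1))
print(adjust_quantity(3, 0.5))
```

The lower bound of one message in this sketch mirrors the statement that the first recommendation includes at least one recommendation message.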

Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), etc.

What is disclosed above is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention. Those of ordinary skill in the art can understand that implementations of all or part of the processes of the above embodiment, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.

Claims

1. A content recommendation method, characterized in that the method comprises:

Displaying a first recommendation to a target user through the browser interface of a user terminal, the first recommendation including at least one recommendation message;

Obtaining the feedback information of the target user for the first recommendation during browsing;

Adjusting the quantity of recommendation messages included in the first recommendation according to the feedback information, the characteristic information of the target user, and the characteristic information of the first recommendation.

2. The method according to claim 1, characterized in that, before the first recommendation is displayed to the target user through the browser interface of the user terminal, the method further comprises:

Obtaining the characteristic information of the target user;

3. The method according to claim 2, characterized in that the adjusting of the quantity of recommendation messages included in the first recommendation according to the feedback information, the characteristic information of the target user, and the characteristic information of the first recommendation comprises:

Obtaining the contextual information corresponding to the feedback information;

Inputting the feedback information, the characteristic information of the target user, the characteristic information of the first recommendation, and the contextual information into the first DQN network and the second DQN network respectively, to obtain a predicted reward value of the first recommendation;

4. The method according to claim 2 or 3, characterized in that the obtaining of the feedback information of the target user for the first recommendation during browsing comprises:

Obtaining first feedback information of the target user, during browsing, on the recommendation messages in the first recommendation that come from the second recommendation, and second feedback information on the recommendation messages in the first recommendation that come from the third recommendation;

Taking the first feedback information and the second feedback information as the feedback information of the target user for the first recommendation during browsing.

5. The method according to claim 4, characterized in that the method further comprises:

Determining the recommendation effect of the first DQN network and the recommendation effect of the second DQN network according to the first feedback information and the second feedback information;

In the case where the recommendation effect of the second DQN network is better than the recommendation effect of the first DQN network, adjusting the parameters of the first DQN network according to the parameters of the second DQN network.

6. The method according to any one of claims 2 to 5, characterized in that the determining of the first recommendation for the target user according to the second recommendation and the third recommendation comprises:

7. The method according to any one of claims 1 to 6, characterized in that the feedback information includes one or more of number of clicks, access depth, access duration, and number of refreshes.

8. A content recommendation device, characterized by comprising:

A display module, configured to display a first recommendation to a target user through the browser interface of a user terminal, the first recommendation including at least one recommendation message;

An obtaining module, configured to obtain the feedback information of the target user for the first recommendation during browsing;

A processing module, configured to adjust the quantity of recommendation messages included in the first recommendation according to the feedback information, the characteristic information of the target user, and the characteristic information of the first recommendation.

9. A server, characterized by comprising a processor, a network interface, and a storage device, the processor, the network interface, and the storage device being connected to one another, wherein the network interface sends and receives data under the control of the processor, the storage device is used for storing a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the content recommendation method according to any one of claims 1 to 7.

10. A computer storage medium, characterized in that program instructions are stored in the computer storage medium, and the program instructions, when executed, are used to execute the content recommendation method according to any one of claims 1 to 7.