CN111898019B - Information pushing method and device - Google Patents

Information pushing method and device

Info

Publication number
CN111898019B
CN111898019B CN201910373172.6A
Authority
CN
China
Prior art keywords
information
pushing
pushed
current
current state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910373172.6A
Other languages
Chinese (zh)
Other versions
CN111898019A (en)
Inventor
姜飞
卞俊杰
王天驹
叶璨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910373172.6A priority Critical patent/CN111898019B/en
Publication of CN111898019A publication Critical patent/CN111898019A/en
Application granted granted Critical
Publication of CN111898019B publication Critical patent/CN111898019B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an information pushing method, an information pushing apparatus, an electronic device, and a computer-readable storage medium. The information pushing method includes: when generation of information is detected, collecting feature data corresponding to the information, generating a current state from the feature data, and generating a first current action corresponding to pushing the information; inputting the current state and the first current action into an online deep reinforcement learning model, so that the model obtains the score corresponding to pushing the information according to the current state and the first current action; and pushing the information if that score is greater than a preset score. By collecting the feature data of newly generated information, generating the current state and the push action, and letting the online deep reinforcement learning model score the push decision, the method pushes only information whose score exceeds the preset score, which improves push accuracy and thus the rate at which users click on pushed information.

Description

Information pushing method and device
Technical Field
The disclosure relates to the technical field of information processing, and in particular relates to an information pushing method, an information pushing device, electronic equipment and a computer readable storage medium.
Background
With the rapid development of internet technology, people increasingly use personalized push applications (apps) to acquire information such as news and video. Besides recommending content of interest while the user is using the app, these apps can also actively push customized information through a push-notification mechanism when the user is not using the app.
In the related art, information pushing is generally triggered by candidate-set content and governed by frequency-control rules. Specifically, each user has an associated content candidate set, such as popular videos or videos uploaded by authors the user follows. Whenever new content appears in the candidate set, a push decision is triggered, and the decision is made mainly on the basis of sending frequency, for example requiring the interval between two push notifications to be no less than a certain time.
Although the above scheme achieves a degree of personalization, user behavior is a very complex process, and every push can influence the user's viewing state and subsequent actions. For example, if a user clicks a pushed item of an app and finds it uninteresting, the user may grow tired of the app and is then unlikely to click even a better item pushed later. Triggering pushes directly from the candidate set therefore lacks real personalization.
Disclosure of Invention
The disclosure provides an information pushing method, an information pushing apparatus, an electronic device, and a computer-readable storage medium, which can improve push accuracy and thus the rate at which users click on pushed information.
According to a first aspect of an embodiment of the present disclosure, there is provided an information pushing method, including:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
Further, the method further comprises:
generating a second current action according to the information which is not pushed;
and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the method further comprises:
collecting user feedback conditions after pushing the information;
and correlating the user feedback condition with the characteristic data, and updating the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
Further, the inputting the current state and the first current action into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action;
and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the method further comprises:
counting historical recommendation made by the online deep reinforcement learning model on information generated by the history in a preset time period, wherein the historical recommendation comprises pushing or not pushing;
setting the value of the feedback discount according to the historical recommendation.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
According to a second aspect of the embodiments of the present disclosure, there is provided an information pushing apparatus, including:
the feature collection module is configured to collect feature data corresponding to information when generation of the information is detected;
the state action generating module is configured to generate a current state according to the characteristic data and generate a first current action according to pushing the information;
the score obtaining module is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
and the pushing module is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
Further, the state action generation module is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the device further comprises:
the feedback collection module is configured to collect user feedback conditions after the information is pushed;
and the model updating module is configured to correlate the user feedback condition with the characteristic data, and update the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
Further, the score acquisition module is specifically configured to: input the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and take the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the device further comprises:
a feedback discount determination module configured to count historical recommendations made by the online deep reinforcement learning model to historically generated information over a preset period of time, the historical recommendations including push or no push; setting the value of the feedback discount according to the historical recommendation.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform an information pushing method, the method comprising:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
According to a fifth aspect of embodiments of the present disclosure, there is provided an application program, which when executed by a processor of a mobile terminal, causes the mobile terminal to perform an information push method, the method comprising:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: feature data corresponding to newly generated information is collected, a current state is generated from the feature data and a first current action is generated for pushing the information, and the current state and the first current action are input into an online deep reinforcement learning model so that the model obtains the score corresponding to pushing the information according to the current state and the first current action; the information is pushed if that score is greater than a preset score. This improves push accuracy and thus the rate at which users click on pushed information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an information pushing method according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of an information pushing method according to a second embodiment of the present disclosure.
Fig. 3 is a flowchart of an information pushing method according to a third embodiment of the present disclosure.
Fig. 4 is a block diagram of an information pushing device according to a fourth embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to a fifth embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Example 1
Fig. 1 is a flowchart of an information pushing method according to an embodiment of the present disclosure, where an execution subject of the information pushing method according to the embodiment may be an information pushing device according to an embodiment of the present disclosure, and the device may be integrated in a mobile terminal device (for example, a smart phone, a tablet computer, etc.), a notebook computer, or a fixed terminal (desktop computer), and the information pushing device may be implemented by using hardware or software. As shown in fig. 1, the method comprises the following steps:
in step S11, when the generation of information is detected, feature data corresponding to the information is collected.
In this embodiment, the information generation is used as a push trigger condition, that is, when the information generation is detected, the corresponding feature data is collected. The information may be news, video, novels, etc., and is not particularly limited herein.
Wherein the feature data includes, but is not limited to, at least one of an associated feature between the information and the user, a contextual feature, a content feature, and a user feature.
Wherein the associated features include, but are not limited to, at least one of click volume, view volume, praise volume, and gift volume of the work of the author.
Wherein the contextual characteristics include, but are not limited to, at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
Wherein the user characteristics include, but are not limited to, at least one of personal information of the user, active time distribution, number of push triggers recently per hour.
Wherein the content features include, but are not limited to, at least one feature of author information, video tags, video statistics.
Compared with schemes that consider only the click rate, the feature data of this implementation also covers finer-grained indicators such as context features and interaction features; using them as the current state input to the deep reinforcement learning model can further improve the push benefit.
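As a concrete illustration of how such feature data could be assembled into the current state, the following Python sketch maps one assumed field per feature category into a fixed-order vector. The field names and the flat encoding are illustrative assumptions, not values fixed by this disclosure.

```python
import numpy as np

def build_current_state(features: dict) -> np.ndarray:
    """Flatten assumed fields from each feature category into one state vector."""
    return np.array([
        features.get("author_click_amount", 0.0),      # associated feature: clicks on the author's works
        features.get("author_view_amount", 0.0),        # associated feature: views of the author's works
        features.get("current_hour", 0.0),              # context feature: current time
        features.get("hours_since_last_push", 0.0),     # context feature: time of last pushed information
        features.get("last_push_clicked", 0.0),          # context feature: click condition of last push
        features.get("pushes_today", 0.0),               # context feature: amount already pushed today
        features.get("today_push_clicks", 0.0),          # context feature: clicks on today's pushes
        features.get("recent_triggers_per_hour", 0.0),   # user feature: recent push triggers per hour
        features.get("video_duration", 0.0),              # content feature: one assumed video statistic
    ], dtype=np.float32)

state = build_current_state({"author_click_amount": 1200, "current_hour": 21, "pushes_today": 2})
```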
In step S12, a current state is generated from the feature data, and a first current action is generated according to pushing the information.
In this context, in order to distinguish between different current actions, the current action that occurs first is referred to herein as the first current action, and a subsequently occurring current action is referred to herein as the second current action.
In step S13, the current state and the first current action are input into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the score corresponding to pushing the information according to the current state and the first current action.
The current state and the first current action are both known data, and the online deep reinforcement learning model can obtain corresponding scores when pushing information according to the known data.
The on-line deep reinforcement learning model is a model obtained by training based on the combination of deep learning and reinforcement learning, can be continuously updated according to feature data and feedback of a user on a decision result, and is used as a new on-line deep reinforcement learning model for deciding information generated subsequently.
Specifically, the online deep reinforcement learning model can be obtained as follows. First, feature data of historical information is collected, together with the decision results produced by other known recommendation strategies and the corresponding user feedback; an initial deep reinforcement learning model is trained from this data and brought online. The initial model is then deployed with an epsilon-greedy exploration strategy and is iteratively updated using its own decision results, the user feedback on those decisions, and the feature data of the information, yielding a better online deep reinforcement learning model.
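A minimal sketch of the epsilon-greedy exploration strategy mentioned above is given below; the exploration rate and the 1 = push / 0 = not push encoding are assumptions chosen for illustration.

```python
import random

def epsilon_greedy_action(q_push: float, q_no_push: float, epsilon: float = 0.1) -> int:
    """With probability epsilon explore a random action; otherwise exploit the
    action whose score from the online model is higher (1 = push, 0 = not push)."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    return 1 if q_push > q_no_push else 0

action = epsilon_greedy_action(q_push=0.72, q_no_push=0.58)
```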
In step S14, if the score corresponding to the pushing of the information is greater than a preset score, the information is pushed.
According to the embodiment, the current state is generated according to the characteristic data by collecting the characteristic data corresponding to the generated information, the first current action is generated according to the pushing information, the current state and the first current action are input into the on-line deep reinforcement learning model, so that the on-line deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, if the corresponding score when pushing the information is larger than the preset score, the information is pushed, the pushing accuracy can be improved, and the clicking rate of a user on the information is improved.
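As a compact illustration of steps S13 and S14, the sketch below scores the push action for an already-built current state and compares it with the preset score; the toy model and the example state vector are stand-ins introduced for illustration only.

```python
import numpy as np

def decide_push(q_model, state: np.ndarray, preset_score: float) -> bool:
    """Steps S13-S14 in miniature: score the (current state, push) pair with the
    online model and push only if the score exceeds the preset score."""
    push_action = np.array([1.0], dtype=np.float32)   # first current action: push the information
    return float(q_model(state, push_action)) > preset_score

# toy stand-in for the online deep reinforcement learning model and for a state
# vector built from the collected feature data (steps S11-S12)
toy_q = lambda s, a: float(np.tanh(s.mean()) * a[0])
state = np.array([0.3, 0.9, 0.2], dtype=np.float32)
print(decide_push(toy_q, state, preset_score=0.5))
```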
In an alternative embodiment, step S13 includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Wherein a value of a equal to 1 may represent pushing. For a piece of information, if the action is to push, the user's feedback (for example, the user's click behavior) is reflected in the next state. Specifically, after a piece of information is pushed to the user, the corresponding context features are updated; that is, the user's click action is recorded as feature data for the next push decision.
The feedback r includes positive feedback and negative feedback. Positive feedback includes clicking the pushed information or opening the application within a preset period (for example, one minute) after the push; negative feedback is the cost of pushing. Because simply optimizing the click amount would sharply reduce the push proportion for inactive users, the click feedback of an inactive user is set larger than that of an active user. In addition, for different types of content, such as live broadcasts and video uploads, the click feedback assigned during training differs because the actual benefit of a click differs.
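The sketch below shows one way such feedback could be assigned; every numeric weight, the one-minute window, and the content-type table are illustrative assumptions rather than values taken from this disclosure.

```python
def feedback(clicked: bool, opened_within_minute: bool,
             user_is_active: bool, content_type: str) -> float:
    """Illustrative feedback r: a fixed push cost as negative feedback, click and
    quick-open as positive feedback, larger click feedback for inactive users,
    and per-content-type click weights (e.g. live broadcast vs. video upload)."""
    push_cost = 0.1                                           # assumed negative feedback per push
    click_weight = {"live": 1.5, "video": 1.0}.get(content_type, 1.0)
    user_weight = 1.5 if not user_is_active else 1.0          # inactive users weighted higher
    r = -push_cost
    if clicked:
        r += click_weight * user_weight
    if opened_within_minute:
        r += 0.5
    return r

print(feedback(clicked=True, opened_within_minute=False, user_is_active=False, content_type="live"))
```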
Further, the method further comprises a setting method of feedback discount, and specifically comprises the following steps:
counting historical recommendation made by the deep reinforcement learning model for the historical push information in a preset time period, wherein the historical recommendation comprises push or non-push;
the value of the feedback discount is set according to the historical recommendation.
The preset time period may be the same day. Specifically, a decision sequence may be formed by decisions of all information trigger points in the day, and the value of the feedback discount is set according to the decision sequence. The feedback discount may be set to 0.9, for example.
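To make the role of the feedback discount concrete, the short sketch below accumulates the feedback collected at a day's trigger points with γ = 0.9; the feedback values are invented, and this disclosure does not prescribe an exact rule for deriving γ from the historical recommendations.

```python
def discounted_feedback(day_feedback, gamma=0.9):
    """Discounted sum over the decisions made at the day's information trigger points."""
    return sum((gamma ** t) * r for t, r in enumerate(day_feedback))

# e.g. four trigger points on one day: no click, click, no click, quick app open
print(discounted_feedback([-0.1, 1.0, -0.1, 0.5]))
```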
Example two
Fig. 2 is a flowchart of an information pushing method provided in a second embodiment of the present disclosure, where the embodiment further includes a step of updating an online deep reinforcement learning model, as shown in fig. 2, based on the foregoing embodiment, including the following steps:
In step S21, when the generation of information is detected, the feature data corresponding to the information is collected.
In step S22, a current state is generated according to the feature data, and a first current action is generated according to pushing the information.
In step S23, the current state and the first current action are input into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action.
In step S24, if the score corresponding to the pushing of the information is greater than the preset score, the information is pushed.
In step S25, the user feedback after pushing the information is collected.
In step S26, the user feedback condition and the feature data are correlated, and the correlated user feedback condition and feature data are used as training data to update the online deep reinforcement learning model.
According to this embodiment, the user feedback condition and the feature data are associated and used as training data to update the online deep reinforcement learning model. Information can then be pushed to the user in a targeted manner, so that the pushed information better matches the user's interests, which further improves the user's click rate on the information.
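As an illustration of how the associated feedback and feature data could drive one update of the online model, the sketch below performs a single temporal-difference update of the kind the loss functions of the first embodiment describe. The network architecture, optimizer, batch layout, and the use of PyTorch are assumptions; the disclosure does not specify them.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Toy Q-network scoring a (state, action) pair; all sizes are placeholders."""
    def __init__(self, state_dim: int = 9, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def td_loss(q_net, target_net, prev_state, prev_action, reward, cur_state, gamma=0.9):
    """Loss of the reconstructed form: r + gamma * max over push/no-push of the
    target network's score for the current state, regressed onto Q(prev_state, prev_action)."""
    push = torch.ones(cur_state.size(0), 1)
    skip = torch.zeros(cur_state.size(0), 1)
    with torch.no_grad():
        target = reward + gamma * torch.maximum(target_net(cur_state, push),
                                                target_net(cur_state, skip))
    return nn.functional.mse_loss(q_net(prev_state, prev_action), target)

# one illustrative update step on random stand-in data (16 associated feedback/feature pairs)
q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
prev_s, cur_s = torch.randn(16, 9), torch.randn(16, 9)
prev_a, r = torch.randint(0, 2, (16, 1)).float(), torch.randn(16)
loss = td_loss(q_net, target_net, prev_s, prev_a, r, cur_s)
opt.zero_grad(); loss.backward(); opt.step()
```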
Example III
Fig. 3 is a flowchart of an information pushing method according to a third embodiment of the present disclosure, where, based on the foregoing embodiment, a preset score is optimized to a score corresponding to a time when no information is pushed, as shown in fig. 3, the method includes the following steps:
In step S31, when the generation of information is detected, the feature data corresponding to the information is collected.
In step S32, a current state is generated according to the feature data, a first current action is generated according to pushing the information, and a second current action is generated according to not pushing the information.
In step S33, the current state, the first current action, and the second current action are input into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, and obtains the corresponding score when not pushing the information according to the current state and the second current action.
How the score corresponding to pushing the information is obtained can be seen from the first embodiment. Similarly, the score corresponding to not pushing the information is obtained by inputting the current state and the second current action into the loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the second current action; the output value of the online deep learning neural network is taken as the corresponding score when the information is not pushed. Wherein a value of a equal to 0 represents not pushing.
In step S34, if the score corresponding to pushing the information is greater than the score corresponding to not pushing the information, the information is pushed.
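A minimal sketch of this third-embodiment decision rule follows; the toy model stands in for the online deep reinforcement learning model, and the 1 = push / 0 = not push action encoding is an assumption carried over from the earlier sketches.

```python
import numpy as np

def decide_by_comparison(q_model, state: np.ndarray) -> bool:
    """Score both candidate actions with the same model and push only when the
    score for pushing exceeds the score for not pushing."""
    score_push = float(q_model(state, np.array([1.0], dtype=np.float32)))
    score_no_push = float(q_model(state, np.array([0.0], dtype=np.float32)))
    return score_push > score_no_push

toy_q = lambda s, a: float(np.tanh(s.mean())) * (0.9 if a[0] > 0.5 else 0.7)
print(decide_by_comparison(toy_q, np.array([0.4, 0.8, 0.1], dtype=np.float32)))
```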
According to this embodiment, pushing and not pushing are both taken as current actions, the current state and the current actions are input into the online deep reinforcement learning model, and whether to push the information is determined by comparing the score corresponding to pushing with the score corresponding to not pushing, which can improve push accuracy and thus the rate at which users click on the information.
Example IV
Fig. 4 is a block diagram of an information pushing device according to an embodiment of the present disclosure. The device can be integrated in a mobile terminal device (e.g. a smart phone, a tablet computer, etc.), a notebook or a fixed terminal (desktop computer), and the information pushing device can be implemented in hardware or software. Referring to fig. 4, the apparatus includes: the device comprises a characteristic acquisition module 41, a state action generation module 42, a score acquisition module 43 and a pushing module 44, wherein the state action generation module 42, the score acquisition module 43 and the pushing module 44 are sequentially connected.
The feature collection module 41 is configured to collect feature data corresponding to the information when the generation of information is detected.
The state action generation module 42 is configured to generate a current state from the feature data and a first current action from pushing the information;
the score obtaining module 43 is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
the pushing module 44 is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
Further, the state action generation module 42 is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module 43 is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the device further comprises: a feedback gathering module 45 and a model updating module 46;
the feedback collection module 45 is configured to collect user feedback after pushing the information;
the model updating module 46 is configured to correlate the user feedback situation with the feature data and update the online deep reinforcement learning model using the correlated user feedback situation and feature data as training data.
Further, the score obtaining module 43 is specifically configured to: input the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and take the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the device further comprises a feedback discount determination module 47, configured to count historical recommendations made by the online deep reinforcement learning model for historically generated information within a preset time period, the historical recommendations including push or no push, and to set the value of the feedback discount according to the historical recommendations.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and is not repeated here.
Example five
An embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
Wherein fig. 5 is a block diagram of an electronic device provided by an embodiment of the disclosure. For example, the electronic device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the electronic device may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interactions between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like. The memory 504 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 506 provides power to the various components of the electronic device. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic devices.
The multimedia component 508 includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When the electronic device is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 further comprises a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 514 includes one or more sensors for providing status assessment of various aspects of the electronic device. For example, the sensor assembly 514 may detect an on/off state of the electronic device and the relative positioning of components such as the display and keypad of the electronic device; the sensor assembly 514 may also detect a change in position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and a change in temperature of the electronic device. The sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the electronic device and other devices, either wired or wireless. The electronic device may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 504, including instructions executable by processor 520 of the electronic device to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, an application program is also provided, including instructions executable by processor 520 of the electronic device to perform the above-described method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. An information pushing method is characterized by comprising the following steps:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
2. The information pushing method according to claim 1, characterized in that the method further comprises:
generating a second current action according to the information which is not pushed;
and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
3. The information pushing method according to claim 1, characterized in that the method further comprises:
collecting user feedback conditions after pushing the information;
and correlating the user feedback condition with the characteristic data, and updating the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
4. The method for pushing information according to claim 1, wherein inputting the current state and the first current action into the online deep reinforcement learning model so that the online deep reinforcement learning model obtains the corresponding score when the information is pushed according to the current state and the first current action, includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action;
and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
5. The information pushing method according to claim 4, further comprising:
counting historical recommendation made by the online deep reinforcement learning model on information generated by the history in a preset time period, wherein the historical recommendation comprises pushing or not pushing;
setting the value of the feedback discount according to the historical recommendation.
6. The information pushing method according to any one of claims 1 to 5, wherein the feature data includes at least one of an association feature between the information and the user, a content feature, and a user feature.
7. The information pushing method of claim 6, wherein the associated features include at least one of a click amount, a view amount, a praise amount, and a gift amount of the work of the author.
8. An information pushing apparatus, characterized by comprising:
the feature acquisition module is configured to acquire feature data corresponding to information when generation of the information is detected, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day, and click condition of information quantity pushed on the same day;
the state action generating module is used for generating a current state according to the characteristic data and generating a first current action according to pushing the information;
the score obtaining module is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
and the pushing module is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
9. The information pushing device of claim 8, wherein the state action generation module is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
10. The information pushing device of claim 8, wherein the device further comprises:
the feedback collection module is configured to collect user feedback conditions after the information is pushed;
and the model updating module is configured to correlate the user feedback condition with the characteristic data, and update the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
11. The information pushing device of claim 8, wherein the score acquisition module is specifically configured to:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
12. The information pushing device of claim 11, wherein the device further comprises:
a feedback discount determination module configured to count historical recommendations made by the online deep reinforcement learning model to historically generated information over a preset period of time, the historical recommendations including push or no push; setting the value of the feedback discount according to the historical recommendation.
13. The information pushing apparatus according to any one of claims 8 to 12, wherein the feature data includes at least one of an association feature between the information and the user, a content feature, and a user feature.
14. The information pushing device of claim 13, wherein the associated characteristic comprises at least one of a click volume, a view volume, a praise volume, and a gift volume of the work of the author.
15. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
16. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform an information pushing method, the method comprising:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
CN201910373172.6A 2019-05-06 2019-05-06 Information pushing method and device Active CN111898019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910373172.6A CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910373172.6A CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Publications (2)

Publication Number Publication Date
CN111898019A CN111898019A (en) 2020-11-06
CN111898019B true CN111898019B (en) 2024-04-16

Family

ID=73169538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910373172.6A Active CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Country Status (1)

Country Link
CN (1) CN111898019B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269262A (en) * 2021-06-02 2021-08-17 腾讯音乐娱乐科技(深圳)有限公司 Method, apparatus and storage medium for training matching degree detection model

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071251A1 (en) * 2015-10-28 2017-05-04 百度在线网络技术(北京)有限公司 Information pushing method and device
CN106658096A (en) * 2016-11-17 2017-05-10 百度在线网络技术(北京)有限公司 Method and device for pushing live program
CN106845817A (en) * 2017-01-11 2017-06-13 清华大学 Online strengthening learns transaction system and method
CN107463698A (en) * 2017-08-15 2017-12-12 北京百度网讯科技有限公司 Method and apparatus based on artificial intelligence pushed information
CN107784071A (en) * 2017-09-18 2018-03-09 维沃移动通信有限公司 A kind of information push method, device and mobile terminal
CN108446382A (en) * 2018-03-20 2018-08-24 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN109003143A (en) * 2018-08-03 2018-12-14 阿里巴巴集团控股有限公司 Recommend using deeply study the method and device of marketing
CN109167816A (en) * 2018-08-03 2019-01-08 广州虎牙信息科技有限公司 Information-pushing method, device, equipment and storage medium
CN109299387A (en) * 2018-11-13 2019-02-01 平安科技(深圳)有限公司 A kind of information push method based on intelligent recommendation, device and terminal device
CN109446431A (en) * 2018-12-10 2019-03-08 网易传媒科技(北京)有限公司 For the method, apparatus of information recommendation, medium and calculate equipment
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374138A1 (en) * 2017-06-23 2018-12-27 Vufind Inc. Leveraging delayed and partial reward in deep reinforcement learning artificial intelligence systems to provide purchase recommendations

Also Published As

Publication number Publication date
CN111898019A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN111539443A (en) Image recognition model training method and device and storage medium
CN111859020A (en) Recommendation method and device, electronic equipment and computer-readable storage medium
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
EP4068119A1 (en) Model training method and apparatus for information recommendation, electronic device and medium
CN108304078B (en) Input method and device and electronic equipment
CN107025421B (en) Fingerprint identification method and device
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN108984098B (en) Information display control method and device based on social software
CN113868467A (en) Information processing method, information processing device, electronic equipment and storage medium
CN112784151B (en) Method and related device for determining recommended information
CN111898019B (en) Information pushing method and device
CN111859097B (en) Data processing method, device, electronic equipment and storage medium
CN112712385B (en) Advertisement recommendation method and device, electronic equipment and storage medium
CN111539617B (en) Data processing method and device, electronic equipment, interaction system and storage medium
CN111291268B (en) Information processing method, information processing apparatus, and storage medium
CN114124866A (en) Session processing method, device, electronic equipment and storage medium
CN112036247A (en) Expression package character generation method and device and storage medium
CN110874146A (en) Input method and device and electronic equipment
CN110674416A (en) Game recommendation method and device
CN115484471B (en) Method and device for recommending anchor
CN114416246B (en) Data processing method and device, electronic equipment and storage medium
CN112711643B (en) Training sample set acquisition method and device, electronic equipment and storage medium
CN113190725B (en) Object recommendation and model training method and device, equipment, medium and product
CN117350824B (en) Electronic element information uploading and displaying method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant