CN111898019B - Information pushing method and device - Google Patents

Information pushing method and device

Info

Publication number
CN111898019B
CN111898019B CN201910373172.6A
Authority
CN
China
Prior art keywords
information
pushing
pushed
current
current state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910373172.6A
Other languages
Chinese (zh)
Other versions
CN111898019A (en)
Inventor
姜飞
卞俊杰
王天驹
叶璨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910373172.6A priority Critical patent/CN111898019B/en
Publication of CN111898019A publication Critical patent/CN111898019A/en
Application granted granted Critical
Publication of CN111898019B publication Critical patent/CN111898019B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an information pushing method, an information pushing apparatus, an electronic device, and a computer-readable storage medium. The information pushing method includes: when generation of information is detected, collecting feature data corresponding to the information, generating a current state from the feature data, and generating a first current action corresponding to pushing the information; inputting the current state and the first current action into an online deep reinforcement learning model, so that the model obtains the score corresponding to pushing the information according to the current state and the first current action; and pushing the information if that score is greater than a preset score. By collecting the feature data of newly generated information, generating the current state and the push action, and letting the online deep reinforcement learning model score the push decision, the method pushes only information whose score exceeds the preset score, which improves push accuracy and thus the rate at which users click on pushed information.

Description

Information pushing method and device
Technical Field
The disclosure relates to the technical field of information processing, and in particular relates to an information pushing method, an information pushing device, electronic equipment and a computer readable storage medium.
Background
With the rapid development of internet technology, people increasingly use personalized push applications (apps) to acquire information such as news and video. Besides recommending content of interest while the user is using the app, these apps can also actively push customized information through a push-notification mechanism when the user is not using the app.
In the related art, information pushing is generally triggered by candidate-set content and governed by frequency-control rules. Specifically, each user has an associated content candidate set, such as popular videos or videos uploaded by authors the user follows. Whenever new content appears in the candidate set, a push decision is triggered, and the decision is made mainly on the basis of sending frequency, for example requiring the interval between two push notifications to be no less than a certain time.
Although the above scheme achieves a degree of personalization, user behavior is a very complex process, and every push can influence the user's viewing state and subsequent actions. For example, if a user clicks a pushed item of an app and finds it uninteresting, the user may grow tired of the app and is then unlikely to click even a better item pushed later. Triggering pushes directly from the candidate set therefore lacks real personalization.
Disclosure of Invention
The disclosure provides an information pushing method, an information pushing apparatus, an electronic device, and a computer-readable storage medium, which can improve push accuracy and thus the rate at which users click on pushed information.
According to a first aspect of an embodiment of the present disclosure, there is provided an information pushing method, including:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
Further, the method further comprises:
generating a second current action according to the information which is not pushed;
and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the method further comprises:
collecting user feedback conditions after pushing the information;
and correlating the user feedback condition with the characteristic data, and updating the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
Further, the inputting the current state and the first current action into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action;
and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the method further comprises:
counting historical recommendation made by the online deep reinforcement learning model on information generated by the history in a preset time period, wherein the historical recommendation comprises pushing or not pushing;
setting the value of the feedback discount according to the historical recommendation.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
According to a second aspect of the embodiments of the present disclosure, there is provided an information pushing apparatus, including:
the feature collection module is configured to collect feature data corresponding to information when generation of the information is detected;
the state action generating module is configured to generate a current state according to the characteristic data and generate a first current action according to pushing the information;
the score obtaining module is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
and the pushing module is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
Further, the state action generation module is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the device further comprises:
the feedback collection module is configured to collect user feedback conditions after the information is pushed;
and the model updating module is configured to correlate the user feedback condition with the characteristic data, and update the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
Further, the score acquisition module is specifically configured to: input the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and take the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the device further comprises:
a feedback discount determination module configured to count historical recommendations made by the online deep reinforcement learning model to historically generated information over a preset period of time, the historical recommendations including push or no push; setting the value of the feedback discount according to the historical recommendation.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform an information pushing method, the method comprising:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
According to a fifth aspect of embodiments of the present disclosure, there is provided an application program, which when executed by a processor of a mobile terminal, causes the mobile terminal to perform an information push method, the method comprising:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
The technical solution provided by the embodiments of the disclosure can have the following beneficial effects: feature data corresponding to newly generated information is collected, a current state is generated from the feature data and a first current action is generated for pushing the information, and the current state and the first current action are input into an online deep reinforcement learning model so that the model obtains the score corresponding to pushing the information according to the current state and the first current action; the information is pushed if that score is greater than a preset score. This improves push accuracy and thus the rate at which users click on pushed information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an information pushing method according to an embodiment of the present disclosure.
Fig. 2 is a flowchart of an information pushing method according to a second embodiment of the present disclosure.
Fig. 3 is a flowchart of an information pushing method according to a third embodiment of the present disclosure.
Fig. 4 is a block diagram of an information pushing device according to a fourth embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to a fifth embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Example 1
Fig. 1 is a flowchart of an information pushing method according to an embodiment of the present disclosure, where an execution subject of the information pushing method according to the embodiment may be an information pushing device according to an embodiment of the present disclosure, and the device may be integrated in a mobile terminal device (for example, a smart phone, a tablet computer, etc.), a notebook computer, or a fixed terminal (desktop computer), and the information pushing device may be implemented by using hardware or software. As shown in fig. 1, the method comprises the following steps:
in step S11, when the generation of information is detected, feature data corresponding to the information is collected.
In this embodiment, the information generation is used as a push trigger condition, that is, when the information generation is detected, the corresponding feature data is collected. The information may be news, video, novels, etc., and is not particularly limited herein.
Wherein the feature data includes, but is not limited to, at least one of an associated feature between the information and the user, a contextual feature, a content feature, and a user feature.
Wherein the associated features include, but are not limited to, at least one of click volume, view volume, praise volume, and gift volume of the work of the author.
Wherein the contextual characteristics include, but are not limited to, at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
Wherein the user characteristics include, but are not limited to, at least one of personal information of the user, active time distribution, number of push triggers recently per hour.
Wherein the content features include, but are not limited to, at least one feature of author information, video tags, video statistics.
Compared with schemes that consider only the click rate, the feature data of this implementation also covers finer-grained indicators such as context features and interaction features; using them as the current state input to the deep reinforcement learning model can further improve the push benefit.
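As a concrete illustration of how such feature data could be assembled into the current state, the following Python sketch maps one assumed field per feature category into a fixed-order vector. The field names and the flat encoding are illustrative assumptions, not values fixed by this disclosure.

```python
import numpy as np

def build_current_state(features: dict) -> np.ndarray:
    """Flatten assumed fields from each feature category into one state vector."""
    return np.array([
        features.get("author_click_amount", 0.0),      # associated feature: clicks on the author's works
        features.get("author_view_amount", 0.0),        # associated feature: views of the author's works
        features.get("current_hour", 0.0),              # context feature: current time
        features.get("hours_since_last_push", 0.0),     # context feature: time of last pushed information
        features.get("last_push_clicked", 0.0),          # context feature: click condition of last push
        features.get("pushes_today", 0.0),               # context feature: amount already pushed today
        features.get("today_push_clicks", 0.0),          # context feature: clicks on today's pushes
        features.get("recent_triggers_per_hour", 0.0),   # user feature: recent push triggers per hour
        features.get("video_duration", 0.0),              # content feature: one assumed video statistic
    ], dtype=np.float32)

state = build_current_state({"author_click_amount": 1200, "current_hour": 21, "pushes_today": 2})
```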
In step S12, a current state is generated from the feature data, and a first current action is generated according to pushing the information.
In this context, in order to distinguish between different current actions, the current action that occurs first is referred to herein as the first current action, and a subsequently occurring current action is referred to herein as the second current action.
In step S13, the current state and the first current action are input into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the score corresponding to pushing the information according to the current state and the first current action.
The current state and the first current action are both known data, and the online deep reinforcement learning model can obtain corresponding scores when pushing information according to the known data.
The on-line deep reinforcement learning model is a model obtained by training based on the combination of deep learning and reinforcement learning, can be continuously updated according to feature data and feedback of a user on a decision result, and is used as a new on-line deep reinforcement learning model for deciding information generated subsequently.
Specifically, the online deep reinforcement learning model can be obtained as follows. First, feature data of historical information is collected, together with the decision results produced by other known recommendation strategies and the corresponding user feedback; an initial deep reinforcement learning model is trained from this data and brought online. The initial model is then deployed with an epsilon-greedy exploration strategy and is iteratively updated using its own decision results, the user feedback on those decisions, and the feature data of the information, yielding a better online deep reinforcement learning model.
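A minimal sketch of the epsilon-greedy exploration strategy mentioned above is given below; the exploration rate and the 1 = push / 0 = not push encoding are assumptions chosen for illustration.

```python
import random

def epsilon_greedy_action(q_push: float, q_no_push: float, epsilon: float = 0.1) -> int:
    """With probability epsilon explore a random action; otherwise exploit the
    action whose score from the online model is higher (1 = push, 0 = not push)."""
    if random.random() < epsilon:
        return random.choice([0, 1])
    return 1 if q_push > q_no_push else 0

action = epsilon_greedy_action(q_push=0.72, q_no_push=0.58)
```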
In step S14, if the score corresponding to the pushing of the information is greater than a preset score, the information is pushed.
According to the embodiment, the current state is generated according to the characteristic data by collecting the characteristic data corresponding to the generated information, the first current action is generated according to the pushing information, the current state and the first current action are input into the on-line deep reinforcement learning model, so that the on-line deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, if the corresponding score when pushing the information is larger than the preset score, the information is pushed, the pushing accuracy can be improved, and the clicking rate of a user on the information is improved.
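As a compact illustration of steps S13 and S14, the sketch below scores the push action for an already-built current state and compares it with the preset score; the toy model and the example state vector are stand-ins introduced for illustration only.

```python
import numpy as np

def decide_push(q_model, state: np.ndarray, preset_score: float) -> bool:
    """Steps S13-S14 in miniature: score the (current state, push) pair with the
    online model and push only if the score exceeds the preset score."""
    push_action = np.array([1.0], dtype=np.float32)   # first current action: push the information
    return float(q_model(state, push_action)) > preset_score

# toy stand-in for the online deep reinforcement learning model and for a state
# vector built from the collected feature data (steps S11-S12)
toy_q = lambda s, a: float(np.tanh(s.mean()) * a[0])
state = np.array([0.3, 0.9, 0.2], dtype=np.float32)
print(decide_push(toy_q, state, preset_score=0.5))
```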
In an alternative embodiment, step S13 includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Wherein a value of a equal to 1 may represent pushing. For a piece of information, if the action is to push, the user's feedback (for example, the user's click behavior) is reflected in the next state. Specifically, after a piece of information is pushed to the user, the corresponding context features are updated; that is, the user's click action is recorded as feature data for the next push decision.
The feedback r includes positive feedback and negative feedback. Positive feedback includes clicking the pushed information or opening the application within a preset period (for example, one minute) after the push; negative feedback is the cost of pushing. Because simply optimizing the click amount would sharply reduce the push proportion for inactive users, the click feedback of an inactive user is set larger than that of an active user. In addition, for different types of content, such as live broadcasts and video uploads, the click feedback assigned during training differs because the actual benefit of a click differs.
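The sketch below shows one way such feedback could be assigned; every numeric weight, the one-minute window, and the content-type table are illustrative assumptions rather than values taken from this disclosure.

```python
def feedback(clicked: bool, opened_within_minute: bool,
             user_is_active: bool, content_type: str) -> float:
    """Illustrative feedback r: a fixed push cost as negative feedback, click and
    quick-open as positive feedback, larger click feedback for inactive users,
    and per-content-type click weights (e.g. live broadcast vs. video upload)."""
    push_cost = 0.1                                           # assumed negative feedback per push
    click_weight = {"live": 1.5, "video": 1.0}.get(content_type, 1.0)
    user_weight = 1.5 if not user_is_active else 1.0          # inactive users weighted higher
    r = -push_cost
    if clicked:
        r += click_weight * user_weight
    if opened_within_minute:
        r += 0.5
    return r

print(feedback(clicked=True, opened_within_minute=False, user_is_active=False, content_type="live"))
```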
Further, the method further comprises a setting method of feedback discount, and specifically comprises the following steps:
counting historical recommendation made by the deep reinforcement learning model for the historical push information in a preset time period, wherein the historical recommendation comprises push or non-push;
the value of the feedback discount is set according to the historical recommendation.
The preset time period may be the same day. Specifically, a decision sequence may be formed by decisions of all information trigger points in the day, and the value of the feedback discount is set according to the decision sequence. The feedback discount may be set to 0.9, for example.
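To make the role of the feedback discount concrete, the short sketch below accumulates the feedback collected at a day's trigger points with γ = 0.9; the feedback values are invented, and this disclosure does not prescribe an exact rule for deriving γ from the historical recommendations.

```python
def discounted_feedback(day_feedback, gamma=0.9):
    """Discounted sum over the decisions made at the day's information trigger points."""
    return sum((gamma ** t) * r for t, r in enumerate(day_feedback))

# e.g. four trigger points on one day: no click, click, no click, quick app open
print(discounted_feedback([-0.1, 1.0, -0.1, 0.5]))
```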
Example two
Fig. 2 is a flowchart of an information pushing method provided in a second embodiment of the present disclosure, where the embodiment further includes a step of updating an online deep reinforcement learning model, as shown in fig. 2, based on the foregoing embodiment, including the following steps:
In step S21, when the generation of information is detected, the feature data corresponding to the information is collected.
In step S22, a current state is generated according to the feature data, and a first current action is generated according to pushing the information.
In step S23, the current state and the first current action are input into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action.
In step S24, if the score corresponding to the pushing of the information is greater than the preset score, the information is pushed.
In step S25, the user feedback after pushing the information is collected.
In step S26, the user feedback condition and the feature data are correlated, and the correlated user feedback condition and feature data are used as training data to update the online deep reinforcement learning model.
According to this embodiment, the user feedback condition and the feature data are associated and used as training data to update the online deep reinforcement learning model. Information can then be pushed to the user in a targeted manner, so that the pushed information better matches the user's interests, which further improves the user's click rate on the information.
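As an illustration of how the associated feedback and feature data could drive one update of the online model, the sketch below performs a single temporal-difference update of the kind the loss functions of the first embodiment describe. The network architecture, optimizer, batch layout, and the use of PyTorch are assumptions; the disclosure does not specify them.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Toy Q-network scoring a (state, action) pair; all sizes are placeholders."""
    def __init__(self, state_dim: int = 9, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def td_loss(q_net, target_net, prev_state, prev_action, reward, cur_state, gamma=0.9):
    """Loss of the reconstructed form: r + gamma * max over push/no-push of the
    target network's score for the current state, regressed onto Q(prev_state, prev_action)."""
    push = torch.ones(cur_state.size(0), 1)
    skip = torch.zeros(cur_state.size(0), 1)
    with torch.no_grad():
        target = reward + gamma * torch.maximum(target_net(cur_state, push),
                                                target_net(cur_state, skip))
    return nn.functional.mse_loss(q_net(prev_state, prev_action), target)

# one illustrative update step on random stand-in data (16 associated feedback/feature pairs)
q_net, target_net = QNet(), QNet()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
prev_s, cur_s = torch.randn(16, 9), torch.randn(16, 9)
prev_a, r = torch.randint(0, 2, (16, 1)).float(), torch.randn(16)
loss = td_loss(q_net, target_net, prev_s, prev_a, r, cur_s)
opt.zero_grad(); loss.backward(); opt.step()
```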
Example III
Fig. 3 is a flowchart of an information pushing method according to a third embodiment of the present disclosure, where, based on the foregoing embodiment, a preset score is optimized to a score corresponding to a time when no information is pushed, as shown in fig. 3, the method includes the following steps:
In step S31, when the generation of information is detected, the feature data corresponding to the information is collected.
In step S32, a current state is generated according to the feature data, a first current action is generated according to pushing the information, and a second current action is generated according to not pushing the information.
In step S33, the current state, the first current action, and the second current action are input into the online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action, and obtains the corresponding score when not pushing the information according to the current state and the second current action.
How the score corresponding to pushing the information is obtained can be seen from the first embodiment. Similarly, the score corresponding to not pushing the information is obtained by inputting the current state and the second current action into the loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the second current action; the output value of the online deep learning neural network is taken as the corresponding score when the information is not pushed. Wherein a value of a equal to 0 represents not pushing.
In step S34, if the score corresponding to pushing the information is greater than the score corresponding to not pushing the information, the information is pushed.
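A minimal sketch of this third-embodiment decision rule follows; the toy model stands in for the online deep reinforcement learning model, and the 1 = push / 0 = not push action encoding is an assumption carried over from the earlier sketches.

```python
import numpy as np

def decide_by_comparison(q_model, state: np.ndarray) -> bool:
    """Score both candidate actions with the same model and push only when the
    score for pushing exceeds the score for not pushing."""
    score_push = float(q_model(state, np.array([1.0], dtype=np.float32)))
    score_no_push = float(q_model(state, np.array([0.0], dtype=np.float32)))
    return score_push > score_no_push

toy_q = lambda s, a: float(np.tanh(s.mean())) * (0.9 if a[0] > 0.5 else 0.7)
print(decide_by_comparison(toy_q, np.array([0.4, 0.8, 0.1], dtype=np.float32)))
```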
According to this embodiment, pushing and not pushing are both taken as current actions, the current state and the current actions are input into the online deep reinforcement learning model, and whether to push the information is determined by comparing the score corresponding to pushing with the score corresponding to not pushing, which can improve push accuracy and thus the rate at which users click on the information.
Example IV
Fig. 4 is a block diagram of an information pushing device according to an embodiment of the present disclosure. The device can be integrated in a mobile terminal device (e.g. a smart phone, a tablet computer, etc.), a notebook or a fixed terminal (desktop computer), and the information pushing device can be implemented in hardware or software. Referring to fig. 4, the apparatus includes: the device comprises a characteristic acquisition module 41, a state action generation module 42, a score acquisition module 43 and a pushing module 44, wherein the state action generation module 42, the score acquisition module 43 and the pushing module 44 are sequentially connected.
The feature collection module 41 is configured to collect feature data corresponding to the information when the generation of information is detected.
The state action generation module 42 is configured to generate a current state from the feature data and a first current action from pushing the information;
the score obtaining module 43 is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
the pushing module 44 is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
Further, the state action generation module 42 is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module 43 is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
Further, the device further comprises: a feedback gathering module 45 and a model updating module 46;
the feedback collection module 45 is configured to collect user feedback after pushing the information;
the model updating module 46 is configured to correlate the user feedback situation with the feature data and update the online deep reinforcement learning model using the correlated user feedback situation and feature data as training data.
Further, the score obtaining module 43 is specifically configured to: input the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and take the output value of the online deep learning neural network as the corresponding score when the information is pushed.
Further, the device further comprises a feedback discount determination module 47, configured to count historical recommendations made by the online deep reinforcement learning model for historically generated information within a preset time period, the historical recommendations including push or no push, and to set the value of the feedback discount according to the historical recommendations.
Further, the feature data includes at least one of an association feature between the information and the user, a context feature, a content feature, and a user feature.
Further, the associated feature includes at least one feature of a click amount, a viewing amount, a praise amount, and a gift amount of the work of the author.
Further, the contextual characteristics include at least one of a current time, a time of the last pushed information, a click condition of the last pushed information, an amount of information already pushed on the same day, and a click condition of the information already pushed on the same day.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and is not repeated here.
Example five
An embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generation of information is detected, collecting characteristic data corresponding to the information;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
Wherein fig. 5 is a block diagram of an electronic device provided by an embodiment of the disclosure. For example, the electronic device may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the electronic device may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 502 may include one or more processors 520 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interactions between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and the like. The memory 504 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 506 provides power to the various components of the electronic device. The power components 506 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic devices.
The multimedia component 508 includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front-facing camera and/or a rear-facing camera. When the electronic device is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 510 is configured to output and/or input audio signals. For example, the audio component 510 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 504 or transmitted via the communication component 516. In some embodiments, the audio component 510 further comprises a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 514 includes one or more sensors for providing status assessment of various aspects of the electronic device. For example, the sensor assembly 514 may detect an on/off state of the electronic device and the relative positioning of components such as the display and keypad of the electronic device; the sensor assembly 514 may also detect a change in position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and a change in temperature of the electronic device. The sensor assembly 514 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the electronic device and other devices, either wired or wireless. The electronic device may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 516 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 504, including instructions executable by processor 520 of the electronic device to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, an application program is also provided, including instructions executable by processor 520 of the electronic device to perform the above-described method.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. An information pushing method is characterized by comprising the following steps:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
2. The information pushing method according to claim 1, characterized in that the method further comprises:
generating a second current action according to the information which is not pushed;
and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
3. The information pushing method according to claim 1, characterized in that the method further comprises:
collecting user feedback conditions after pushing the information;
and correlating the user feedback condition with the characteristic data, and updating the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
4. The method for pushing information according to claim 1, wherein inputting the current state and the first current action into the online deep reinforcement learning model so that the online deep reinforcement learning model obtains the corresponding score when the information is pushed according to the current state and the first current action, includes:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action;
and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
5. The information pushing method according to claim 4, further comprising:
counting historical recommendation made by the online deep reinforcement learning model on information generated by the history in a preset time period, wherein the historical recommendation comprises pushing or not pushing;
setting the value of the feedback discount according to the historical recommendation.
6. The information pushing method according to any one of claims 1 to 5, wherein the feature data includes at least one of an association feature between the information and the user, a content feature, and a user feature.
7. The information pushing method of claim 6, wherein the associated features include at least one of a click amount, a view amount, a praise amount, and a gift amount of the work of the author.
8. An information pushing apparatus, characterized by comprising:
the feature acquisition module is configured to acquire feature data corresponding to information when generation of the information is detected, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day, and click condition of information quantity pushed on the same day;
the state action generating module is used for generating a current state according to the characteristic data and generating a first current action according to pushing the information;
the score obtaining module is configured to input the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when pushing the information according to the current state and the first current action;
and the pushing module is configured to push the information if the corresponding score is greater than a preset score when the information is pushed.
9. The information pushing device of claim 8, wherein the state action generation module is further configured to: generating a second current action according to the information which is not pushed;
accordingly, the score acquisition module is further configured to: and inputting the current state and the second current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains the corresponding score when the information is not pushed according to the current state and the second current action, and takes the corresponding score when the information is not pushed as the preset score.
10. The information pushing device of claim 8, wherein the device further comprises:
the feedback collection module is configured to collect user feedback conditions after the information is pushed;
and the model updating module is configured to correlate the user feedback condition with the characteristic data, and update the online deep reinforcement learning model by taking the correlated user feedback condition and the correlated characteristic data as training data.
11. The information pushing device of claim 8, wherein the score acquisition module is specifically configured to:
inputting the current state and the first current action into a loss function
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ⁻) − Q(s', a'; θ) ]²
or
L(θ) = Σ_{(s',a',r,s)∈D} [ r + γ·max_a Q(s, a; θ) − Q(s', a'; θ) ]²
wherein Σ is the sum, max is the maximum value, Q() denotes the online deep learning neural network, θ⁻ and θ are network parameters, D is the feature data set, r is the feedback, γ is the feedback discount, s' is the previous state, a' is the previous action, s is the current state, and a is the first current action; and taking the output value of the online deep learning neural network as the corresponding score when the information is pushed.
12. The information pushing device of claim 11, wherein the device further comprises:
a feedback discount determination module configured to count historical recommendations made by the online deep reinforcement learning model to historically generated information over a preset period of time, the historical recommendations including push or no push; setting the value of the feedback discount according to the historical recommendation.
13. The information pushing apparatus according to any one of claims 8 to 12, wherein the feature data includes at least one of an association feature between the information and the user, a content feature, and a user feature.
14. The information pushing device of claim 13, wherein the associated characteristic comprises at least one of a click volume, a view volume, a praise volume, and a gift volume of the work of the author.
15. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions; wherein the processor is configured to:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
16. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a mobile terminal, cause the mobile terminal to perform an information pushing method, the method comprising:
when the generated information is detected, collecting feature data corresponding to the information, wherein the feature data comprises click quantity, interaction features and context features, and the context features comprise current time, time of last information pushing, click condition of last information pushing, information quantity pushed on the same day and click condition of information quantity pushed on the same day;
generating a current state according to the characteristic data, and generating a first current action according to pushing the information;
inputting the current state and the first current action into an online deep reinforcement learning model, so that the online deep reinforcement learning model obtains corresponding scores when pushing the information according to the current state and the first current action;
and if the corresponding score is larger than the preset score when the information is pushed, pushing the information.
CN201910373172.6A 2019-05-06 2019-05-06 Information pushing method and device Active CN111898019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910373172.6A CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910373172.6A CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Publications (2)

Publication Number Publication Date
CN111898019A CN111898019A (en) 2020-11-06
CN111898019B true CN111898019B (en) 2024-04-16

Family

ID=73169538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910373172.6A Active CN111898019B (en) 2019-05-06 2019-05-06 Information pushing method and device

Country Status (1)

Country Link
CN (1) CN111898019B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269262A (en) * 2021-06-02 2021-08-17 腾讯音乐娱乐科技(深圳)有限公司 Method, apparatus and storage medium for training matching degree detection model

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071251A1 (en) * 2015-10-28 2017-05-04 百度在线网络技术(北京)有限公司 Information pushing method and device
CN106658096A (en) * 2016-11-17 2017-05-10 百度在线网络技术(北京)有限公司 Method and device for pushing live program
CN106845817A (en) * 2017-01-11 2017-06-13 清华大学 Online strengthening learns transaction system and method
CN107463698A (en) * 2017-08-15 2017-12-12 北京百度网讯科技有限公司 Method and apparatus based on artificial intelligence pushed information
CN107784071A (en) * 2017-09-18 2018-03-09 维沃移动通信有限公司 A kind of information push method, device and mobile terminal
CN108446382A (en) * 2018-03-20 2018-08-24 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN109003143A (en) * 2018-08-03 2018-12-14 阿里巴巴集团控股有限公司 Recommend using deeply study the method and device of marketing
CN109167816A (en) * 2018-08-03 2019-01-08 广州虎牙信息科技有限公司 Information-pushing method, device, equipment and storage medium
CN109299387A (en) * 2018-11-13 2019-02-01 平安科技(深圳)有限公司 A kind of information push method based on intelligent recommendation, device and terminal device
CN109446431A (en) * 2018-12-10 2019-03-08 网易传媒科技(北京)有限公司 For the method, apparatus of information recommendation, medium and calculate equipment
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374138A1 (en) * 2017-06-23 2018-12-27 Vufind Inc. Leveraging delayed and partial reward in deep reinforcement learning artificial intelligence systems to provide purchase recommendations

Also Published As

Publication number Publication date
CN111898019A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN111539443A (en) Image recognition model training method and device and storage medium
CN111859020A (en) Recommendation method and device, electronic equipment and computer-readable storage medium
CN107341509B (en) Convolutional neural network training method and device and readable storage medium
EP4068119A1 (en) Model training method and apparatus for information recommendation, electronic device and medium
CN108304078B (en) Input method and device and electronic equipment
CN107025421B (en) Fingerprint identification method and device
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN108984098B (en) Information display control method and device based on social software
CN113868467A (en) Information processing method, information processing device, electronic equipment and storage medium
CN112784151B (en) Method and related device for determining recommended information
CN111898019B (en) Information pushing method and device
CN111859097B (en) Data processing method, device, electronic equipment and storage medium
CN112712385B (en) Advertisement recommendation method and device, electronic equipment and storage medium
CN111539617B (en) Data processing method and device, electronic equipment, interaction system and storage medium
CN111291268B (en) Information processing method, information processing apparatus, and storage medium
CN114124866A (en) Session processing method, device, electronic equipment and storage medium
CN112036247A (en) Expression package character generation method and device and storage medium
CN110874146A (en) Input method and device and electronic equipment
CN110674416A (en) Game recommendation method and device
CN115484471B (en) Method and device for recommending anchor
CN114416246B (en) Data processing method and device, electronic equipment and storage medium
CN112711643B (en) Training sample set acquisition method and device, electronic equipment and storage medium
CN113190725B (en) Object recommendation and model training method and device, equipment, medium and product
CN117350824B (en) Electronic element information uploading and displaying method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant