WO2020207249A1 - Method, apparatus, storage medium and electronic device for pushing notification messages - Google Patents

Method, apparatus, storage medium and electronic device for pushing notification messages

Info

Publication number
WO2020207249A1
Authority
WO
WIPO (PCT)
Prior art keywords
notification message
notification
viewing
priority
user
Application number
PCT/CN2020/081128
Other languages
English (en)
French (fr)
Inventor
陈仲铭
何明
Original Assignee
Oppo广东移动通信有限公司
Application filed by Oppo广东移动通信有限公司
Publication of WO2020207249A1

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
            • H04L51/21: Monitoring or handling of messages
              • H04L51/214: Monitoring or handling of messages using selective forwarding
              • H04L51/224: Monitoring or handling of messages providing notification on incoming messages, e.g. pushed notifications of received messages
              • H04L51/226: Delivery according to priorities
        • H04W: WIRELESS COMMUNICATION NETWORKS
          • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
            • H04W4/12: Messaging; Mailboxes; Announcements

Definitions

  • This application relates to the field of terminal technology, and in particular to a method, device, storage medium and electronic equipment for pushing notification messages.
  • the embodiments of the present application provide a method, device, storage medium, and electronic device for pushing notification messages, which can judge the priority of notification messages in combination with the actual situation of the user, so that the user can view the notification messages currently needed in time.
  • an embodiment of the application provides a method for pushing notification messages, including:
  • an embodiment of the present application provides a notification message pushing device, including:
  • the data acquisition module is used to acquire the content of the notification message when the notification message is received
  • the priority calculation module is configured to calculate the priority of the notification message according to the content based on a pre-trained deep reinforcement learning model, where the deep reinforcement learning model is trained based on the user's experience data of viewing historical notification messages;
  • the message sorting module is used to determine the arrangement order of the notification message according to the priority of the notification message and the priorities of the unread messages in the notification bar, and to determine the display mode of the notification message according to the priority of the notification message;
  • the message push module is configured to push the notification message according to the arrangement order and the display mode.
  • the storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program runs on a computer, the computer executes:
  • an embodiment of the present application provides an electronic device, including a processor and a memory, the memory has a computer program, and the processor is configured to execute:
  • FIG. 1 is a schematic diagram of a panoramic sensing architecture of a method for pushing notification messages provided by an embodiment of the application.
  • FIG. 2 is a schematic flowchart of the first method for pushing notification messages according to an embodiment of the application.
  • Fig. 3 is a schematic diagram of a deep reinforcement learning model provided by an embodiment of the application.
  • Fig. 4 is a schematic structural diagram of a notification message pushing device provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of the first structure of an electronic device provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application.
  • the embodiment of the application provides a method for pushing notification messages.
  • the execution subject of the method for pushing notification messages may be the notification message pushing device provided in the embodiments of the application, or an electronic device integrating the notification message pushing device, where the notification message pushing device can be implemented in hardware or software.
  • the electronic device can be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer and other devices.
  • This application provides a method for pushing notification messages, including:
  • the method further includes:
  • the content, the viewing duration, and the reward value are used as experience data of the notification message and stored in the experience pool.
  • obtaining the viewing duration and reward value of the user viewing the notification message according to the user's viewing of the notification message includes:
  • if it is detected that the user clicks and views the notification message, recording the viewing duration of the user viewing the notification message and recording the reward value of the notification message as a positive number; if it is detected that the notification message is cleared, recording the viewing duration of the notification message as zero and recording the reward value of the notification message as a negative number.
  • the method further includes:
  • the deep reinforcement learning model is a model based on the deep Q network algorithm, and training the value network of the deep reinforcement learning model according to the document topic features and experience data of the historical notification messages to obtain network parameters includes:
  • the value network is trained to obtain network parameters.
  • calculating the priority of the notification message according to the content based on a pre-trained deep reinforcement learning model includes:
  • calculating the priority of the notification message according to the document subject feature of the notification message and the value network includes:
  • the priority of the notification message is determined according to the action data with the largest Q value, wherein the viewing duration in the action data is proportional to the priority.
  • determining the display mode of the notification message according to the priority of the notification message includes:
  • the display mode of the notification message is set to expand display.
  • FIG. 1 is a schematic diagram of a panoramic sensing architecture of a method for pushing notification messages according to an embodiment of the application.
  • the notification message push method is applied to electronic equipment.
  • the electronic device is provided with a panoramic sensing architecture.
  • the panoramic perception architecture is the integration of hardware and software used to implement the notification message push method in an electronic device.
  • the panoramic perception architecture includes an information perception layer, a data processing layer, a feature extraction layer, a scenario modeling layer, and an intelligent service layer.
  • the information perception layer is used to obtain the information of the electronic device itself or the information in the external environment.
  • the information perception layer may include multiple sensors.
  • the information sensing layer includes multiple sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a Hall sensor, a position sensor, a gyroscope, an inertial sensor, a posture sensor, a barometer, and a heart rate sensor.
  • the distance sensor can be used to detect the distance between the electronic device and an external object.
  • the magnetic field sensor can be used to detect the magnetic field information of the environment in which the electronic device is located.
  • the light sensor can be used to detect the light information of the environment in which the electronic device is located.
  • the acceleration sensor can be used to detect the acceleration data of the electronic device.
  • the fingerprint sensor can be used to collect the user's fingerprint information.
  • Hall sensor is a kind of magnetic field sensor made according to Hall effect, which can be used to realize automatic control of electronic equipment.
  • the location sensor can be used to detect the current geographic location of the electronic device. Gyroscopes can be used to detect the angular velocity of electronic devices in various directions. Inertial sensors can be used to detect movement data of electronic devices.
  • the attitude sensor can be used to sense the attitude information of the electronic device.
  • the barometer can be used to detect the air pressure of the environment where the electronic device is located.
  • the heart rate sensor can be used to detect the user's heart rate information.
  • the data processing layer is used to process the data obtained by the information perception layer.
  • the data processing layer can perform data cleaning, data integration, data transformation, and data reduction on the data acquired by the information perception layer.
  • data cleaning refers to cleaning up a large amount of data obtained by the information perception layer to eliminate invalid data and duplicate data.
  • Data integration refers to the integration of multiple single-dimensional data acquired by the information perception layer into a higher or more abstract dimension to comprehensively process multiple single-dimensional data.
  • Data transformation refers to the data type conversion or format conversion of the data acquired by the information perception layer, so that the transformed data meets the processing requirements.
  • Data reduction means to minimize the amount of data while maintaining the original appearance of the data as much as possible.
  • the feature extraction layer is used to perform feature extraction on data processed by the data processing layer to extract features included in the data.
  • the extracted features can reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located.
  • the feature extraction layer can extract features or process the extracted features through methods such as filtering, packaging, and integration.
  • the filtering method refers to filtering the extracted features to delete redundant feature data.
  • the packaging method is used to screen the extracted features.
  • the integration method refers to the integration of multiple feature extraction methods to construct a more efficient and accurate feature extraction method for feature extraction.
  • the scenario modeling layer is used to construct a model based on the features extracted by the feature extraction layer, and the obtained model can be used to represent the state of the electronic device or the state of the user or the environment.
  • the scenario modeling layer can construct key value models, pattern identification models, graph models, entity connection models, object-oriented models, etc. based on the features extracted by the feature extraction layer.
  • the intelligent service layer is used to provide users with intelligent services based on the model constructed by the scenario modeling layer.
  • the intelligent service layer can provide users with basic application services, can perform system intelligent optimization for electronic devices, and can also provide users with personalized intelligent services.
  • the panoramic perception architecture may also include multiple algorithms, each of which can be used to analyze and process data, and the multiple algorithms can form an algorithm library.
  • the algorithm library may include algorithms such as the Markov algorithm, the latent Dirichlet allocation algorithm, the Bayesian classification algorithm, support vector machines, the K-means clustering algorithm, the K-nearest neighbor algorithm, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.
  • the electronic device collects historical notification messages of the target user through the information perception layer and/or other methods.
  • the intelligent service layer pushes the received notification message according to the message push method proposed in this application. For example, when a notification message is received, the application name corresponding to the notification message is determined, the content of the notification message is obtained, and, based on the pre-trained deep reinforcement learning model, the priority of the notification message is calculated according to the application name and the content.
  • the deep reinforcement learning model is trained based on the user's experience data of viewing historical notification messages.
  • this solution trains the deep reinforcement learning model with the experience data of the user viewing historical notification messages, so as to learn the user's habit of viewing historical messages, and then judges the priority of the notification message and pushes the message according to an arrangement order and display mode matching the user's viewing habits. This helps the user manage notification messages reasonably, so that the user can promptly view the notification messages currently needed, improving the user experience.
  • FIG. 2 is a schematic flowchart of the first method for pushing notification messages according to an embodiment of the application.
  • the specific process of the method for pushing notification messages provided in the embodiments of the present application may be as follows:
  • the notification message in the embodiment of this application can be sent by the system server of a system application or process on the electronic device, for example, the phone manager, traffic management, mailbox and other applications; it can also be sent by the back-end server of a third-party application installed by the user, for example, xx news, xx music, xx takeaway and other APPs (applications) installed by the user.
  • When the electronic device receives the notification message, it determines the application name corresponding to the notification message and obtains the specific message content of the notification message. For example, the electronic device receives a notification message sent by the background server of the xx news APP, and the content obtained may be the summary of a piece of news pushed to the user by the news APP.
  • the experience data of users viewing historical notification messages is used to train the deep reinforcement learning model to learn the user habits of users viewing notification messages.
  • a deep reinforcement learning model based on the DQN (Deep Q Network) algorithm or the A3C (Asynchronous Advantage Actor Critic) algorithm is used.
  • the DQN algorithm combines deep learning (Deep Learning) and reinforcement learning (Reinforcement Learning); it is a value-based algorithm in which deep learning provides the learning mechanism and reinforcement learning provides the learning objective for deep learning.
  • the A3C algorithm is a deep reinforcement learning algorithm based on the improvement of the DQN algorithm.
  • the algorithm outputs the Q value through one value network, and generates TargetQ through a separate target network (Q-target).
  • the value network can be a deep neural network.
  • the value network is trained with the experience data recorded from the user viewing historical notification messages to obtain network parameters.
  • the input data of the value network are state data, action data, and feedback (i.e., the reward value).
  • the content of the notification message is used as the state data, and the viewing duration of the user viewing the notification message is used as the action data; the combinations of states and actions are finite. Assuming that there are m kinds of state data and n kinds of action data, Q can be regarded as an m × n table.
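  • As a minimal illustrative sketch (not part of the original disclosure), the finite state/action bookkeeping described above could look like the following; the number of content buckets and the 10-second duration grid are assumed values.

```python
import numpy as np

# Hypothetical, finite state and action spaces:
# m kinds of state data (e.g. topic buckets of the message content)
# n kinds of action data (viewing durations rounded to multiples of 10 s)
m_states = 50                               # assumed number of content/topic buckets
actions = [0, 10, 20, 30, 40, 50, 60]       # assumed viewing durations in seconds

# With finite states and actions, Q can be viewed as an m x n table.
Q = np.zeros((m_states, len(actions)))

def best_action(state_index: int) -> int:
    """Return the viewing duration the model currently expects for this state."""
    return actions[int(np.argmax(Q[state_index]))]
```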
  • The notification messages can be pushed according to a default push strategy; for example, notification messages are pushed in order of receiving time from the most recent to the earliest, so that newly received notification messages are arranged at the forefront and displayed in the notification bar in an expanded manner. When the user does not view a notification message for more than a preset time, the notification message is collapsed.
  • The electronic device records the user's viewing of these notification messages and collects the records into the experience pool of the deep reinforcement learning model as experience data.
  • As the user uses various applications on the electronic device for a longer time, more and more experience data are stored in the experience pool, and these experience data can then be used to train the deep reinforcement learning model. For example, at every preset time interval, the experience data accumulated in the experience pool are obtained to train the deep reinforcement learning model.
  • the method further includes:
  • according to the user's viewing of the notification message, recording the viewing duration and reward value of the user viewing the notification message; and storing the content, the viewing duration, the reward value, and the application name in the experience pool as the experience data of the notification message.
  • After the electronic device pushes the notification message in the notification bar, it records experience data according to the user's viewing of the notification message and stores the data in the experience pool of the deep reinforcement learning model. Specifically, after receiving the notification message, the electronic device obtains the content of the notification message and the corresponding application name, where the content is recorded as content and the application name as apk_name, and stores them as experience data in the format {apk_name, content}. After the user views the notification message, the electronic device obtains the viewing time open_time and viewing duration review_time of the notification message, and gives the notification message a feedback value, recorded as reward. Finally, the experience data of the notification message becomes {apk_name, content, open_time, review_time, reward}.
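  • A minimal sketch of this experience record follows; the field names mirror the notation in the text ({apk_name, content, open_time, review_time, reward}), while the dataclass and the experience-pool container are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Experience:
    apk_name: str      # application name of the notification
    content: str       # message content (state)
    open_time: str     # viewing time, e.g. an hour-of-day interval
    review_time: int   # viewing duration in seconds (action)
    reward: float      # feedback value for the push decision

experience_pool: List[Experience] = []

# Example: the user opened a news notification and read it for 40 seconds.
experience_pool.append(
    Experience(apk_name="xx_news", content="news summary ...",
               open_time="08:00-09:00", review_time=40, reward=1.0))
```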
  • The reward value is important data for training the deep reinforcement learning model.
  • The reward value is determined according to whether the push strategy is useful to the user. Specifically, in some embodiments, obtaining the viewing duration and reward value of the user viewing the notification message according to the user's viewing of the notification message includes: if it is detected that the user clicks and views the notification message, recording the viewing duration of the user viewing the notification message and recording the reward value of the notification message as a positive number; if it is detected that the notification message is cleared, recording the viewing duration of the notification message as zero and recording the reward value of the notification message as a negative number.
  • For example, if the user clicks and views the notification message, the reward value is recorded as 1; if the user does not view the notification message and directly clears it, the reward value of the notification message is recorded as -10.
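  • A hedged sketch of this reward rule follows; the values 1 and -10 come from the example above, and the function signature is an assumption.

```python
def record_feedback(clicked: bool, seconds_viewed: int):
    """Return (review_time, reward) for one pushed notification."""
    if clicked:
        # user clicked and viewed the message: record duration, positive reward
        return seconds_viewed, 1.0
    # message was cleared without being viewed: zero duration, negative reward
    return 0, -10.0
```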
  • the method further includes:
  • at every preset time interval, obtaining the experience data of the historical notification messages stored in the experience pool; processing the content of the historical notification messages according to a topic model algorithm to extract the document topic features of the historical notification messages;
  • the document subject features and experience data of historical notification messages are used to train the value network of the deep reinforcement learning model to obtain network parameters.
  • in some embodiments, the topic model algorithm is the LDA (Latent Dirichlet Allocation) algorithm, which uses a three-layer structure of words, topics, and documents.
  • the topic words and their probability distributions that express the topics can be used as the topic model of the article, and the hidden topic information in the large-scale document set or corpus can be identified.
  • the topic of the notification message content and its probability distribution are extracted by the LDA algorithm as the document topic feature.
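  • As an illustrative sketch of extracting document topic features with LDA (the library choice and parameters, e.g. scikit-learn with 10 topics, are assumptions not taken from the disclosure):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical corpus of historical notification message contents.
historical_contents = ["news summary ...", "coupon for lunch ...", "new message from a friend"]

vectorizer = CountVectorizer(max_features=5000)
counts = vectorizer.fit_transform(historical_contents)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(counts)

def topic_feature(content: str):
    """Topic probability distribution of one notification, used as the state s."""
    return lda.transform(vectorizer.transform([content]))[0]
```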
  • The value network is trained by minimizing the loss function L_i(w_i) = E[(TargetQ - Q(s, a; w_i))^2], where TargetQ = r + γ · max_a' Q(s', a'; w_{i-1}); w_i is the network parameter, s is the state data, a is the action data, the parameter γ is the attenuation (discount) constant, which can be set according to the network training situation, and r is the reward value.
  • Deep reinforcement learning can be modeled by states, actions, and reward values. Referring to FIG. 3, in the current state s, after action a is executed, the current state becomes s', and the reward value of action a is obtained. Deep reinforcement learning is an iterative process: in continuous iteration, the agent receives the state and reward fed back by the environment and performs actions, while the environment, after accepting the agent's actions, outputs the new state and the reward.
  • the value network is trained through the experience data, and the network parameters w_i can be learned.
  • the reward value is used as a feedback signal for learning.
  • the mapping relationship between states and actions can be expressed as π, which is the policy.
  • the policy may be stochastic, with actions selected according to the probability of each action.
  • the state data, the action data, and the reward value are used to train the value network to obtain network parameters.
  • The electronic device obtains the experience data from the experience pool, extracts the content in the experience data, and obtains the document topic feature through the LDA algorithm as the state s; it also obtains the review_time of all the experience data and normalizes the review_time across all the experience data.
  • the normalized review_time is taken as action a.
  • the reward value of each historical notification message recorded in the experience data is obtained. Use the document topic features, review_time, and reward corresponding to the obtained multiple historical notification messages to train the value network to obtain network parameters.
  • The network parameter w_i in the above loss function is obtained through learning. After the network parameters are determined, when a new notification message is received, the content of the new notification message is obtained as the next state data s'.
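  • A simplified training sketch under the assumptions above: the topic distribution is the state s, the index of the (normalized) review_time bucket is the action a, and the value network is a small fully connected network. The network shape, optimizer, and discount factor are illustrative, not taken from the disclosure; the target network would be refreshed from the value network periodically (not shown).

```python
import torch
import torch.nn as nn

N_TOPICS, N_ACTIONS, GAMMA = 10, 7, 0.9   # assumed sizes and discount constant

value_net  = nn.Sequential(nn.Linear(N_TOPICS, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(N_TOPICS, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(value_net.state_dict())
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def train_step(s, a_idx, r, s_next):
    """One DQN-style update: fit Q(s, a) toward TargetQ = r + gamma * max_a' Q_target(s', a')."""
    s = torch.tensor(s, dtype=torch.float32)
    s_next = torch.tensor(s_next, dtype=torch.float32)
    q_sa = value_net(s)[a_idx]
    with torch.no_grad():
        target_q = r + GAMMA * target_net(s_next).max()
    loss = (target_q - q_sa) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```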
  • calculating the priority of the notification message according to the content based on the pre-trained deep reinforcement learning model includes: obtaining the value network of the pre-trained deep reinforcement learning model; extracting the document topic feature of the notification message according to the topic model algorithm; and calculating the priority of the notification message according to the document topic feature of the notification message and the value network.
  • calculating the priority of the notification message according to the document topic feature of the notification message and the value network includes: using the document topic feature of the notification message as the next state data of the current value network, and calculating, according to the trained value network, the Q value corresponding to each action data in the value network; and determining the priority of the notification message according to the action data with the largest Q value, wherein the viewing duration in the action data is proportional to the priority.
  • the Q value of each action a' can be calculated in the state s'.
  • Q is a probability value
  • the action a' with the largest Q value is the action most likely to be taken by the user.
  • When the electronic device records the user's viewing duration of a notification message, it may use seconds as the unit and round the viewing duration review_time to an integer multiple of 10 seconds. For example, if the user actually views a notification message for 42 seconds, it is recorded as 40 seconds; if the actual viewing duration is 57 seconds, it is recorded as 60 seconds. All possible action data in the value network are set based on the user's experience data of viewing notification messages, so the action data in the value network are finite.
  • The document topic feature of the notification message is acquired as the next state data s' of the current value network, and the Q value of each action data in the network is calculated according to TargetQ.
  • The action data corresponding to each Q value is a viewing duration; the viewing duration corresponding to the largest Q value is determined.
  • the priority of the notification message to be pushed is determined according to the viewing duration.
  • the priority rules can be manually formulated in advance.
  • the viewing duration is proportional to the priority, and the mapping relationship between viewing duration and priority is preset; for example, 10 seconds corresponds to the first level, 20 seconds corresponds to the second level, and so on. The longer the viewing duration, the higher the priority, and the number of viewing durations is finite.
  • the priority corresponding to the viewing duration with the largest Q value can be obtained.
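  • An illustrative sketch of this inference step follows; the duration-to-priority mapping (10 s is level 1, 20 s is level 2, and so on) follows the example in the text, while the input q_values is assumed to be the vector produced by the trained value network for the new message's topic feature.

```python
ACTIONS_SECONDS = [10, 20, 30, 40, 50, 60]       # assumed finite viewing durations

def priority_from_q(q_values):
    """Pick the action with the largest Q value and map its duration to a priority level.

    q_values: one Q value per candidate viewing duration, as output by the
    trained value network for the new message's topic feature (state s').
    """
    best = max(range(len(ACTIONS_SECONDS)), key=lambda i: q_values[i])
    duration = ACTIONS_SECONDS[best]
    return duration // 10        # 10 s -> level 1, 20 s -> level 2, ...

# e.g. priority_from_q([0.1, 0.3, 0.8, 0.2, 0.05, 0.0]) returns 3
```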
  • After the priority of the notification message to be pushed is determined, the notification message needs to be pushed according to the priority. If there is no other unread message in the notification bar, the notification message is pushed directly. If the viewing duration is short, it means the probability that the user will not click the notification message is high; in this case, the notification message can be pushed in a folded manner to reduce the space the pushed message occupies in the notification bar. If the viewing duration is long, the message is pushed in an expanded manner. The viewing duration is proportional to the priority.
  • determining the display mode of the notification message according to the priority of the notification message includes: if the priority is not greater than a preset threshold, setting the display mode of the notification message to folded display; if the priority is greater than the preset threshold, setting the display mode of the notification message to expanded display.
  • If there are other unread messages in the notification bar, the priorities of these unread messages are obtained, the new notification message and these unread messages are arranged in order of priority from high to low, and the new notification message is pushed to the notification bar accordingly. If the priority of the new notification message is lower, it is displayed at a later position in the message list of the notification bar; if the priority is higher, it is displayed at a higher position in the message list of the notification bar.
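  • A minimal sketch of this arrangement and display logic follows; the threshold value and the message structure are assumptions.

```python
FOLD_THRESHOLD = 2    # assumed priority threshold for folded vs. expanded display

def display_mode(priority: int) -> str:
    return "folded" if priority <= FOLD_THRESHOLD else "expanded"

def insert_into_notification_bar(unread, new_message):
    """Arrange the new message and existing unread messages by priority, high to low."""
    messages = unread + [new_message]
    return sorted(messages, key=lambda m: m["priority"], reverse=True)

bar = insert_into_notification_bar(
    [{"id": 1, "priority": 4}, {"id": 2, "priority": 1}],
    {"id": 3, "priority": 3, "display": display_mode(3)})
```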
  • training the value network of the deep reinforcement learning model according to the document topic features and experience data of the historical notification messages to obtain network parameters includes: using the document topic features of the historical notification messages as the state data of the value network of the deep reinforcement learning model; using the viewing time and viewing duration in the experience data of the historical notification messages as the action data of the value network; and training the value network according to the state data, the action data, and the reward values to obtain the network parameters.
  • When the electronic device records the viewing time of historical notification messages, it records the viewing time in the form of time intervals. For example, the 24 hours of a day are divided into 24 time intervals; after the time point when the user views the historical notification message is obtained, the time interval to which the time point belongs is determined and recorded as the viewing time of the user viewing the historical message. Alternatively, in other embodiments, the time intervals may be manually divided by the user according to usage habits.
  • the combination of the viewing time and the viewing duration is used as the action data.
  • the viewing time open_time and the viewing duration review_time are obtained from the experience data as the action data.
  • For each preset time interval, a mapping relationship between viewing duration and priority is set.
  • the Q value of each action a' can be calculated in the state s'.
  • the action data with the largest Q value represents a combination of viewing time and viewing duration, that is, the time interval during which the user is most likely to view the notification message, and the viewing duration during which the notification message is viewed within the time interval.
  • When sorting the new notification message and the other unread messages in the current notification bar, the electronic device first sorts the notification messages by time interval: the time interval to which the current time point belongs is regarded as the first time interval, and the other time intervals are arranged after it in chronological order. Then, the notification messages within each time interval are arranged in descending order of priority.
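  • An illustrative sketch of this two-level ordering (time interval first, then priority within each interval) follows; the hour-long intervals and message fields are assumptions.

```python
def sort_by_interval_then_priority(messages, current_hour: int):
    """Order messages by how soon their viewing interval comes up, then by priority.

    Each message carries 'hour' (start of its predicted viewing interval, 0-23)
    and 'priority'. The interval containing the current time comes first.
    """
    def key(m):
        hours_until = (m["hour"] - current_hour) % 24   # 0 for the current interval
        return (hours_until, -m["priority"])
    return sorted(messages, key=key)

# At 11:20 a message predicted for 12:00-13:00 sorts right after the 11:00-12:00 ones.
```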
  • If the user views the notification message within the corresponding time interval, the reward value of the notification message is recorded as f1; if the time interval elapses and the user has not viewed the notification message, the reward value of the notification message is recorded as f2; if the user directly clears the notification message, the reward value of the notification message is recorded as f3, where f1 is a positive number, f2 and f3 are both negative numbers, and f3 < f2.
  • the current time is 11:20, and the time interval is 11:00-12:00.
  • the time interval corresponding to a newly received notification message is 12:00-13:00.
  • The new notification message can thus be ranked second in the message list. If the user only views the notification message after 13:00, the reward value of the notification message is recorded as -5. If the user never views the notification message and the notification message is cleared from the notification bar, the reward value of the notification message is recorded as -10. If the user views the message between 12:00 and 13:00, the reward value of the notification message is recorded as 1.
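  • A hedged sketch of the f1/f2/f3 reward rule with the example values from the text (1, -5, -10); the function signature is an assumption.

```python
F1, F2, F3 = 1.0, -5.0, -10.0   # viewed in interval / viewed late / cleared unread

def interval_reward(viewed: bool, viewed_within_interval: bool) -> float:
    if not viewed:
        return F3                  # cleared without ever being viewed
    if viewed_within_interval:
        return F1                  # viewed inside the predicted time interval
    return F2                      # viewed only after the interval had passed
```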
  • The combination of viewing time and viewing duration is used as the action data, so that notification messages can be pushed more accurately according to user habits. For example, for user A, 8:00-9:00 in the morning is the commuting time, during which the user is likely to check news or social software notifications, while during the 12:00-13:00 lunch break the user is very likely to check the notification messages of food-ordering apps in time.
  • This solution can learn the habits and rules of users viewing notification messages through the deep reinforcement learning model, and then push notification messages to users at different time periods according to the learned strategy.
  • the value network can be updated to adapt to changes in the user's habit of viewing notification messages.
  • this application is not limited by the execution order of the described processes, and certain processes can also be performed in other order or simultaneously without conflict.
  • When a notification message is received, the application name corresponding to the notification message is determined and the content of the notification message is obtained; based on the pre-trained deep reinforcement learning model, the priority of the notification message is calculated according to the application name and content, where the deep reinforcement learning model is trained based on the user's experience data of viewing historical notification messages.
  • The arrangement order of the notification message is determined according to the priority of the notification message and the priorities of the unread messages in the notification bar, and the display mode of the notification message is determined according to the priority of the notification message.
  • The notification message is then pushed according to the arrangement order and display mode. This solution trains the deep reinforcement learning model with the user's experience data of viewing historical notification messages to learn the user's habit of viewing messages, judges the priority of the notification message accordingly, and pushes messages in an arrangement order and display mode matching the user's viewing habits, thereby helping the user manage notification messages reasonably, so that the user can promptly view the notification messages currently needed and the user experience is improved.
  • This application also provides a notification message push device, including:
  • the data acquisition module is used to acquire the content of the notification message when the notification message is received
  • the priority calculation module is configured to calculate the priority of the notification message according to the content based on a pre-trained deep reinforcement learning model, where the deep reinforcement learning model is trained based on the user's experience data of viewing historical notification messages;
  • the message sorting module is used to determine the arrangement order of the notification message according to the priority of the notification message and the priorities of the unread messages in the notification bar, and to determine the display mode of the notification message according to the priority of the notification message;
  • the message push module is configured to push the notification message according to the arrangement order and the display mode.
  • the device further includes a data recording module configured to record the viewing time and reward value of the notification message viewed by the user according to the viewing situation of the notification message by the user;
  • the data recording module is further configured to: if it is detected that the user clicks and views the notification message, record the viewing time of the user viewing the notification message, and record the reward value of the notification message as Positive number;
  • the viewing duration of the notification message is recorded as zero, and the reward value of the notification message is recorded as a negative number.
  • the device further includes:
  • the network training module is used for each preset time interval to obtain the experience data of the historical notification messages stored in the experience pool;
  • the deep reinforcement learning model is a model based on the deep Q network algorithm; the network training module is further used to: use the document topic features of the historical notification messages as the state data of the value network of the deep reinforcement learning model;
  • the value network is trained to obtain network parameters.
  • the network training module is further used to: obtain the value network of the pre-trained deep reinforcement learning model
  • the network training module is further configured to: use the document subject feature of the notification message as the next state data of the current value network, and calculate the value in the value network according to the trained value network Q value corresponding to each action data;
  • the priority of the notification message is determined according to the action data with the largest Q value, wherein the viewing time in the action data is proportional to the priority.
  • the message ordering module is also used to:
  • the display mode of the notification message is set to expand display.
  • a device for pushing notification messages is also provided.
  • FIG. 4 is a schematic structural diagram of an apparatus 400 for pushing notification messages according to an embodiment of the application.
  • the device 400 for pushing notification messages is applied to an electronic device.
  • the device 400 for pushing notification messages includes a data acquisition module 401, a priority calculation module 402, a message sorting module 403, and a message pushing module 404, as follows:
  • the data acquisition module 401 is configured to acquire the content of the notification message when the notification message is received.
  • the notification message in the embodiment of this application can be sent by the system server of a system application or process on the electronic device, for example, the phone manager, traffic management, mailbox and other applications; it can also be sent by the back-end server of a third-party application installed by the user, for example, xx news, xx music, xx takeaway and other APPs (applications) installed by the user.
  • When the electronic device receives the notification message, the data acquisition module 401 determines the application name corresponding to the notification message and obtains the specific message content of the notification message. For example, the electronic device receives a notification message sent by the background server of the xx news APP, and the content obtained may be the summary of a piece of news pushed to the user by the news APP.
  • the priority calculation module 402 is configured to calculate the priority of the notification message based on the pre-trained deep reinforcement learning model according to the content, wherein the deep reinforcement learning model is trained based on the user's experience data of viewing historical notification messages .
  • the priority calculation module 402 uses the user's experience data of viewing historical notification messages to train the deep reinforcement learning model to learn the user habits of the user viewing notification messages.
  • a deep reinforcement learning model based on the DQN (Deep Q Network) algorithm or the A3C (Asynchronous Advantage Actor Critic) algorithm is used.
  • the DQN algorithm combines deep learning (Deep Learning) and reinforcement learning (Reinforcement Learning); it is a value-based algorithm in which deep learning provides the learning mechanism and reinforcement learning provides the learning objective for deep learning.
  • the A3C algorithm is a deep reinforcement learning algorithm based on the improvement of the DQN algorithm.
  • the algorithm outputs the Q value through one value network, and generates TargetQ through a separate target network (Q-target).
  • the value network can be a deep neural network.
  • the value network is trained with the experience data recorded from the user viewing historical notification messages to obtain network parameters.
  • the input data of the value network are state data, action data, and feedback (i.e., the reward value).
  • the content of the notification message is used as the state data, and the viewing duration of the user viewing the notification message is used as the action data; the combinations of states and actions are finite. Assuming that there are m kinds of state data and n kinds of action data, Q can be regarded as an m × n table.
  • The device can push notification messages according to a default push strategy; for example, notification messages are pushed in order of receiving time from the most recent to the earliest, so that newly received notification messages are arranged at the forefront and displayed in the notification bar in an expanded manner. When the user does not view a notification message for more than a preset time, the notification message is collapsed.
  • The electronic device records the user's viewing of these notification messages and collects the records into the experience pool of the deep reinforcement learning model as experience data.
  • As the user uses various applications on the electronic device for a longer time, more and more experience data are stored in the experience pool, and these experience data can then be used to train the deep reinforcement learning model. For example, at every preset time interval, the experience data accumulated in the experience pool are obtained to train the deep reinforcement learning model.
  • In some embodiments, the device further includes a data recording module, which is used to record the viewing duration and reward value of the user viewing the notification message according to the user's viewing of the notification message, and to store the content, the viewing duration, the reward value, and the application name in the experience pool as the experience data of the notification message.
  • After the electronic device pushes the notification message in the notification bar, the data recording module records experience data according to the user's viewing of the notification message and stores the data in the experience pool of the deep reinforcement learning model. Specifically, after receiving the notification message, the electronic device obtains the content of the notification message and the corresponding application name, where the content is recorded as content and the application name as apk_name, and stores them as experience data in the format {apk_name, content}. After the user views the notification message, the electronic device obtains the viewing time open_time and viewing duration review_time of the notification message, and gives the notification message a feedback value, recorded as reward. Finally, the experience data of the notification message becomes {apk_name, content, open_time, review_time, reward}.
  • The reward value is important data for training the deep reinforcement learning model.
  • The reward value is determined according to whether the push strategy is useful to the user.
  • In some embodiments, the data recording module is further configured to: if it is detected that the user clicks and views the notification message, record the viewing duration of the user viewing the notification message and record the reward value of the notification message as a positive number; if it is detected that the notification message is cleared, record the viewing duration of the notification message as zero and record the reward value of the notification message as a negative number.
  • For example, if the user clicks and views the notification message, the reward value is recorded as 1; if the user does not view the notification message and directly clears it, the reward value of the notification message is recorded as -10.
  • In some embodiments, the push device 400 further includes a network training module, which is used to obtain, at every preset time interval, the experience data of the historical notification messages stored in the experience pool; process the content of the historical notification messages according to the topic model algorithm to extract the document topic features of the historical notification messages; and train the value network of the deep reinforcement learning model according to the document topic features and experience data of the historical notification messages to obtain network parameters.
  • the network training module obtains experience data in the experience pool to train the value network.
  • in some embodiments, the topic model algorithm is the LDA (Latent Dirichlet Allocation) algorithm, which uses a three-layer structure of words, topics, and documents.
  • the topic words and their probability distributions that express the topics can be used as the topic model of the article, and the hidden topic information in the large-scale document set or corpus can be identified.
  • the topic of the notification message content and its probability distribution are extracted by the LDA algorithm as the document topic feature.
  • the network training module uses the above-mentioned document topic features as status data, and at the same time obtains action data from empirical data to train the value network.
  • The value network is trained by minimizing the loss function L_i(w_i) = E[(TargetQ - Q(s, a; w_i))^2], where TargetQ = r + γ · max_a' Q(s', a'; w_{i-1}); w_i is the network parameter, s is the state data, a is the action data, the parameter γ is the attenuation (discount) constant, which can be set according to the network training situation, and r is the reward value.
  • Deep reinforcement learning can be modeled by states, actions, and reward values. Referring to FIG. 3, in the current state s, after action a is executed, the current state becomes s', and the feedback of action a, i.e., the reward value, is obtained. Deep reinforcement learning is an iterative process: in continuous iteration, the agent receives the state and reward fed back by the environment and performs actions, while the environment, after accepting the agent's actions, outputs the new state and the reward.
  • the value network is trained through empirical data, and the network parameters w i can be learned.
  • the reward value is used as a feedback signal for learning.
  • the mapping relationship between states and actions can be expressed as π, which is the policy.
  • the policy may be stochastic, with actions selected according to the probability of each action.
  • the network training module is further configured to: use the document subject feature of the historical notification message as the state data of the value network of the deep reinforcement learning model; use the viewing time in the experience data of the historical notification message as the Action data of the value network; training the value network to obtain network parameters according to the state data, the action data and the reward value.
  • The electronic device obtains the experience data from the experience pool, extracts the content in the experience data, and obtains the document topic feature through the LDA algorithm as the state s; it also obtains the review_time of all the experience data and normalizes the review_time across all the experience data.
  • the normalized review_time is taken as action a.
  • the reward value of each historical notification message recorded in the experience data is obtained. Use the document topic features, review_time, and reward corresponding to the obtained multiple historical notification messages to train the value network to obtain network parameters.
  • The network parameter w_i in the above loss function is obtained through learning. After the network parameters are determined, when a new notification message is received, the content of the new notification message is obtained as the next state data s'.
  • In some embodiments, the priority calculation module 402 is further configured to: obtain the value network of the pre-trained deep reinforcement learning model; extract the document topic feature of the notification message according to the topic model algorithm; and calculate the priority of the notification message according to the document topic feature of the notification message and the value network.
  • In some embodiments, the priority calculation module 402 is further configured to: use the document topic feature of the notification message as the next state data of the current value network, and calculate, according to the trained value network, the Q value corresponding to each action data in the value network; and determine the priority of the notification message according to the action data with the largest Q value, wherein the viewing duration in the action data is proportional to the priority.
  • the Q value of each action a' can be calculated in the state s'.
  • Q is a probability value
  • the action a' with the largest Q value is the action most likely to be taken by the user.
  • When the electronic device records the user's viewing duration of a notification message, it may use seconds as the unit and round the viewing duration review_time to an integer multiple of 10 seconds. For example, if the user actually views a notification message for 42 seconds, it is recorded as 40 seconds; if the actual viewing duration is 57 seconds, it is recorded as 60 seconds. All possible action data in the value network are set based on the user's experience data of viewing notification messages, so the action data in the value network are finite.
  • The document topic feature of the notification message is acquired as the next state data s' of the current value network, and the Q value of each action data in the network is calculated according to TargetQ.
  • The action data corresponding to each Q value is a viewing duration; the viewing duration corresponding to the largest Q value is determined.
  • the priority of the notification message to be pushed is determined according to the viewing duration.
  • the priority rules can be manually formulated in advance.
  • the viewing duration is proportional to the priority, and the mapping relationship between viewing duration and priority is preset; for example, 10 seconds corresponds to the first level, 20 seconds corresponds to the second level, and so on. The longer the viewing duration, the higher the priority, and the number of viewing durations is finite.
  • the priority corresponding to the viewing duration with the largest Q value can be obtained.
  • the message sorting module 403 is configured to determine the arrangement order of the notification message according to the priority of the notification message and the priorities of the unread messages in the notification bar, and to determine the display mode of the notification message according to the priority of the notification message.
  • the message push module 404 is configured to push the notification message according to the arrangement order and the display mode.
  • After the priority of the notification message to be pushed is determined, the notification message needs to be pushed according to the priority. If there is no other unread message in the notification bar, the notification message is pushed directly. If the viewing duration is short, it means the probability that the user will not click the notification message is high; in this case, the notification message can be pushed in a folded manner to reduce the space the pushed message occupies in the notification bar. If the viewing duration is long, the message is pushed in an expanded manner. The viewing duration is proportional to the priority.
  • the message sorting module 403 is further configured to: if the priority is not greater than a preset threshold, set the display mode of the notification message to folded display; if the priority is greater than the preset threshold, set The display mode of the notification message is set to expand display.
  • If there are other unread messages in the notification bar, the message sorting module 403 obtains the priorities of these unread messages, arranges the new notification message and these unread messages in order of priority from high to low, and pushes the new notification message to the notification bar according to this order. If the priority of the new notification message is lower, it is displayed at a later position in the message list of the notification bar; if the priority is higher, it is displayed at a higher position in the message list of the notification bar.
  • In some embodiments, the network training module is further configured to: use the document topic features of the historical notification messages as the state data of the value network of the deep reinforcement learning model; use the viewing time and viewing duration in the experience data of the historical notification messages as the action data of the value network; and train the value network according to the state data, the action data, and the reward values to obtain network parameters.
  • When the electronic device records the viewing time of historical notification messages, it records the viewing time in the form of time intervals. For example, the 24 hours of a day are divided into 24 time intervals; after the time point when the user views the historical notification message is obtained, the time interval to which the time point belongs is determined and recorded as the viewing time of the user viewing the historical message. Alternatively, in other embodiments, the time intervals may be manually divided by the user according to usage habits.
  • the network training module uses the combination of the viewing time and the viewing duration as the action data.
  • the viewing time open_time and the viewing duration review_time are obtained from the experience data as the action data.
  • For each preset time interval, a mapping relationship between viewing duration and priority is set.
  • According to TargetQ, the Q value of each action a' can be calculated in the state s'.
  • the action data with the largest Q value represents a combination of viewing time and viewing duration, that is, the time interval during which the user is most likely to view the notification message, and the viewing duration during which the notification message is viewed within the time interval.
  • When sorting the new notification message and the other unread messages in the current notification bar, the electronic device first sorts the notification messages by time interval: the time interval to which the current time point belongs is regarded as the first time interval, and the other time intervals are arranged after it in chronological order. Then, the notification messages within each time interval are arranged in descending order of priority.
  • If the user views the notification message within the corresponding time interval, the reward value of the notification message is recorded as f1; if the time interval elapses and the user has not viewed the notification message, the reward value of the notification message is recorded as f2; if the user directly clears the notification message, the reward value of the notification message is recorded as f3, where f1 is a positive number, f2 and f3 are both negative numbers, and f3 < f2.
  • for example, in one implementation the day is divided into 24 one-hour intervals and f1 = 1, f2 = -5, f3 = -10. Suppose the current time is 11:20, so the current time interval is 11:00-12:00, and the time interval computed by the deep reinforcement learning model for a newly received notification message is 12:00-13:00. If the notification bar currently contains one unread message for the interval 11:00-12:00 and none for 12:00-13:00, the new notification message can be ranked second in the message list. If the user only views the notification message after 13:00, its reward value is recorded as -5; if the user never views it and the notification message is cleared from the notification bar, its reward value is recorded as -10; if the user views it between 12:00 and 13:00, its reward value is recorded as 1.
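The reward rule used in this example can be summarised in a small sketch; the function arguments are assumptions introduced only to make the three cases explicit.

```python
F1, F2, F3 = 1, -5, -10  # reward values used in the example above

def reward_for(viewed_in_interval: bool, cleared_without_viewing: bool) -> int:
    if cleared_without_viewing:
        return F3   # f3: the user cleared the notification without viewing it
    if viewed_in_interval:
        return F1   # f1: viewed within the predicted time interval
    return F2       # f2: the interval passed without the notification being viewed
```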
  • because users need different things from the device in different time periods, using the combination of viewing time and viewing duration as the action data allows notification messages to be pushed more accurately according to user habits. For example, for user A, 8:00-9:00 in the morning is the commute to work, when news or social-networking app messages are likely to be checked, while the 12:00-13:00 lunch period is when notifications from food-ordering apps are likely to be checked promptly.
  • this solution can learn the user's habits and patterns of viewing notification messages through the deep reinforcement learning model and then push notification messages in different time periods according to the learned policy. Moreover, as experience data generated by the user viewing notification messages continues to be recorded, the value network can be updated so that it adapts to changes in the user's viewing habits.
  • in the notification message pushing apparatus proposed by the embodiments of this application, when a notification message is received, the data acquisition module 401 determines the application name corresponding to the notification message and acquires the content of the notification message; the priority calculation module 402 calculates the priority of the notification message from the application name and the content based on a pre-trained deep reinforcement learning model, where the deep reinforcement learning model is trained on the experience data of the user viewing historical notification messages; the message sorting module 403 determines the arrangement order and display mode of the notification message according to the priority of the notification message and the priorities of the unread messages in the notification bar; and the message push module 404 pushes the notification message according to the arrangement order and display mode. This solution trains the deep reinforcement learning model on the experience data of the user viewing historical notification messages, learns the user's viewing habits, judges the priority of each notification message accordingly, and pushes messages in an order and display mode that match those habits, helping the user manage notification messages so that the currently needed messages can be viewed in time.
  • FIG. 5 is a schematic diagram of the first structure of an electronic device provided by an embodiment of this application.
  • the electronic device 300 includes a processor 301 and a memory 302. Wherein, the processor 301 is electrically connected to the memory 302.
  • the processor 301 is the control center of the electronic device 300. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or calling the computer program stored in the memory 302 and invoking the data stored in the memory 302, thereby monitoring the electronic device as a whole.
  • in this embodiment, the processor 301 in the electronic device 300 loads the instructions corresponding to the processes of one or more computer programs into the memory 302 according to the following procedure, and the processor 301 runs the computer programs stored in the memory 302 so as to implement various functions:
  • in some embodiments, after the notification message is pushed according to the arrangement order and the display mode, the processor 301 executes: recording, according to the user's viewing of the notification message, the viewing duration and the reward value of the notification message; and storing the content, the viewing duration and the reward value as the experience data of the notification message in the experience pool.
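A minimal sketch of such an experience record, stored as {apk_name, content, open_time, review_time, reward}; the ExperiencePool class is an assumed stand-in for the on-device store.

```python
class ExperiencePool:
    """Assumed stand-in for the on-device experience pool."""

    def __init__(self):
        self.records = []

    def add(self, apk_name, content, open_time, review_time, reward):
        # One record per handled notification, as described above.
        self.records.append({
            "apk_name": apk_name,
            "content": content,
            "open_time": open_time,      # viewing time, recorded as a time interval
            "review_time": review_time,  # viewing duration in seconds
            "reward": reward,
        })
```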
  • in some embodiments, when obtaining the viewing duration and the reward value according to the user's viewing of the notification message, the processor 301 executes: if it is detected that the user clicks and views the notification message, recording the viewing duration and recording the reward value of the notification message as a positive number; if it is detected that the notification message is cleared, recording the viewing duration of the notification message as zero and recording the reward value of the notification message as a negative number.
  • in some embodiments, the processor 301 also executes: obtaining, at every preset time interval, the experience data of the historical notification messages stored in the experience pool; processing the content of the historical notification messages according to a topic model algorithm to extract the document topic features of the historical notification messages; and training the value network of the deep reinforcement learning model according to the document topic features and the experience data to obtain network parameters. In some embodiments, the deep reinforcement learning model is a model based on the deep Q network algorithm.
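A sketch of extracting document topic features with LDA is shown below, using scikit-learn; the vectorizer settings and topic count are assumptions, not values taken from this document.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

NUM_TOPICS = 16  # assumed dimension of the document topic feature vector

vectorizer = CountVectorizer()
lda = LatentDirichletAllocation(n_components=NUM_TOPICS, random_state=0)

def fit_topic_model(historical_contents):
    # historical_contents: list of notification content strings from the experience pool.
    counts = vectorizer.fit_transform(historical_contents)
    lda.fit(counts)

def topic_features(content):
    # Returns the per-topic probability distribution used as the state vector s.
    return lda.transform(vectorizer.transform([content]))[0]
```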
  • in some embodiments, when training the value network of the deep reinforcement learning model according to the document topic features and the experience data of the historical notification messages, the processor 301 executes: using the document topic features of the historical notification messages as the state data of the value network, using the viewing duration in the experience data as the action data of the value network, and training the value network according to the state data, the action data and the reward value to obtain the network parameters.
  • in some embodiments, when calculating the priority of the notification message according to the content based on the pre-trained deep reinforcement learning model, the processor 301 executes: obtaining the value network of the pre-trained deep reinforcement learning model, extracting the document topic features of the notification message according to the topic model algorithm, and calculating the priority of the notification message according to the document topic features and the value network.
  • in some embodiments, when calculating the priority of the notification message according to the document topic features of the notification message and the value network, the processor 301 executes: using the document topic features of the notification message as the next state data of the current value network, calculating, according to the trained value network, the Q value corresponding to each action data in the value network, and determining the priority of the notification message according to the action data with the largest Q value, wherein the viewing duration in the action data is proportional to the priority.
  • in some embodiments, when determining the display mode of the notification message according to the priority of the notification message, the processor 301 executes: if the priority is not greater than a preset threshold, setting the display mode of the notification message to folded display; if the priority is greater than the preset threshold, setting the display mode of the notification message to expanded display.
  • the memory 302 can be used to store computer programs and data.
  • the computer program stored in the memory 302 contains instructions executable by the processor, and the computer program may be composed of various functional modules.
  • the processor 301 executes various functional applications and data processing by calling a computer program stored in the memory 302.
  • FIG. 6 is a schematic diagram of a second structure of an electronic device provided in an embodiment of this application.
  • the electronic device 300 further includes: a radio frequency circuit 303, a display screen 304, a control circuit 305, an input unit 306, an audio circuit 307, a sensor 308, and a power supply 309.
  • the processor 301 is electrically connected to the radio frequency circuit 303, the display screen 304, the control circuit 305, the input unit 306, the audio circuit 307, the sensor 308, and the power source 309, respectively.
  • the radio frequency circuit 303 is used to transmit and receive radio frequency signals to communicate with network equipment or other electronic equipment through wireless communication.
  • the display screen 304 can be used to display information input by the user or information provided to the user, and various graphical user interfaces of the electronic device. These graphical user interfaces can be composed of images, text, icons, videos, and any combination thereof.
  • the control circuit 305 is electrically connected to the display screen 304 for controlling the display screen 304 to display information.
  • the input unit 306 can be used to receive inputted numbers, character information or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the input unit 306 may include a fingerprint recognition module.
  • the audio circuit 307 can provide an audio interface between the user and the electronic device through a speaker and a microphone.
  • the audio circuit 307 includes a microphone.
  • the microphone is electrically connected to the processor 301.
  • the microphone is used to receive voice information input by the user.
  • the sensor 308 is used to collect external environmental information.
  • the sensor 308 may include one or more of sensors such as an environmental brightness sensor, an acceleration sensor, and a gyroscope.
  • the power supply 309 is used to supply power to various components of the electronic device 300.
  • the power supply 309 may be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • the electronic device 300 may also include a camera, a Bluetooth module, etc., which will not be repeated here.
  • an embodiment of the present application provides an electronic device that, when receiving a notification message, determines the application name corresponding to the notification message, acquires the content of the notification message, and calculates the priority of the notification message from the application name and the content based on a pre-trained deep reinforcement learning model, where the deep reinforcement learning model is trained on the user's experience data of viewing historical notification messages. The electronic device then determines the arrangement order and display mode of the notification message according to the priority of the notification message and the priorities of the unread messages in the notification bar, and pushes the notification message according to the arrangement order and display mode. This solution trains the deep reinforcement learning model on the experience data of the user viewing historical notification messages to learn the user's habits of viewing messages, judges the priority of each notification message accordingly, and pushes messages in an arrangement order and display mode that match the user's viewing habits, helping the user manage notification messages reasonably so that the currently needed notification messages can be viewed in time, improving user experience.
  • an embodiment of the present application also provides a storage medium in which a computer program is stored; when the computer program is run on a computer, the computer executes the notification message pushing method described in any of the above embodiments.
  • the storage medium may include, but is not limited to: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.

Abstract

Embodiments of this application disclose a notification message pushing method, including: acquiring the content of a notification message; calculating its priority based on a deep reinforcement learning model; determining the display mode of the notification message according to the priority of the notification message and the priorities of unread messages in the notification bar; and pushing the notification message according to the arrangement order and the display mode, so that the user can view the currently needed notification messages in time, improving user experience.

Description

通知消息的推送方法、装置、存储介质及电子设备
本申请要求于2019年4月9日提交中国专利局、申请号为201910282211.1、申请名称为“通知消息的推送方法、装置、存储介质及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端技术领域,具体涉及一种通知消息的推送方法、装置、存储介质及电子设备。
背景技术
手机、平板电脑等智能终端中,大部分应用程序都会给用户推送消息,例如,终端上的应用程序通过后台服务器获取通知消息,当应用程序未在前台运行时,终端在通知栏将该通知消息推送给用户。
发明内容
本申请实施例提供了一种通知消息的推送方法、装置、存储介质及电子设备,能够结合用户的实际情况对通知消息的优先级进行判断,以使用户可以及时查看到当前需要的通知消息。
第一方面,本申请实施例了提供了的一种通知消息的推送方法,包括:
当接收到通知消息时,获取所述通知消息的内容;
基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
按照所述排列顺序和所述展示方式推送所述通知消息。
第二方面,本申请实施例了提供了的一种通知消息的推送装置,包括:
数据获取模块,用于当接收到通知消息时,获取所述通知消息的内容;
优先级计算模块,用于基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
消息排序模块,用于根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
消息推送模块,用于按照所述排列顺序和所述展示方式推送所述通知消息。
第三方面,本申请实施例提供的存储介质,其上存储有计算机程序,当所述计算机程序在计算机上运行时,使得所述计算机执行:
当接收到通知消息时,获取所述通知消息的内容;
基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
按照所述排列顺序和所述展示方式推送所述通知消息。
第四方面,本申请实施例提供了一种电子设备,包括处理器和存储器,所述存储器有计算机程序,所述处理器通过调用所述计算机程序,用于执行:
当接收到通知消息时,获取所述通知消息的内容;
基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
按照所述排列顺序和所述展示方式推送所述通知消息。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的通知消息的推送方法的全景感知架构示意图。
图2为本申请实施例提供的通知消息的推送方法的第一种流程示意图。
图3为本申请实施例提供的深度强化学习模型的原理图。
图4为本申请实施例提供的通知消息的推送装置的结构示意图。
图5为本申请实施例提供的电子设备的第一种结构示意图。
图6为本申请实施例提供的电子设备的第二种结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有付出创造性劳动前提下所获得的所有其他实施例,都属于本申请的保护范围。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
本申请实施例提供一种通知消息的推送方法,该通知消息的推送方法的执行主体可以是本申请实施例提供的通知消息的推送装置,或者集成了该通知消息的推送装置的电子设备,其中该通知消息的推送装置可以采用硬件或者软件的方式实现。其中,电子设备可以是智能手机、平板电脑、掌上电脑、笔记本电脑、或者台式电脑等设备。
本申请提供一种通知消息的推送方法,包括:
当接收到通知消息时,获取所述通知消息的内容;
基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
按照所述排列顺序和所述展示方式推送所述通知消息。
在一些实施例中,按照所述排列顺序和所述展示方式推送所述通知消息之后,所述方法还包括:
根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
在一些实施例中,根据用户对所述通知消息的查看情况,获取用户查看所述通知消息的查看时长和奖励值,包括:
若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
在一些实施例中,所述方法还包括:
每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;
根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;
根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
在一些实施例中,所述深度强化学习模型为基于深度Q网络算法的模型,根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数,包括:
将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;
将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;
根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
在一些实施例中,基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,包括:
获取预先训练好的深度强化学习模型的价值网络;
根据所述主题模型算法提取所述通知消息的文档主题特征;
根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
在一些实施例中,根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级,包括:
将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;
根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
在一些实施例中,根据所述通知消息的优先级确定所述通知消息的展示方式,包括:
若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;
若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
参考图1,图1为本申请实施例提供的通知消息的推送方法的全景感知架构示意图。所述通知消息的推送方法应用于电子设备。所述电子设备中设置有全景感知架构。所述全景感知架构为电子设备中用于实现所述通知消息的推送方法的硬件和软件的集成。
其中,全景感知架构包括信息感知层、数据处理层、特征抽取层、情景建模层以及智能服务层。
信息感知层用于获取电子设备自身的信息或者外部环境中的信息。所述信息感知层可以包括多个传感器。例如,所述信息感知层包括距离传感器、磁场传感器、光线传感器、加速度传感器、指纹传感器、霍尔传感器、位置传感器、陀螺仪、惯性传感器、姿态感应器、气压计、心率传感器等多个传感器。
其中,距离传感器可以用于检测电子设备与外部物体之间的距离。磁场传感器可以用于检测电子设备所处环境的磁场信息。光线传感器可以用于检测电子设备所处环境的光线信息。加速度传感器可以用于检测电子设备的加速度数据。指纹传感器可以用于采集用户的指纹信息。霍尔传感器是根据霍尔效应制作的一种磁场传感器,可以用于实现电子设备的自动控制。位置传感器可以用于检测电子设备当前所处的地理位置。陀螺仪可以用于检测电子设备在各个方向上的角速度。惯性传感器可以用于检测电子设备的运动数据。姿态感应器可以用于感应电子设备的姿态信息。气压计可以用于检测电子设备所处环境的气压。心率传感器可以用于检测用户的心率信息。
数据处理层用于对信息感知层获取到的数据进行处理。例如,数据处理层可以对信息感知层获取到的数据进行数据清理、数据集成、数据变换、数据归约等处理。
其中,数据清理是指对信息感知层获取到的大量数据进行清理,以剔除无效数据和重复数据。数据集成是指将信息感知层获取到的多个单维度数据集成到一个更高或者更抽象的维度,以对多个单维度的数据进行综合处理。数据变换是指对信息感知层获取到的数据进行数据类型的转换或者格式的转换等,以使变换后的数据满足处理的需求。数据归约是指在尽可能保持数据原貌的前提下,最大限度的精简数据量。
特征抽取层用于对数据处理层处理后的数据进行特征抽取,以提取所述数据中包括的特征。提取到的特征可以反映出电子设备自身的状态或者用户的状态或者电子设备所处环境的环境状态等。
其中,特征抽取层可以通过过滤法、包装法、集成法等方法来提取特征或者对提取到的特征进行处理。
过滤法是指对提取到的特征进行过滤,以删除冗余的特征数据。包装法用于对提取到的特征进行筛选。集成法是指将多种特征提取方法集成到一起,以构建一种更加高效、更加准确的特征提取方法,用于提取特征。
情景建模层用于根据特征抽取层提取到的特征来构建模型,所得到的模型可以用于表示电子设备的状态或者用户的状态或者环境状态等。例如,情景建模层可以根据特征抽取层提取到的特征来构建关键值模型、模式标识模型、图模型、实体联系模型、面向对象模型等。
智能服务层用于根据情景建模层所构建的模型为用户提供智能化的服务。例如,智能服务层可以为用户提供基础应用服务,可以为电子设备进行系统智能优化,还可以为用户提供个性化智能服务。
此外,全景感知架构中还可以包括多种算法,每一种算法都可以用于对数据进行分析处理,所述多种算法可以构成算法库。例如,所述算法库中可以包括马尔科夫算法、隐形狄利克雷分布算法、贝叶斯分类算法、支持向量机、K均值聚类算法、K近邻算法、条件随机场、残差网络、长短期记忆网络、卷积神经网络、循环神经网络等算法。
基于上述全景感知构架,电子设备通过信息感知层和/或者其他方式采集目标用户的历史通知消息。智能服务层按照本申请提出的消息推送方法对接收到的通知消息进行推送,例如,在接收到通知消息时,确定该通知消息对应的应用名称,获取通知消息的内容,基于预先训练好的深度强化学习模型,根据应用名称、内容计算该通知消息的优先级,其中,深度强化学习模型是根据用户查看历史通知消息的经验数据训练得到的,接下来,根据通知消息的优先级和通知栏中未读消息的优先级,确定通知消息的排列顺序和展示方式,按照排列顺序和展示方式推送通知消息,本方案通过用户查看历史通知消息的经验数据训练深度强化学习模型,以学习得到用户查看历史消息的习惯,进而对通知消息的优先级进行判断,按照与用户的查看消息的习惯匹配的排列顺序和展示方式进行消息推送,帮助用户合理管理通知消息,以使用户可以及时查看到当前需要的通知消息,提升用户体验。
请参照图2,图2为本申请实施例提供的通知消息的推送方法的第一种流程示意图。本申请实施例提供的通知消息的推送方法的具体流程可以如下:
101、当接收到通知消息时,获取所述通知消息的内容。
相关技术中多是按照接收到通知消息的时间进行排序,或者按照不同的应用类型进行分类推送。但是,上述方案都没有结合用户当前所处的情景状态来对通知消息进行排序,例如,用户在特定的时间节点会看什么样的资讯等,由于用户终端上可能安装了大量的应用,通知栏中的消息数量较大,用户难以快速查看当前需要的通知消息。综上所述,现有的通知消息推送方案,没有结合用户的实际情况对通知消息的优先级进行判断,进而根据优先级进行消息推送,导致用户不能及时查看到当前需要的通知消息。
本申请实施例中的通知消息可以是电子设备上的系统自带应用程序或者进程的系统服务器发送的,例如,手机自带的手机管家、流量管理、邮箱等应用程序;也可以是用户自己安装的第三方应用程序的后台服务器发送的,例如,用户自己安装的xx新闻、xx音乐、xx外卖等APP(Application,应用程序)。
电子设备在接收到通知消息时,确定通知消息对应的应用名称,同时获取该通知消息的具体消息内容,例如,电子设备接收到xx新闻APP的后台服务器发送的通知消息,获取该通知消息的具体内容,其内容可能是该新闻APP推送给用户的一则新闻的概要信息。
102、基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到。
本申请实施例中采用用户查看历史通知消息的经验数据来对深度强化学习模型进行训练,以学习得到用户查看通知消息的用户习惯。例如,采用基于DQN(Deep Q Network,深度Q网络)算法或者A3C(Asynchronous Advantage Actor Critic,异步优势动作评价)算 法的深度强化学习模型,DQN算法是将深度学习(Deep Learning)和强化学习(Reinforcement Learning)相结合的一种value based(基于价值的)算法,深度学习用来提供学习的机制,增强学习可以为深度学习提供学习的目标。A3C算法是基于DQN算法改进后的一种深度强化学习算法。
以DQN算法为例,该算法通过一个价值网络输出Q值,通过另外一个Q-target目标网络产生TargetQ。价值网络可以是一个是深度神经网络。本方案中通过用户查看历史通知消息记录的经验数据训练该价值网络以获取网络参数。上述价值网络的输入数据为状态数据,动作数据和反馈(即奖励值),在本方案中,将通知消息的内容作为状态数据,将用户查看通知消息的查看时长作为动作数据,状态和动作的组合是有限的,假设有m种状态数据,n种动作数据,则可以将Q当作是一张m×n的表格。
在初始阶段,没有历史通知消息的经验数据可以使用的情况下,可以按照默认的推送策略对通知消息进行推送,例如,按照接收到通知消息的时间由近至远的顺序,将新接收到的通知消息排列最前边,显示在通知栏中,并统一按照展开显示的方式展示。当用户超过预设时长仍然没有查看该通知消息,则将通知消息折叠。
此外,电子设备对用户对这些通知消息的查看情况进行记录,作为经验数据收集到深度强化学习模型的经验池中。随着用户对电子设备上各种应用程序的使用时间的延长,经验池中的存储的经验数据会越来越多,进而可以使用这些经验数据训练深度强化学习模型。例如,设置为每间隔预设时间间隔,获取经验池中累积存储的经验数据训练深度强化学习模型。
具体地,在一些实施例中,在推送通知消息后,该方法还包括:
根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;将所述内容、所述查看时长、所述奖励值以及所述应用名称作为所述通知消息的经验数据,存储到经验池。
电子设备在通知栏推送了通知消息后,根据用户对该通知消息的查看情况记录经验数据,存储至深度强化学习模型的经验池。具体地,电子设备在接收到通知消息后,获取该通知消息的内容和对应的应用名称,其中,内容记为content、应用名称记为apk_name,存储为格式为{apk_name,content}的经验数据,当用户查看该通知消息之后,获取用户查看该通知信息的查看时间open_time和查看时长review_time,并给予该通知消息一个正确的反馈信息奖励,记为reward,最终,该通知消息的经验数据变为{apk_name,content,open_time,review_time,reward}。
其中,奖励值是用来训练深度强化学习模型的重要数据。奖励值的大小根据推送策略对用户是否有用来确定。具体地,在一些实施例中,“根据用户对所述通知消息的查看情况,获取用户查看所述通知消息的查看时长和奖励值”,包括:若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
例如,用户点击并查看了通知消息,奖励值记录为1,用户没有查看通知消息并且直接清除了通知消息,则该通知消息的奖励值记录为-10。
在一些实施例中,该方法还包括:
每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
例如,每间隔7-10天,获取经验池中的经验数据,训练价值网络。其中,主题模型算法为LDA(Latent Dirichlet Allocation,隐形狄利克雷分布)算法,该算法采用词、主题和文档三层结构。可以用表达主题的主题词及其概率分布作为文章的主题模型,可以识别大规模文档集或语料库中潜藏的主题信息。本方案中通过LDA算法提取通知消息内容的主题及其概率分布作为文档主题特征。
接下来，使用上述文档主题特征作为状态数据，同时从经验数据中获取动作数据，对价值网络进行训练。本申请实施例中通过MSE（mean-square error，均方误差）来定义价值网络的损失函数，损失函数公式表示如下：L(w_i)=E[(TargetQ−Q(s,a,w_i))²]，TargetQ=r+γ·max_{a′}Q(s′,a′,w_i)。
其中，w_i为网络参数，s为状态数据，a为动作数据。参数γ为衰减常数，可以根据网络训练情况设定，r为奖励值。
深度强化学习可以通过状态、动作、奖励值进行建模。参照图3所示,在当前状态s下,执行了a动作后,当前的状态变为s′,并得到动作a的奖励值reward。深度强化学习是一个不断迭代的过程。在不断迭代的过程中,对于主体而言,收获了环境反馈的状态和奖励值,执行了动作;对于环境而言,接受了主体的动作后,输出了环境反馈的状态和奖励值。
通过经验数据训练价值网络,可以学习得到网络参数w i,训练过程中,将奖励值作为反馈信号进行学习。在深度强化学习模型中,状态与动作之间存在的映射关系可以表示为π,即策略。在本实施例中,策略是随机的,根据每个动作的概率选择动作。
具体地,将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
电子设备从经验池中获取经验数据,提取经验数据中的content,通过LDA算法获取文档主题特征,作为状态s,获取全部经验数据的review_time,对全部经验数据中的review_time进行归一化处理,将归一化处理后的review_time作为动作a。同时获取经验数据中记录的每一个历史通知消息的奖励值reward。使用获取的多个历史通知消息对应的文档主题特征、review_time、reward训练价值网络,得到网络参数。
通过学习得到上述损失函数中的网络参数w i。确定网络参数后,在接收到新的通知消息时,获取新的通知消息的内容,作为下一个状态数据s′。
具体地,基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,包括:获取预先训练好的深度强化学习模型的价值网络;根据所述主题模型算法提取所述通知消息的文档主题特征;根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
其中,根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先 级,包括:将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
根据上述TargetQ的计算公式,可以计算出在状态s′下,采取各个动作a′的Q值。其中,Q是一个概率值,Q值最大的动作a′,为用户最可能采取的动作。
电子设备在记录用户对通知消息的查看时间时,可以以秒为单位,同时采用四舍五入的方式,将查看时长review_time记录为10秒的整数倍,例如,用户实际查看一条通知消息的时间为42秒,则将其记录为40秒,用户实际查看一条通知消息的时间为57秒,可以将其记录为60秒。根据用户查看通知消息的经验数据设置价值网络中所有可能的动作数据,价值网络中的动作数据是有限的。
在接收到新的通知消息后,获取通知消息的文档主题特征作为当前的价值网络的下一个状态数据s′,根据TargetQ计算网络中每个动作数据的Q值。Q值对应的动作数据,即查看时长。确定Q值最大的查看时长。根据该查看时长确定待推送的通知消息的优先级。其中,优先级规则可以预先人工制定。例如,查看时长与优先级之间成正比,预先设置查看时长与优先级之间的映射关系表,例如,10秒对应于一级,20秒对应于二级,……,以此类推,查看时长越长,优先级越高,其中,查看时长的数量是有限的。在确定查看时长后,根据该映射关系表,可以获取到Q值最大的查看时长对应的优先级。
103、根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式。
104、按照所述排列顺序和所述展示方式推送所述通知消息。
确定待推送的通知消息的优先级之后,需要根据优先级对该通知消息进行推送,若通知栏中当前没有其他未读消息,则直接推送该通知消息,其中,若查看时长较小,说明用户不会点击该通知消息的概率较高,此时可以将该通知消息折叠推送,减小该推送消息所占通知栏的空间,若查看时长较大,则展开推送。其中,查看时长与优先级之间成正比。
具体地,根据所述通知消息的优先级确定所述通知消息的展示方式,包括:若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
若通知栏中还有其他的未读消息,则获取这些未读消息的优先级,将新的通知消息与这些未读消息按照优先级由高至低的顺序排列,并按照排列顺序将新的通知消息推送到通知栏。如果新的通知消息的优先级较低,则会在通知栏的消息列表中较后的位置显示,若优先级较高,则会在通知栏的消息列表中较前的位置显示。
在另一个可选的实施方式中,根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数,包括:将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;将所述历史通知消息的经验数据中的查看时间和查看时长作为所述价值网络的动作数据;根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
其中,电子设备在记录历史通知消息的查看时间时,以时间区间的形式记录查看时间, 例如,将一天的24小时划分为24个时间区间,获取到用户查看历史通知消息的时间点之后,确定该时间点所属的时间区间,记录该时间区间作为用户查看该历史消息的查看时间。或者,在其他实施例中,时间区间还可以由用户按照使用习惯手动划分。
在该实施方式中,将查看时间和查看时长的组合作为动作数据,在训练价值网络时,从经验数据中获取查看时间open_time和查看时长review_time作为动作数据。针对每个预设的时间区间,设置查看时长与优先级之间的映射关系。根据上述TargetQ的计算公式,可以计算出在状态s′下,采取各个动作a′的Q值。此时Q值最大的动作数据表示的是一个查看时间和查看时长的组合,即用户最可能查看该通知消息的时间区间,以及在该时间区间内查看该通知消息的查看时长。
通过这个方式,对于每一条通知消息,都会得到一个查看时间(对应一个时间区间)和查看时长。在对新的通知消息和当前通知栏中的其他未读消息进行排序时,电子设备先按照时间区间对通知消息排序,查看时间属于同一个时间区间的通知消息相邻排列,其中,以当前时间点所属的时间区间作为第一个时间区间,其他时间区间按照时间顺序依次排列在后面。然后,针对每个时间区间内的多个通知消息,按照优先级由高至低排列。对于一则通知消息,若用户在Q值最大的动作数据对应的时间区间内查看了该通知消息,则该通知消息的奖励值记录为f1;若随着时间的迁移,时间区间已经经过,而用户未查看该通知消息,则将该通知消息的奖励值记录为f2;若用户直接清除通知消息,则将该通知消息的奖励值记录为f3,其中,f1为正数,f2和f3均为负数,且f3<f2。
例如,在一实施方式中,将一天的时间划分为24个时间区间,一个小时为一个区间,f1=1,f2=-5,f3=-10。当前时间为11:20,所属的时间区间为11:00-12:00。根据深度强化学习模型计算得到一条新收到的通知消息对应的时间区间为12:00-13:00。假设当前的通知栏中,时间区间11:00-12:00有一条未读消息,时间区间12:00-13:00没有未读消息,则该新的通知消息可以排列在消息列表的第二位。若到了13:00以后,用户才查看该通知消息,则将该通知消息的奖励值记录为-5,若用户在任何时间都没有查看该通知消息并将该通知消息从通知栏清楚,则将该通知消息的奖励值记录为-10,若用户在12:00-13:00之间查看了该消息,则将该通知消息的奖励值记录为1。
由于用户在不同的时间段使用电子设备的需求不同,按照该实施方式,将查看时间和查看时长的组合作为动作数据,能够更加准确地按照用户习惯来推送通知消息。例如,对于用户A来说,早上8:00-9:00是上班通勤时间,很可能会对新闻类或者社交软件类APP的消息进行查看,而在中午12:00-13:00午餐时间,很可能会及时查看点餐类APP的通知消息,本方案可以通过深度强化学习模型学习得到用户查看通知消息的习惯与规律,进而在不同的时间段,按照学习到的策略为用户推送通知消息。并且,随着对用户查看历史通知消息生成的经验数据的记录,可以实现对价值网络进行更新,使其能够适应于用户查看通知消息的习惯变化。
具体实施时,本申请不受所描述的各个流程的执行顺序的限制,在不产生冲突的情况下,某些流程还可以采用其它顺序进行或者同时进行。
本申请实施例提出的通知消息推送方法,在接收到通知消息时,确定该通知消息对应的应用名称,获取通知消息的内容,基于预先训练好的深度强化学习模型,根据应用名称、 内容计算该通知消息的优先级,其中,深度强化学习模型是根据用户查看历史通知消息的经验数据训练得到的,接下来,根据通知消息的优先级和通知栏中未读消息的优先级,确定通知消息的排列顺序和展示方式,按照排列顺序和展示方式推送通知消息,本方案通过用户查看历史通知消息的经验数据训练深度强化学习模型,以学习得到用户查看历史消息的习惯,进而对通知消息的优先级进行判断,按照与用户的查看消息的习惯匹配的排列顺序和展示方式进行消息推送,实现合理帮助用户管理通知消息,以使用户可以及时查看到当前需要的通知消息,提升用户体验。
本申请还提供一种通知消息的推送装置,包括:
数据获取模块,用于当接收到通知消息时,获取所述通知消息的内容;
优先级计算模块,用于基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
消息排序模块,用于根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
消息推送模块,用于按照所述排列顺序和所述展示方式推送所述通知消息。
在一些实施例中,所述装置还包括数据记录模块,所述数据记录模块用于根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
以及,将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
在一些实施例中,所述数据记录模块还用于:若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
在一些实施例中,所述装置还包括:
网络训练模块,用于每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;
根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;
根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
在一些实施例中,所述深度强化学习模型为基于深度Q网络算法的模型;所述网络训练模块还用于:将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;
将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;
根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
在一些实施例中,所述网络训练模块还用于:获取预先训练好的深度强化学习模型的价值网络;
根据所述主题模型算法提取所述通知消息的文档主题特征;
根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
在一些实施例中,所述网络训练模块还用于:将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;
根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
在一些实施例中,所述消息排序模块还用于:
若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;
若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
在一实施例中还提供了一种通知消息的推送装置。请参阅图4,图4为本申请实施例提供的通知消息的推送装置400的结构示意图。其中该通知消息的推送装置400应用于电子设备,该通知消息的推送装置400包括数据获取模块401、优先级计算模块402、消息排序模块403以及消息推送模块404,如下:
数据获取模块401,用于当接收到通知消息时,获取所述通知消息的内容。
本申请实施例中的通知消息可以是电子设备上的系统自带应用程序或者进程的系统服务器发送的,例如,手机自带的手机管家、流量管理、邮箱等应用程序;也可以是用户自己安装的第三方应用程序的后台服务器发送的,例如,用户自己安装的xx新闻、xx音乐、xx外卖等APP(Application,应用程序)。
电子设备在接收到通知消息时,数据获取模块401确定通知消息对应的应用名称,同时获取该通知消息的具体消息内容,例如,电子设备接收到xx新闻APP的后台服务器发送的通知消息,获取该通知消息的具体内容,其内容可能是该新闻APP推送给用户的一则新闻的概要信息。
优先级计算模块402,用于基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到。
本申请实施例中,优先级计算模块402采用用户查看历史通知消息的经验数据来对深度强化学习模型进行训练,以学习得到用户查看通知消息的用户习惯。例如,采用基于DQN(Deep Q Network,深度Q网络)算法或者A3C(Asynchronous Advantage Actor Critic,异步优势动作评价)算法的深度强化学习模型,DQN算法是将深度学习(Deep Learning)和强化学习(Reinforcement Learning)相结合的一种value based(基于价值的)算法,深度学习用来提供学习的机制,增强学习可以为深度学习提供学习的目标。A3C算法是基于DQN算法改进后的一种深度强化学习算法。
以DQN算法为例,该算法通过一个价值网络输出Q值,通过另外一个Q-target目标网络产生TargetQ。价值网络可以是一个是深度神经网络。本方案中通过用户查看历史通知消息记录的经验数据训练该价值网络以获取网络参数。上述价值网络的输入数据为状态数据,动作数据和反馈(即奖励值),在本方案中,将通知消息的内容作为状态数据,将用户查看通知消息的查看时长作为动作数据,状态和动作的组合是有限的,假设有m种状态数据,n种动作数据,则可以将Q当作是一张m×n的表格。
在初始阶段,没有历史通知消息的经验数据可以使用的情况下,该装置可以按照默认的推送策略对通知消息进行推送,例如,按照接收到通知消息的时间由近至远的顺序,将新接收到的通知消息排列最前边,显示在通知栏中,并统一按照展开显示的方式展示。当用户超过预设时长仍然没有查看该通知消息,则将通知消息折叠。
此外,电子设备对用户对这些通知消息的查看情况进行记录,作为经验数据收集到深度强化学习模型的经验池中。随着用户对电子设备上各种应用程序的使用时间的延长,经验池中的存储的经验数据会越来越多,进而可以使用这些经验数据训练深度强化学习模型。例如,设置为每间隔预设时间间隔,获取经验池中累积存储的经验数据训练深度强化学习模型。
具体地,在一些实施例中,该装置还包括数据记录模块,用于根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;将所述内容、所述查看时长、所述奖励值以及所述应用名称作为所述通知消息的经验数据,存储到经验池。
电子设备在通知栏推送了通知消息后,数据记录模块根据用户对该通知消息的查看情况记录经验数据,存储至深度强化学习模型的经验池。具体地,电子设备在接收到通知消息后,获取该通知消息的内容和对应的应用名称,其中,内容记为content、应用名称记为apk_name,存储为格式为{apk_name,content}的经验数据,当用户查看该通知消息之后,获取用户查看该通知信息的查看时间open_time和查看时长review_time,并给予该通知消息一个正确的反馈信息奖励,记为reward,最终,该通知消息的经验数据变为{apk_name,content,open_time,review_time,reward}。
其中,奖励值是用来训练深度强化学习模型的重要数据。奖励值的大小根据推送策略对用户是否有用来确定。具体地,在一些实施例中,数据记录模块还用于:若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
例如,用户点击并查看了通知消息,奖励值记录为1,用户没有查看通知消息并且直接清除了通知消息,则该通知消息的奖励值记录为-10。
在一些实施例中,推送装置400还包括网络训练模块,用于每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
例如,每间隔7-10天,网络训练模块获取经验池中的经验数据,训练价值网络。其中,主题模型算法为LDA(Latent Dirichlet Allocation,隐形狄利克雷分布)算法,该算法采用词、主题和文档三层结构。可以用表达主题的主题词及其概率分布作为文章的主题模型,可以识别大规模文档集或语料库中潜藏的主题信息。本方案中通过LDA算法提取通知消息内容的主题及其概率分布作为文档主题特征。
接下来，网络训练模块使用上述文档主题特征作为状态数据，同时从经验数据中获取动作数据，对价值网络进行训练。本申请实施例中通过MSE（mean-square error，均方误差）来定义价值网络的损失函数，损失函数公式表示如下：L(w_i)=E[(TargetQ−Q(s,a,w_i))²]，TargetQ=r+γ·max_{a′}Q(s′,a′,w_i)。
其中，w_i为网络参数，s为状态数据，a为动作数据。参数γ为衰减常数，可以根据网络训练情况设定，r为奖励值。
深度强化学习可以通过状态、动作、奖励值进行建模。参照图3所示,在当前状态s下,执行了a动作后,当前的状态变为s′,并得到动作a的反馈,即奖励值reward。深度强化学习是一个不断迭代的过程。在不断迭代的过程中,对于主体而言,收获了环境反馈的状态和奖励值,执行了动作;对于环境而言,接受了主体的动作后,输出了环境反馈的状态和奖励值。
通过经验数据训练价值网络,可以学习得到网络参数w i,训练过程中,将奖励值作为反馈信号进行学习。在深度强化学习模型中,状态与动作之间存在的映射关系可以表示为π,即策略。在本实施例中,策略是随机的,根据每个动作的概率选择动作。
具体地,网络训练模块还用于:将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
电子设备从经验池中获取经验数据,提取经验数据中的content,通过LDA算法获取文档主题特征,作为状态s,获取全部经验数据的review_time,对全部经验数据中的review_time进行归一化处理,将归一化处理后的review_time作为动作a。同时获取经验数据中记录的每一个历史通知消息的奖励值reward。使用获取的多个历史通知消息对应的文档主题特征、review_time、reward训练价值网络,得到网络参数。
通过学习得到上述损失函数中的网络参数w i。确定网络参数后,在接收到新的通知消息时,获取新的通知消息的内容,作为下一个状态数据s′。
具体地,优先级计算模块402还用于:获取预先训练好的深度强化学习模型的价值网络;根据所述主题模型算法提取所述通知消息的文档主题特征;根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
其中,优先级计算模块402还用于:将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
根据上述TargetQ的计算公式,可以计算出在状态s′下,采取各个动作a′的Q值。其中,Q是一个概率值,Q值最大的动作a′,为用户最可能采取的动作。
电子设备在记录用户对通知消息的查看时间时,可以以秒为单位,同时采用四舍五入的方式,将查看时长review_time记录为10秒的整数倍,例如,用户实际查看一条通知消息的时间为42秒,则将其记录为40秒,用户实际查看一条通知消息的时间为57秒,可以将其记录为60秒。根据用户查看通知消息的经验数据设置价值网络中所有可能的动作数据,价值网络中的动作数据是有限的。
在接收到新的通知消息后,获取通知消息的文档主题特征作为当前的价值网络的下一个状态数据s′,根据TargetQ计算网络中每个动作数据的Q值。Q值对应的动作数据,即查看 时长。确定Q值最大的查看时长。根据该查看时长确定待推送的通知消息的优先级。其中,优先级规则可以预先人工制定。例如,查看时长与优先级之间成正比,预先设置查看时长与优先级之间的映射关系表,例如,10秒对应于一级,20秒对应于二级,……,以此类推,查看时长越长,优先级越高,其中,查看时长的数量是有限的。在确定查看时长后,根据该映射关系表,可以获取到Q值最大的查看时长对应的优先级。
消息排序模块403,用于根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式。
消息推送模块404,用于按照所述排列顺序和所述展示方式推送所述通知消息。
确定待推送的通知消息的优先级之后,需要根据优先级对该通知消息进行推送,若通知栏中当前没有其他未读消息,则直接推送该通知消息,其中,若查看时长较小,说明用户不会点击该通知消息的概率较高,此时可以将该通知消息折叠推送,减小该推送消息所占通知栏的空间,若查看时长较大,则展开推送。其中,查看时长与优先级之间成正比。
具体地,消息排序模块403还用于:若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
若通知栏中还有其他的未读消息,则消息排序模块403获取这些未读消息的优先级,将新的通知消息与这些未读消息按照优先级由高至低的顺序排列,并按照排列顺序将新的通知消息推送到通知栏。如果新的通知消息的优先级较低,则会在通知栏的消息列表中较后的位置显示,若优先级较高,则会在通知栏的消息列表中较前的位置显示。
在另一个可选的实施方式中,网络训练模块还用于:将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;将所述历史通知消息的经验数据中的查看时间和查看时长作为所述价值网络的动作数据;根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
其中,电子设备在记录历史通知消息的查看时间时,以时间区间的形式记录查看时间,例如,将一天的24小时划分为24个时间区间,获取到用户查看历史通知消息的时间点之后,确定该时间点所属的时间区间,记录该时间区间作为用户查看该历史消息的查看时间。或者,在其他实施例中,时间区间还可以由用户按照使用习惯手动划分。
在该实施方式中,网络训练模块将查看时间和查看时长的组合作为动作数据,在训练价值网络时,从经验数据中获取查看时间open_time和查看时长review_time作为动作数据。针对每个预设的时间区间,设置查看时长与优先级之间的映射关系。根据上述TargetQ的计算公式,可以计算出在状态s′下,采取各个动作a′的Q值。此时Q值最大的动作数据表示的是一个查看时间和查看时长的组合,即用户最可能查看该通知消息的时间区间,以及在该时间区间内查看该通知消息的查看时长。
通过这个方式,对于每一条通知消息,都会得到一个查看时间(对应一个时间区间)和查看时长。在对新的通知消息和当前通知栏中的其他未读消息进行排序时,电子设备先按照时间区间对通知消息排序,查看时间属于同一个时间区间的通知消息相邻排列,其中,以当前时间点所属的时间区间作为第一个时间区间,其他时间区间按照时间顺序依次排列在后面。然后,针对每个时间区间内的多个通知消息,按照优先级由高至低排列。对于一 则通知消息,若用户在Q值最大的动作数据对应的时间区间内查看了该通知消息,则该通知消息的奖励值记录为f1;若随着时间的迁移,时间区间已经经过,而用户未查看该通知消息,则将该通知消息的奖励值记录为f2;若用户直接清除通知消息,则将该通知消息的奖励值记录为f3,其中,f1为正数,f2和f3均为负数,且f3<f2。
例如,在一实施方式中,将一天的时间划分为24个时间区间,一个小时为一个区间,f1=1,f2=-5,f3=-10。当前时间为11:20,所属的时间区间为11:00-12:00。根据深度强化学习模型计算得到一条新收到的通知消息对应的时间区间为12:00-13:00。假设当前的通知栏中,时间区间11:00-12:00有一条未读消息,时间区间12:00-13:00没有未读消息,则该新的通知消息可以排列在消息列表的第二位。若到了13:00以后,用户才查看该通知消息,则将该通知消息的奖励值记录为-5,若用户在任何时间都没有查看该通知消息并将该通知消息从通知栏清楚,则将该通知消息的奖励值记录为-10,若用户在12:00-13:00之间查看了该消息,则将该通知消息的奖励值记录为1。
由于用户在不同的时间段使用电子设备的需求不同,按照该实施方式,将查看时间和查看时长的组合作为动作数据,能够更加准确地按照用户习惯来推送通知消息。例如,对于用户A来说,早上8:00-9:00是上班通勤时间,很可能会对新闻类或者社交软件类APP的消息进行查看,而在中午12:00-13:00午餐时间,很可能会及时查看点餐类APP的通知消息,本方案可以通过深度强化学习模型学习得到用户查看通知消息的习惯与规律,进而在不同的时间段,按照学习到的策略为用户推送通知消息。并且,随着对用户查看历史通知消息生成的经验数据的记录,可以实现对价值网络进行更新,使其能够适应于用户查看通知消息的习惯变化。
本申请实施例提出的通知消息推送装置,在接收到通知消息时,数据获取模块401确定该通知消息对应的应用名称,获取通知消息的内容,优先级计算模块402基于预先训练好的深度强化学习模型,根据应用名称、内容计算该通知消息的优先级,其中,深度强化学习模型是根据用户查看历史通知消息的经验数据训练得到的,接下来,消息排序模块403根据通知消息的优先级和通知栏中未读消息的优先级,确定通知消息的排列顺序和展示方式,消息推送模块404按照排列顺序和展示方式推送通知消息,本方案通过用户查看历史通知消息的经验数据训练深度强化学习模型,以学习得到用户查看历史消息的习惯,进而对通知消息的优先级进行判断,按照与用户的查看消息的习惯匹配的排列顺序和展示方式进行消息推送,实现合理帮助用户管理通知消息,以使用户可以及时查看到当前需要的通知消息,提升用户体验。
本申请实施例还提供一种电子设备。所述电子设备可以是智能手机、平板电脑等设备。如图5所示,图5为本申请实施例提供的电子设备的第一种结构示意图。电子设备300包括处理器301和存储器302。其中,处理器301与存储器302电性连接。
处理器301是电子设备300的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或调用存储在存储器302内的计算机程序,以及调用存储在存储器302内的数据,执行电子设备的各种功能和处理数据,从而对电子设备进行整体监控。
在本实施例中,电子设备300中的处理器301会按照如下的流程,将一个或一个以上 的计算机程序的进程对应的指令加载到存储器302中,并由处理器301来运行存储在存储器302中的计算机程序,从而实现各种功能:
当接收到通知消息时,获取所述通知消息的内容;
基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
按照所述排列顺序和所述展示方式推送所述通知消息。
在一些实施例中,按照所述排列顺序和所述展示方式推送所述通知消息之后,处理器301执行:
根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
在一些实施例中,根据用户对所述通知消息的查看情况,获取用户查看所述通知消息的查看时长和奖励值时,处理器301执行:
若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
在一些实施例中,处理器301还执行:
每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;
根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;
根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
在一些实施例中,所述深度强化学习模型为基于深度Q网络算法的模型,根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数时,处理器301执行:
将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;
将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;
根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
在一些实施例中,基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级时,处理器301执行:
获取预先训练好的深度强化学习模型的价值网络;
根据所述主题模型算法提取所述通知消息的文档主题特征;
根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
在一些实施例中,根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级时,处理器301执行:
将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;
根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
在一些实施例中,根据所述通知消息的优先级确定所述通知消息的展示方式时,处理器301执行:
若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;
若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
存储器302可用于存储计算机程序和数据。存储器302存储的计算机程序中包含有可在处理器中执行的指令。计算机程序可以组成各种功能模块。处理器301通过调用存储在存储器302的计算机程序,从而执行各种功能应用以及数据处理。
在一些实施例中,如图6所示,图6为本申请实施例提供的电子设备的第二种结构示意图。电子设备300还包括:射频电路303、显示屏304、控制电路305、输入单元306、音频电路307、传感器308以及电源309。其中,处理器301分别与射频电路303、显示屏304、控制电路305、输入单元306、音频电路307、传感器308以及电源309电性连接。
射频电路303用于收发射频信号,以通过无线通信与网络设备或其他电子设备进行通信。
显示屏304可用于显示由用户输入的信息或提供给用户的信息以及电子设备的各种图形用户接口,这些图形用户接口可以由图像、文本、图标、视频和其任意组合来构成。
控制电路305与显示屏304电性连接,用于控制显示屏304显示信息。
输入单元306可用于接收输入的数字、字符信息或用户特征信息(例如指纹),以及产生与用户设置以及功能控制有关的键盘、鼠标、操作杆、光学或者轨迹球信号输入。其中,输入单元306可以包括指纹识别模组。
音频电路307可通过扬声器、传声器提供用户与电子设备之间的音频接口。其中,音频电路307包括麦克风。所述麦克风与所述处理器301电性连接。所述麦克风用于接收用户输入的语音信息。
传感器308用于采集外部环境信息。传感器308可以包括环境亮度传感器、加速度传感器、陀螺仪等传感器中的一种或多种。
电源309用于给电子设备300的各个部件供电。在一些实施例中,电源309可以通过电源管理系统与处理器301逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
尽管图6中未示出,电子设备300还可以包括摄像头、蓝牙模块等,在此不再赘述。
由上可知,本申请实施例提供了一种电子设备,所述电子设备在接收到通知消息时,确定该通知消息对应的应用名称,获取通知消息的内容,基于预先训练好的深度强化学习模型,根据应用名称、内容计算该通知消息的优先级,其中,深度强化学习模型是根据用户查看历史通知消息的经验数据训练得到的,接下来,根据通知消息的优先级和通知栏中 未读消息的优先级,确定通知消息的排列顺序和展示方式,按照排列顺序和展示方式推送通知消息,本方案通过用户查看历史通知消息的经验数据训练深度强化学习模型,以学习得到用户查看历史消息的习惯,进而对通知消息的优先级进行判断,按照与用户的查看消息的习惯匹配的排列顺序和展示方式进行消息推送,实现合理帮助用户管理通知消息,以使用户可以及时查看到当前需要的通知消息,提升用户体验。
本申请实施例还提供一种存储介质,所述存储介质中存储有计算机程序,当所述计算机程序在计算机上运行时,所述计算机执行上述任一实施例所述的通知消息的推送方法。
需要说明的是,本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分流程是可以通过计算机程序来指令相关的硬件来完成,所述计算机程序可以存储于计算机可读存储介质中,所述存储介质可以包括但不限于:只读存储器(ROM,Read Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁盘或光盘等。
此外,本申请中的术语“第一”、“第二”和“第三”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列流程或模块的过程、方法、系统、产品或设备没有限定于已列出的流程或模块,而是某些实施例还包括没有列出的流程或模块,或某些实施例还包括对于这些过程、方法、产品或设备固有的其它流程或模块。
以上对本申请实施例所提供的通知消息的推送方法、装置、存储介质及电子设备进行了详细介绍。本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (20)

  1. 一种通知消息的推送方法,其中,包括:
    当接收到通知消息时,获取所述通知消息的内容;
    基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
    根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
    按照所述排列顺序和所述展示方式推送所述通知消息。
  2. 如权利要求1所述的通知消息的推送方法,其中,按照所述排列顺序和所述展示方式推送所述通知消息之后,所述方法还包括:
    根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
    将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
  3. 如权利要求2所述的通知消息的推送方法,其中,根据用户对所述通知消息的查看情况,获取用户查看所述通知消息的查看时长和奖励值,包括:
    若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
    若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
  4. 如权利要求2所述的通知消息的推送方法,其中,所述方法还包括:
    每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;
    根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;
    根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
  5. 如权利要求4所述的通知消息的推送方法,其中,所述深度强化学习模型为基于深度Q网络算法的模型,根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数,包括:
    将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;
    将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;
    根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
  6. 如权利要求4所述的通知消息的推送方法,其中,基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,包括:
    获取预先训练好的深度强化学习模型的价值网络;
    根据所述主题模型算法提取所述通知消息的文档主题特征;
    根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
  7. 如权利要求6所述的通知消息的推送方法,其中,根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级,包括:
    将所述通知消息的文档主题特征作为当前的价值网络的下一个状态数据,根据训练好的所述价值网络,计算所述价值网络中各个动作数据对应的Q值;
    根据Q值最大的动作数据确定所述通知消息的优先级,其中,所述动作数据中的查看时长与所述优先级之间成正比。
  8. 如权利要求7所述的通知消息的推送方法,其中,根据所述通知消息的优先级确定所述通知消息的展示方式,包括:
    若所述优先级不大于预设阈值,则将所述通知消息的展示方式设置为折叠显示;
    若所述优先级大于所述预设阈值,则将所述通知消息的展示方式设置为展开显示。
  9. 一种通知消息的推送装置,其中,包括:
    数据获取模块,用于当接收到通知消息时,获取所述通知消息的内容;
    优先级计算模块,用于基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
    消息排序模块,用于根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
    消息推送模块,用于按照所述排列顺序和所述展示方式推送所述通知消息。
  10. 如权利要求9所述的通知消息的推送装置,其中,所述装置还包括数据记录模块,所述数据记录模块用于根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
    以及,将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
  11. 如权利要求10所述的通知消息的推送装置,其中,所述数据记录模块还用于:若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
    若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
  12. 如权利要求10所述的通知消息的推送装置,其中,所述装置还包括:
    网络训练模块,用于每间隔预设时间间隔,获取经验池中的存储的历史通知消息的经验数据;
    根据主题模型算法对所述历史通知消息的内容进行处理,提取所述历史通知消息的文档主题特征;
    根据所述历史通知消息的文档主题特征和经验数据,训练所述深度强化学习模型的价值网络,以获取网络参数。
  13. 如权利要求12所述的通知消息的推送装置,其中,所述深度强化学习模型为基于深度Q网络算法的模型;所述网络训练模块还用于:将所述历史通知消息的文档主题特征作为所述深度强化学习模型的价值网络的状态数据;
    将所述历史通知消息的经验数据中的查看时长作为所述价值网络的动作数据;
    根据所述状态数据、所述动作数据和所述奖励值,训练所述价值网络,获取网络参数。
  14. 如权利要求12所述的通知消息的推送装置,其中,所述网络训练模块还用于:获取预先训练好的深度强化学习模型的价值网络;
    根据所述主题模型算法提取所述通知消息的文档主题特征;
    根据所述通知消息的文档主题特征和所述价值网络,计算所述通知消息的优先级。
  15. 一种存储介质,其上存储有计算机程序,其中,当所述计算机程序在计算机上运行时,使得所述计算机执行:
    当接收到通知消息时,获取所述通知消息的内容;
    基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
    根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
    按照所述排列顺序和所述展示方式推送所述通知消息。
  16. 如权利要求15所述的存储介质,其中,当所述计算机程序在计算机上运行时,使得所述计算机执行:
    根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
    将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
  17. 如权利要求16所述的存储介质,其中,当所述计算机程序在计算机上运行时,使得所述计算机执行:
    若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
    若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
  18. 一种电子设备,包括处理器和存储器,所述存储器存储有计算机程序,其中,所述处理器通过调用所述计算机程序,用于执行:
    当接收到通知消息时,获取所述通知消息的内容;
    基于预先训练好的深度强化学习模型,根据所述内容计算所述通知消息的优先级,其中,所述深度强化学习模型根据用户查看历史通知消息的经验数据训练得到;
    根据所述通知消息的优先级和通知栏中未读消息的优先级,确定所述通知消息的排列顺序,并根据所述通知消息的优先级确定所述通知消息的展示方式;
    按照所述排列顺序和所述展示方式推送所述通知消息。
  19. 如权利要求18所述的电子设备,其中,所述处理器还通过调用所述计算机程序,用于执行:
    根据用户对所述通知消息的查看情况,记录用户查看所述通知消息的查看时长和奖励值;
    将所述内容、所述查看时长和所述奖励值作为所述通知消息的经验数据,存储到经验池。
  20. 如权利要求19所述的电子设备,其中,所述处理器还通过调用所述计算机程序,用于执行:
    若检测到用户点击并查看所述通知消息,则记录用户查看所述通知消息的查看时长,并将所述通知消息的奖励值记录为正数;
    若检测到所述通知消息被清除,则将所述通知消息的查看时长记录为零,并将所述通知消息的奖励值记录为负数。
PCT/CN2020/081128 2019-04-09 2020-03-25 通知消息的推送方法、装置、存储介质及电子设备 WO2020207249A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910282211.1A CN111800331A (zh) 2019-04-09 2019-04-09 通知消息的推送方法、装置、存储介质及电子设备
CN201910282211.1 2019-04-09

Publications (1)

Publication Number Publication Date
WO2020207249A1 true WO2020207249A1 (zh) 2020-10-15

Family

ID=72750943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081128 WO2020207249A1 (zh) 2019-04-09 2020-03-25 通知消息的推送方法、装置、存储介质及电子设备

Country Status (2)

Country Link
CN (1) CN111800331A (zh)
WO (1) WO2020207249A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597361A (zh) * 2020-12-16 2021-04-02 北京五八信息技术有限公司 一种排序处理方法、装置、电子设备及存储介质
US20210133642A1 (en) * 2019-11-05 2021-05-06 Microsoft Technology Licensing, Llc User-notification scheduling
CN114827254A (zh) * 2022-04-22 2022-07-29 深圳微言科技有限责任公司 一种消息推送方法、系统及存储介质
CN117082133A (zh) * 2023-10-17 2023-11-17 吉牛云(吉林)农业科技集团有限公司 一种基于云服务的行业政策推送管理系统

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112260936B (zh) * 2020-10-22 2022-07-15 Oppo广东移动通信有限公司 通知消息的管理方法、装置、终端及存储介质
CN114528469A (zh) * 2020-11-23 2022-05-24 中兴通讯股份有限公司 推荐方法、装置、电子设备、存储介质
CN112738740B (zh) * 2020-12-30 2022-10-11 青岛海尔科技有限公司 消息推送的发送方法及装置、存储介质及电子装置
CN113342442A (zh) * 2021-06-17 2021-09-03 百度在线网络技术(北京)有限公司 基于智能屏的信息显示方法、装置、智能屏设备及介质
CN114237756A (zh) * 2021-12-21 2022-03-25 展讯通信(天津)有限公司 应用程序app的通知消息配置方法和设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103648084A (zh) * 2013-12-05 2014-03-19 百度在线网络技术(北京)有限公司 消息通知栏中显示消息的方法和系统
US8707201B1 (en) * 2012-06-27 2014-04-22 Google Inc. Systems and methods for prioritizing notifications on mobile devices
CN105677313A (zh) * 2015-11-06 2016-06-15 乐视移动智能信息技术(北京)有限公司 通知消息的显示方法、装置及终端设备
CN105760043A (zh) * 2016-01-29 2016-07-13 珠海市魅族科技有限公司 一种通知的处理方法及装置
CN109040430A (zh) * 2018-07-10 2018-12-18 麒麟合盛网络技术股份有限公司 消息展示方法和装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572942A (zh) * 2014-12-30 2015-04-29 小米科技有限责任公司 推送消息显示方法及装置
CN105407158A (zh) * 2015-11-25 2016-03-16 无线生活(杭州)信息科技有限公司 一种建立模型和推送消息的方法及装置
CN105786322A (zh) * 2016-03-22 2016-07-20 北京金山安全软件有限公司 一种应用通知消息展示方法、装置及电子设备
CN105893058A (zh) * 2016-04-27 2016-08-24 北京国电通网络技术有限公司 一种管理通知栏的方法及系统
US20180253659A1 (en) * 2017-03-02 2018-09-06 Bank Of America Corporation Data Processing System with Machine Learning Engine to Provide Automated Message Management Functions
CN108388458A (zh) * 2018-01-26 2018-08-10 广东欧珀移动通信有限公司 消息处理方法及相关产品

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8707201B1 (en) * 2012-06-27 2014-04-22 Google Inc. Systems and methods for prioritizing notifications on mobile devices
CN103648084A (zh) * 2013-12-05 2014-03-19 百度在线网络技术(北京)有限公司 消息通知栏中显示消息的方法和系统
CN105677313A (zh) * 2015-11-06 2016-06-15 乐视移动智能信息技术(北京)有限公司 通知消息的显示方法、装置及终端设备
CN105760043A (zh) * 2016-01-29 2016-07-13 珠海市魅族科技有限公司 一种通知的处理方法及装置
CN109040430A (zh) * 2018-07-10 2018-12-18 麒麟合盛网络技术股份有限公司 消息展示方法和装置

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210133642A1 (en) * 2019-11-05 2021-05-06 Microsoft Technology Licensing, Llc User-notification scheduling
US11556864B2 (en) * 2019-11-05 2023-01-17 Microsoft Technology Licensing, Llc User-notification scheduling
CN112597361A (zh) * 2020-12-16 2021-04-02 北京五八信息技术有限公司 一种排序处理方法、装置、电子设备及存储介质
CN112597361B (zh) * 2020-12-16 2023-12-12 北京五八信息技术有限公司 一种排序处理方法、装置、电子设备及存储介质
CN114827254A (zh) * 2022-04-22 2022-07-29 深圳微言科技有限责任公司 一种消息推送方法、系统及存储介质
CN117082133A (zh) * 2023-10-17 2023-11-17 吉牛云(吉林)农业科技集团有限公司 一种基于云服务的行业政策推送管理系统
CN117082133B (zh) * 2023-10-17 2023-12-29 吉牛云(吉林)农业科技集团有限公司 一种基于云服务的行业政策推送管理系统

Also Published As

Publication number Publication date
CN111800331A (zh) 2020-10-20

Similar Documents

Publication Publication Date Title
WO2020207249A1 (zh) 通知消息的推送方法、装置、存储介质及电子设备
CN107943860B (zh) 模型的训练方法、文本意图的识别方法及装置
CN108280458B (zh) 群体关系类型识别方法及装置
CN106845644B (zh) 一种通过相互关系学习用户及移动应用的联系的异构网络
WO2017197806A1 (zh) 基于人工智能提供智能服务的方法、智能服务系统及智能终端
CN107958042B (zh) 一种目标专题的推送方法及移动终端
CN110209810B (zh) 相似文本识别方法以及装置
CN104123937A (zh) 提醒设置方法、装置和系统
CN109509473A (zh) 语音控制方法及终端设备
CN111914113A (zh) 一种图像检索的方法以及相关装置
WO2019062418A1 (zh) 应用清理方法、装置、存储介质及电子设备
CN107870810B (zh) 应用清理方法、装置、存储介质及电子设备
WO2021120875A1 (zh) 搜索方法、装置、终端设备及存储介质
CN110570840A (zh) 一种基于人工智能的智能设备唤醒方法和装置
CN110019777A (zh) 一种信息分类的方法及设备
WO2019062369A1 (zh) 应用管理方法、装置、存储介质及电子设备
CN111368525A (zh) 信息搜索方法、装置、设备及存储介质
CN109101505A (zh) 一种推荐方法、推荐装置和用于推荐的装置
CN111797861A (zh) 信息处理方法、装置、存储介质及电子设备
CN108197105A (zh) 自然语言处理方法、装置、存储介质及电子设备
CN112652304B (zh) 智能设备的语音交互方法、装置和电子设备
CN111800445A (zh) 消息推送方法、装置、存储介质及电子设备
CN112052399B (zh) 一种数据处理方法、装置和计算机可读存储介质
WO2021098175A1 (zh) 录制语音包功能的引导方法、装置、设备和计算机存储介质
KR20180049791A (ko) 복수 개의 메시지들을 필터링하는 방법 및 이를 위한 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20787637

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20787637

Country of ref document: EP

Kind code of ref document: A1