CN115048504A

CN115048504A - Information pushing method and device, computer equipment and computer readable storage medium

Info

Publication number: CN115048504A
Application number: CN202210535910.4A
Authority: CN
Inventors: 周振江; 周剑; 袁二根; 周康; 李敏
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2022-05-17
Filing date: 2022-05-17
Publication date: 2022-09-13

Abstract

The embodiments of the invention relate to the technical field of artificial intelligence and disclose an information pushing method comprising the following steps: classifying the watched content of all users to obtain a plurality of content classifications; calculating a first probability that a target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content; calculating a second probability that each content to be recommended is recommended to the target user, according to comment information on the content to be recommended from the user category to which the target user belongs; obtaining a popularity score for each content to be recommended according to the first probability and the second probability; and recommending the content to be recommended to the target user according to the popularity score. In this way, the accuracy of content recommendation is improved.

Description

Information push method, apparatus, computer device, and computer-readable storage medium

Technical Field

Embodiments of the present invention relate to the technical field of artificial intelligence, and in particular to an information push method, apparatus, computer device, and computer-readable storage medium.

Background

At present, content is generally pushed either by collaborative filtering, which analyzes the similarity of content related to the user or the content watched by the user's friends, or simply according to trending content.

In the course of implementing the embodiments of the present invention, the inventors found that existing approaches, when estimating the probability that a user will watch content, do not jointly consider the relationship between users and content, the relationship between users, and users' evaluations of the content they have watched, which results in low recommendation accuracy.

Summary of the Invention

In view of the above problems, embodiments of the present invention provide an information push method, apparatus, computer device, and computer-readable storage medium, which address the technical problem of low recommendation accuracy in the prior art.

According to one aspect of the embodiments of the present invention, an information push method is provided, the method comprising:

classifying the watched content of all users to obtain a plurality of content classifications;

calculating a first probability that a target user watches each content to be recommended, according to the target user's degree of preference for each of the content classifications and the similarity between the content to be recommended and the target user's watched content, where the content to be recommended is any content in the plurality of content classifications other than the content the target user has already watched;

calculating a second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs;

obtaining a popularity score for each content to be recommended according to the first probability and the second probability;

recommending the content to be recommended to the target user according to the popularity score.

In an optional manner, classifying the watched content of all users to obtain a plurality of content classifications includes: selecting K contents from all the watched content as classification centers; calculating the correlation between every watched content and each of the classification centers; obtaining, from the correlations, a correlation matrix between the watched contents; and iteratively updating the classification centers and the correlation matrix until an iteration threshold is met, to obtain the plurality of content classifications.

In an optional manner, calculating the first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, includes: determining each watched content of the target user and the classification center of each content classification; and calculating the target user's degree of preference for each content classification according to the target user's watched contents and the classification centers.

In an optional manner, calculating the first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, includes: determining a first feature vector of the target user's watched content; determining a second feature vector of the content to be recommended; and determining the similarity between the content to be recommended and the target user's watched content according to the first feature vector and the second feature vector.

In an optional manner, calculating the second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs, includes: obtaining the viewing information of all users; obtaining a feature vector for each user according to the viewing information and a word vector model; and clustering the feature vectors to obtain the user category to which each user belongs.

In an optional manner, calculating the second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs, includes: determining, from that comment information, the sentiment classification of the content to be recommended given by the other users in the target user's user category; calculating the Euclidean distance between the target user and the other users in that user category; and calculating, according to the Euclidean distance and the sentiment classification, the second probability that each content to be recommended is recommended to the target user.

In an optional manner, determining the sentiment classification of the content to be recommended given by the other users in the target user's user category includes: obtaining the comment information on the content to be recommended from the user category to which the target user belongs; and inputting the comment information into a sentiment classification model to obtain the sentiment classification of the content to be recommended given by the other users in that user category, where the sentiment classification model is trained in advance on sentiment classification samples.

According to another aspect of the embodiments of the present invention, an information push apparatus is provided, including:

a content classification module, configured to classify the watched content of all users to obtain a plurality of content classifications;

a first probability calculation module, configured to calculate a first probability that a target user watches each content to be recommended, according to the target user's degree of preference for each of the content classifications and the similarity between the content to be recommended and the target user's watched content, where the content to be recommended is any content in the plurality of content classifications other than the content the target user has already watched;

a second probability calculation module, configured to calculate a second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs;

a popularity score module, configured to obtain a popularity score for each content to be recommended according to the first probability and the second probability;

a recommendation module, configured to recommend the content to be recommended to the target user according to the popularity score.

According to another aspect of the embodiments of the present invention, a computer device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;

the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the information push method described above.

According to yet another aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which at least one executable instruction is stored; when the executable instruction runs on a computer device, it causes the computer device to perform the operations of the information push method described above.

In the embodiments of the present invention, the watched content of all users is classified to obtain a plurality of content classifications; a first probability that the target user watches each content to be recommended is then calculated according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content; a second probability that each content to be recommended is recommended to the target user is calculated according to the comment information on the content to be recommended from the user category to which the target user belongs; a popularity score for each content to be recommended is obtained from the first probability and the second probability; and finally the content to be recommended is recommended to the target user according to the popularity score. This can effectively improve the accuracy of information recommendation.

The above description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the embodiments are more readily apparent, specific embodiments of the present invention are set out below.

Brief Description of the Drawings

The drawings are only used to illustrate the embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:

FIG. 1 shows a schematic flowchart of an information push method provided by an embodiment of the present invention;

FIG. 2 shows a schematic structural diagram of an information push apparatus provided by an embodiment of the present invention;

FIG. 3 shows a schematic structural diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here.

At present, content is pushed either by collaborative filtering, which analyzes the similarity of content related to the user or the content watched by the user's friends, or simply according to trending content. When obtaining content feature vectors, existing solutions mainly generate them from user ratings and do not consider features such as content playback time and playback training. When identifying trending content, they simply count the number of trending words in the content, so content without trending words cannot be counted; alternatively, they derive trending content from users' search behavior, which is also a count-based statistic. Trending content generated from count statistics does not fully reflect how popular the content is; users' viewing behavior reflects it better. Moreover, when estimating the probability that a user will watch content, existing solutions do not jointly consider the relationship between users and content, the relationship between users, and users' evaluations of the content they have watched.

FIG. 1 shows a flowchart of an information push method provided by an embodiment of the present invention. The method is executed by a computer device, which may be a computer, a tablet computer, a mobile phone, a watch, an audio/video playback device, a wearable device, and so on; the embodiments of the present invention impose no specific limitation. As shown in FIG. 1, the method includes the following steps:

Step 110: Classify the watched content of all users to obtain a plurality of content classifications.

Before classifying the watched content of all users, this embodiment first obtains user behavior data and processes it to obtain the watched content corresponding to every user. Specifically, the service backend sends user behavior logs to Kafka (a distributed streaming platform); Flume (a log collection system) collects the user behavior data from the logs in Kafka; and the data is then filtered and extracted. Interceptors can be configured in Flume to filter out abnormal records, for example records with an empty user name or abnormal field values. After filtering, Spark Streaming (a real-time computing framework built on Spark) extracts the data: it first obtains the records of user viewing behavior, then extracts from them viewing information such as user id, watched content id, content playback duration, and content playback time, and finally stores this viewing information in a Hive (data warehouse tool) database.
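The filter-then-extract logic described above can be illustrated without the Kafka, Flume, Spark, or Hive machinery. The following is a minimal sketch in plain Python, assuming each log line is a JSON object; the field names (`event_type`, `user_id`, `content_id`, `play_duration`, `play_time`) are hypothetical and are not given in the source.

```python
import json
from typing import Iterable, Iterator, Optional

# Hypothetical field names; the source only lists the kinds of information kept.
VIEWING_FIELDS = ("user_id", "content_id", "play_duration", "play_time")


def parse_viewing_record(raw_line: str) -> Optional[dict]:
    """Return a cleaned viewing record, or None if the line should be dropped."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # malformed line
    if not isinstance(event, dict) or event.get("event_type") != "view":
        return None  # keep viewing behavior only
    if not event.get("user_id") or not event.get("content_id"):
        return None  # empty user name or content id
    duration = event.get("play_duration")
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration < 0:
        return None  # abnormal field value
    return {field: event.get(field) for field in VIEWING_FIELDS}


def extract_viewing_records(lines: Iterable[str]) -> Iterator[dict]:
    """Filter raw log lines and yield only the extracted viewing information."""
    for line in lines:
        record = parse_viewing_record(line)
        if record is not None:
            yield record
```

In a real deployment the filtering would live in a Flume interceptor and the extraction in a streaming job; the sketch only shows which records survive and which fields are kept.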

After the watched content of all users is obtained, each watched content is converted into a feature vector V by a word vector model. Once the feature vectors V of all watched content are obtained, all watched content is classified according to these feature vectors to obtain a plurality of content classifications. This includes the following steps:

a. Select K contents from all the watched content as classification centers.

b. Calculate the correlation between every watched content and each of the classification centers. Specifically, based on the selected classification centers, the correlation between the i-th watched content c_i among the watched content of all users and each classification center v_j is calculated:

where p_ij is the correlation between the i-th watched content c_i of all users and the j-th classification center v_j; d(c_i, v_j) is the Euclidean distance between the i-th watched content c_i and the j-th classification center v_j; and t is a variable taking values 0-k.

c. Obtain, from the correlations, a correlation matrix between the watched contents. The correlation matrix is a k*n matrix with the watched contents as columns and the classification centers as rows, where n is the number of watched contents of all users.

d. Iteratively update the classification centers and the correlation matrix until the iteration threshold is met, to obtain a plurality of content classifications.

After the correlation matrix is obtained, the classification centers are updated as follows:

where p_ij is the correlation of the i-th watched content with the j-th watched content taken as a classification center; d_j is the feature vector of watched content j; n is the number of watched contents of all users; and i denotes the i-th watched content.

The classification centers are updated iteratively. When the Euclidean distance between the classification centers obtained in the current iteration and those obtained in the previous iteration is less than the threshold e, the iteration stops and k content classifications are obtained.
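Steps a to d can be sketched as follows. The correlation and center-update formulas are not reproduced in this text, so the sketch assumes a fuzzy c-means style scheme with fuzzifier 2: the correlation is taken as p_ij = 1 / sum_t (d(c_i, v_j) / d(c_i, v_t))^2, and each center is the p_ij^2-weighted mean of the content vectors. Both choices, and the function names, are assumptions for illustration rather than the patent's exact formulas.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(item: Vector, centres: Sequence[Vector]) -> List[float]:
    """Assumed correlation of one content with every centre (sums to 1)."""
    dists = [euclidean(item, c) for c in centres]
    zero = [j for j, d in enumerate(dists) if d == 0.0]
    if zero:  # content coincides with one or more centres
        return [1.0 / len(zero) if j in zero else 0.0 for j in range(len(centres))]
    inverse = [1.0 / (d * d) for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def update_centres(items, matrix, old_centres) -> List[List[float]]:
    """Assumed update: mean of content vectors weighted by squared correlation."""
    dim = len(items[0])
    new_centres = []
    for j, old in enumerate(old_centres):
        weights = [row[j] ** 2 for row in matrix]
        total = sum(weights)
        if total == 0.0:  # no content relates to this centre; keep it unchanged
            new_centres.append(list(old))
            continue
        new_centres.append(
            [sum(w * item[m] for w, item in zip(weights, items)) / total for m in range(dim)]
        )
    return new_centres


def cluster_contents(
    items: Sequence[Vector],
    initial_indices: Sequence[int],
    eps: float = 1e-4,
    max_iter: int = 100,
) -> Tuple[List[List[float]], List[List[float]], List[int]]:
    """Step a: pick centres; steps b-d: iterate until centres move less than eps."""
    centres = [list(items[i]) for i in initial_indices]
    for _ in range(max_iter):
        matrix = [memberships(item, centres) for item in items]
        new_centres = update_centres(items, matrix, centres)
        shift = max(euclidean(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < eps:
            break
    # Stored here as n rows by k columns, the transpose of the k*n layout in the text.
    matrix = [memberships(item, centres) for item in items]
    labels = [max(range(len(centres)), key=lambda j: row[j]) for row in matrix]
    return centres, matrix, labels
```

Passing the initial centre indices explicitly keeps the sketch deterministic; how the K starting contents are chosen is not specified in the source.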

Step 120: Calculate a first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each of the content classifications and the similarity between the content to be recommended and the target user's watched content.

The content to be recommended is any content in the plurality of content classifications other than the content the target user has already watched.

After the k content classifications are obtained, each watched content of the target user and the classification center of each content classification are determined, and the target user's degree of preference for each content classification is calculated from them. Specifically, the calculation may be:

where I(u, c_i) is the degree of preference of the target user u for the i-th content classification c_i among the k content classifications; s is the total number of contents watched by the target user u; and X_j is the j-th watched content of the target user u.
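The preference formula itself is not reproduced in this text. As a rough illustration only, the sketch below assumes that I(u, c_i) is the average, over the user's s watched contents X_j, of a closeness term 1 / (1 + d(X_j, c_i)), where d is the Euclidean distance to the classification center. This closeness term is an assumption, chosen so that watching content near a center raises the preference for that classification.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def preference_degrees(watched: Sequence[Vector], centres: Sequence[Vector]) -> List[float]:
    """Assumed I(u, c_i): mean closeness of the user's watched contents to centre c_i."""
    s = len(watched)
    if s == 0:
        return [0.0] * len(centres)  # no viewing history, no preference signal
    return [sum(1.0 / (1.0 + euclidean(x, c)) for x in watched) / s for c in centres]
```

For a user whose watched contents all lie close to one center, the corresponding entry approaches 1 while the entries for distant centers approach 0.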

In this embodiment of the present invention, the similarity between the content to be recommended and each watched content of the target user u is also calculated, which includes: determining a first feature vector of the target user's watched content; determining a second feature vector of the content to be recommended; and determining the similarity between the content to be recommended and the target user's watched content according to the first feature vector and the second feature vector. Specifically, the calculation formula may be:

where W_ij is the similarity between the i-th of the s watched contents of the target user u and the content classification to which the j-th content to be recommended belongs; V_im is the feature vector of the i-th watched content of the target user u; and V_jm is the feature vector of the j-th content to be recommended. After the degree of preference and the similarity have each been obtained, the first probability that the target user watches each content to be recommended is calculated from the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content. Specifically:

where p_uj1 is the first probability that the target user u watches the j-th content to be recommended; s is the number of contents watched by the target user u; and W_ij is the similarity between the i-th watched content of the target user u and the j-th content to be recommended. When the content to be recommended is content the target user u has already watched, p_uj1 = 1, that is, the viewing probability is 1. c_j is the classification center of the content classification to which the j-th content to be recommended belongs.
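Neither the similarity formula nor the first-probability formula is reproduced in this text. The sketch below assumes that W_ij is the cosine similarity of the two feature vectors, and that p_uj1 is the user's preference for the candidate's classification multiplied by the mean of W_ij over the s watched contents, clipped to [0, 1]. Both forms are assumptions; only the special case p_uj1 = 1 for already-watched content comes from the text.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


def first_probability(
    watched: Sequence[Vector],
    candidate: Vector,
    class_preference: float,
    already_watched: bool = False,
) -> float:
    """Assumed p_uj1: preference for the candidate's class times mean similarity."""
    if already_watched:
        return 1.0  # stated in the text: watched content has viewing probability 1
    if not watched:
        return 0.0
    mean_similarity = sum(cosine_similarity(v, candidate) for v in watched) / len(watched)
    return min(1.0, max(0.0, class_preference * mean_similarity))
```

Here `class_preference` stands for I(u, c_j), the target user's degree of preference for the classification whose center is c_j.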

Step 130: Calculate a second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs.

In this embodiment of the present invention, users are classified in advance, which includes: obtaining the viewing information of all users; obtaining a feature vector for each user according to the viewing information and a word vector model; and clustering the feature vectors to obtain the user category to which each user belongs. Specifically, the feature vectors W of all users may be obtained by training word2vec. After the user feature vectors W are obtained, they are clustered; this can follow the same steps a to d used for content classification above, grouping all users to obtain c user categories.

After the user category of the target user is obtained, the sentiment classification of the content to be recommended given by the other users in that category is determined from that category's comment information on the content to be recommended. The Euclidean distance between the target user and the other users in the category is calculated, and the second probability that each content to be recommended is recommended to the target user is calculated from the Euclidean distance and the sentiment classification. The sentiment classification is determined as follows: the comment information on the content to be recommended from the user category to which the target user belongs is obtained and input into a sentiment classification model, which outputs the sentiment classification of the content to be recommended given by the other users in that category. The sentiment classification model is trained in advance on sentiment classification samples.

In this embodiment of the present invention, the sentiment classification model is trained with a BERT model (an open-source pre-trained model for natural language understanding). In BERT pre-training, the Masked LM task, which normally erases one or several words at random, is modified to mask specific words, so that Masked LM learns the representation of those words in context. The procedure is as follows:

Sub-step a: Select a number of seed words and label the polarity (positive or negative) of each. For example, positive words include "like" and "very good"; negative words include "dislike", "bad", and "embarrassing".

Sub-step b: Mine more sentiment words from the selected seed words. First, segment all comment content into words with an open-source tool (Stanford CoreNLP) and obtain the part of speech of each word. From the resulting word set, take all adjectives. Then calculate the correlation between every adjective and the selected seed words. The calculation formula is as follows:

where P(w_1, w_2) is the probability that the words w_1 and w_2 occur together; P(w_1) is the probability that the word w_1 occurs; and P(w_2) is the probability that the word w_2 occurs. Using this formula, the correlation of each word with the selected seed words is calculated, giving its correlation PP with the positive seed words and its correlation PN with the negative seed words. The difference PP - PN is then calculated: if the difference is positive, the adjective is positive; if it is negative, the adjective is negative. In this way, all positive and negative adjectives are mined from the word set.

Sub-step c: Using the parts of speech obtained in sub-step b, extract the nouns and adjectives from each comment and combine them into noun-adjective word pairs.

Sub-step d: Pre-train on the sample comments in the training data, with special symbols removed, using the BERT pre-training model. BERT is a multi-task model trained on two self-supervised tasks, MLM and NSP. MLM randomly masks some words in the input corpus during training and then predicts each masked word from its context. Here, in the MLM task, the polarity-labeled adjectives and the noun-adjective word pairs obtained in sub-steps b and c are masked instead; the remaining training steps are the same as in standard BERT training. This yields the trained sentiment classification model.
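The seed-word mining in sub-step b can be sketched as follows. The correlation formula is not reproduced in this text; the definitions of P(w_1, w_2), P(w_1), and P(w_2) suggest pointwise mutual information, PMI(w_1, w_2) = log2(P(w_1, w_2) / (P(w_1) P(w_2))), which the sketch assumes. Probabilities are estimated per comment, and a pair that never co-occurs is given a PMI of 0 instead of negative infinity; both are simplifications made here, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def build_counts(comments: Sequence[Sequence[str]]) -> Tuple[Counter, Counter]:
    """Count in how many comments each word, and each unordered word pair, appears."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for tokens in comments:
        unique = set(tokens)
        word_counts.update(unique)
        pair_counts.update(frozenset(p) for p in combinations(sorted(unique), 2))
    return word_counts, pair_counts


def pmi(w1: str, w2: str, word_counts: Counter, pair_counts: Counter, n: int) -> float:
    """Assumed correlation: pointwise mutual information over comment-level counts."""
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0 or word_counts[w1] == 0 or word_counts[w2] == 0:
        return 0.0  # simplification: no evidence counts as no correlation
    return math.log2((joint / n) / ((word_counts[w1] / n) * (word_counts[w2] / n)))


def polarity(
    adjective: str,
    positive_seeds: Sequence[str],
    negative_seeds: Sequence[str],
    comments: Sequence[Sequence[str]],
) -> str:
    """Label an adjective by the sign of PP - PN, as described in sub-step b."""
    word_counts, pair_counts = build_counts(comments)
    n = len(comments)
    pp = sum(pmi(adjective, seed, word_counts, pair_counts, n) for seed in positive_seeds)
    pn = sum(pmi(adjective, seed, word_counts, pair_counts, n) for seed in negative_seeds)
    if pp > pn:
        return "positive"
    if pp < pn:
        return "negative"
    return "neutral"
```

The `comments` argument is a list of already-tokenized comments; in the described pipeline the tokens and part-of-speech tags would come from the segmentation tool, which is outside this sketch.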

其中，在得到情感分类模型后，根据得到的情感分类模型计算每个用户对每个已观看内容(也即待推荐内容)的评论内容的情感分类，其中情感分类为“积极的”即为推荐，消极即为不推荐。Wherein, after the sentiment classification model is obtained, the sentiment classification of each user's comment content for each watched content (that is, the content to be recommended) is calculated according to the obtained sentiment classification model, where the sentiment classification as "positive" is the recommendation. , negative is not recommended.

After the sentiment classifications are determined, the Euclidean distance U_ub between the target user u and each other user b in the same user category is calculated, and the second probability that each content to be recommended is recommended to the target user is calculated from the Euclidean distances and the sentiment classifications. The calculation formula can be expressed as:

where p_uj2 is the second probability of recommending the j-th content to be recommended to the target user u; U_ub is the Euclidean distance between the target user u and user b; r is the sentiment classification of user b's comment on the content to be recommended j, with 1 for recommended, -1 for not recommended, and 0 for no comment; and q is the number of users in the user category to which the target user u belongs.
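The formula for p_uj2 is not reproduced in this text, so its exact form is unknown. The sketch below shows one plausible reading only: each other user b in the category contributes the vote r, weighted so that closer users count more, and the total is divided by q. The weight 1 / (1 + U_ub), the function name, and the sample vectors are assumptions, not the patent's formula.

```python
import math


def second_probability(target_vec, peers, q):
    """Distance-weighted average of sentiment votes for one content j.

    target_vec: feature vector of the target user u.
    peers: list of (feature_vector, r) for the other users b in u's category,
           where r is 1 (recommended), -1 (not recommended) or 0 (no comment).
    q: number of users in the user category to which u belongs.
    """
    total = 0.0
    for vec, r in peers:
        u_ub = math.dist(target_vec, vec)  # Euclidean distance U_ub
        total += r / (1.0 + u_ub)          # ASSUMED weighting: nearer users count more
    return total / q


# Hypothetical category of four users: the target plus three peers.
peers = [
    ((0.0, 0.0), 1),   # identical profile, recommends j
    ((3.0, 4.0), -1),  # distance 5, does not recommend j
    ((1.0, 0.0), 0),   # no comment on j
]
p_uj2 = second_probability((0.0, 0.0), peers, q=4)
print(p_uj2)
```

Because r can be -1, the value produced by this reading can be negative, so it behaves as a signed recommendation score rather than a probability in the strict sense.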

Step 140: Obtain a popularity score for each content to be recommended according to the first probability and the second probability.

After the first probability and the second probability are obtained, the target probability that user u watches the j-th content to be recommended is determined according to the following formula:

After the target probability is obtained, the popularity score of each content to be recommended is determined from it. The popularity score can be calculated according to the following formula:

where Score_j is the popularity score of the j-th content to be recommended, and C is the total number of users.
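Neither the target-probability formula nor the popularity-score formula is reproduced in this text. The sketch below is therefore only an illustration under two stated assumptions: the target probability is taken as the product of the first and second probabilities, and Score_j is taken as the average of the target probabilities over all C users. Both choices, the function names, and the sample values are hypothetical.

```python
def target_probability(p1, p2):
    """ASSUMED combination of the first and second probabilities (a product)."""
    return p1 * p2


def popularity_scores(first, second):
    """ASSUMED form: Score_j = (1 / C) * sum over users u of the target probability.

    first[u][j] and second[u][j] hold the first and second probabilities
    for user u and content j; C is the total number of users.
    """
    c = len(first)
    scores = {}
    for u, per_content in first.items():
        for j, p1 in per_content.items():
            p2 = second.get(u, {}).get(j, 0.0)  # no comment data: contributes 0
            scores[j] = scores.get(j, 0.0) + target_probability(p1, p2) / c
    return scores


# Hypothetical probabilities for two users and two contents.
first = {"u1": {"a": 0.8, "b": 0.2}, "u2": {"a": 0.4, "b": 0.6}}
second = {"u1": {"a": 0.5, "b": 0.5}, "u2": {"a": 0.5, "b": -0.5}}
scores = popularity_scores(first, second)
print(scores)
```

Swapping `target_probability` for a weighted sum would change only one function, which is why the combination rule is kept separate from the averaging step.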

Step 150: Recommend the content to be recommended to the target user according to the popularity score.

After the popularity score of each content to be recommended is obtained, the contents with high popularity scores are identified and recommended to the corresponding target user u. For example, the contents to be recommended can be sorted by popularity score and the top N recommended to the target user.
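The top-N selection described above can be sketched as follows. The function name, the `watched` parameter, and the sample scores are assumptions for illustration; excluding already watched content follows the definition of content to be recommended given elsewhere in this document.

```python
import heapq


def top_n(scores, n, watched=()):
    """Return the ids of the n highest-scoring contents the target user has not watched."""
    candidates = {j: s for j, s in scores.items() if j not in watched}
    # nlargest iterates over the dict keys and ranks them by their score.
    return heapq.nlargest(n, candidates, key=candidates.get)


# Hypothetical popularity scores.
scores = {"a": 0.3, "b": -0.1, "c": 0.7, "d": 0.5}
print(top_n(scores, 2))                 # ['c', 'd']
print(top_n(scores, 2, watched={"c"}))  # ['d', 'a']
```

`heapq.nlargest` avoids sorting the full candidate list when N is small relative to the number of contents; a plain `sorted(...)[:n]` gives the same result.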

In this embodiment of the present invention, the watched content of all users is classified to obtain a plurality of content classifications. A first probability that the target user watches each content to be recommended is calculated from the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content. A second probability that each content to be recommended is recommended to the target user is calculated from the comment information on the content to be recommended from the user category to which the target user belongs. A popularity score for each content to be recommended is obtained from the first probability and the second probability, and finally the content to be recommended is recommended to the target user according to the popularity score. This can effectively improve the accuracy of information recommendation.

FIG. 2 shows a schematic structural diagram of an information pushing apparatus provided by an embodiment of the present invention. As shown in FIG. 2, the apparatus 200 includes:

a content classification module 210, configured to classify the watched content of all users to obtain a plurality of content classifications;

a first probability calculation module 220, configured to calculate a first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, where the content to be recommended is any content in the plurality of content classifications other than the content the target user has already watched;

a second probability calculation module 230, configured to calculate a second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs;

a popularity scoring module 240, configured to obtain a popularity score for each content to be recommended according to the first probability and the second probability; and

a recommendation module 250, configured to recommend the content to be recommended to the target user according to the popularity score.

The specific working process of the information pushing apparatus in this embodiment is substantially the same as the steps of the method embodiments above and is not repeated here.
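As a rough illustration of how modules 210 to 250 could be wired together, the sketch below models each module as a callable and chains them in the order of the method steps. The class name, field names, call signatures, and stub implementations are all hypothetical; the patent does not specify an interface for the modules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InformationPushingApparatus:
    classify_content: Callable    # stands in for content classification module 210
    first_probability: Callable   # stands in for first probability calculation module 220
    second_probability: Callable  # stands in for second probability calculation module 230
    popularity_score: Callable    # stands in for popularity scoring module 240
    recommend: Callable           # stands in for recommendation module 250

    def push(self, viewing_data, comments, target_user):
        """Run the five modules in sequence for one target user."""
        classifications = self.classify_content(viewing_data)
        p1 = self.first_probability(classifications, viewing_data, target_user)
        p2 = self.second_probability(comments, target_user)
        scores = self.popularity_score(p1, p2)
        return self.recommend(scores, target_user)


# Stub modules with fixed outputs, used only to show the data flow.
apparatus = InformationPushingApparatus(
    classify_content=lambda viewing: {"class_1": ["a", "b"]},
    first_probability=lambda classes, viewing, user: {"a": 0.9, "b": 0.1},
    second_probability=lambda comments, user: {"a": 0.5, "b": 0.5},
    popularity_score=lambda p1, p2: {j: p1[j] * p2[j] for j in p1},
    recommend=lambda scores, user: sorted(scores, key=scores.get, reverse=True)[:1],
)
result = apparatus.push(viewing_data={}, comments={}, target_user="u1")
print(result)  # ['a']
```

Keeping each module behind a plain callable makes it possible to replace one stage, for example the scoring rule, without touching the others.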

FIG. 3 shows a schematic structural diagram of a computer device provided by an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computer device.

As shown in FIG. 3, the computer device may include a processor 402, a communications interface 404, a memory 406, and a communication bus 408.

The processor 402, the communications interface 404, and the memory 406 communicate with one another through the communication bus 408. The communications interface 404 is used to communicate with network elements of other devices, such as clients or other servers. The processor 402 is configured to execute the program 410, and specifically may execute the relevant steps of the information pushing method embodiments above.

Specifically, the program 410 may include program code, and the program code includes computer-executable instructions.

The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.

The program 410 may specifically be invoked by the processor 402 to cause the computer device to perform the following operations:

calculating a first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, where the content to be recommended is any content in the plurality of content classifications other than the content the target user has already watched;

calculating a second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs.

In an optional manner, calculating the first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, includes: determining each watched content of the target user and the classification center of each content classification; and calculating the target user's degree of preference for each content classification according to each watched content of the target user and the classification center of each content classification.

In an optional manner, calculating the second probability that each content to be recommended is recommended to the target user, according to the comment information on the content to be recommended from the user category to which the target user belongs, includes: determining, from that comment information, the sentiment classification of the content to be recommended by the other users in the user category to which the target user belongs; calculating the Euclidean distance between the target user and each other user in that user category; and calculating the second probability that each content to be recommended is recommended to the target user according to the Euclidean distances and the sentiment classifications.

An embodiment of the present invention provides a computer-readable storage medium storing at least one executable instruction which, when run on a computer device, causes the computer device to execute the information pushing method of any of the method embodiments above.

The executable instruction may specifically be used to cause the computer device to perform the following operations:

In an optional manner, calculating the first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, includes: determining each watched content of the target user and the classification center of each content classification; and calculating the target user's degree of preference for each content classification according to each watched content of the target user and the classification center of each content classification.

In an optional manner, calculating the first probability that the target user watches each content to be recommended, according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the target user's watched content, includes: determining a first feature vector of the target user's watched content; determining a second feature vector of the content to be recommended; and determining the similarity between the content to be recommended and the target user's watched content according to the first feature vector and the second feature vector.
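The passage above does not state which similarity measure is applied to the two feature vectors or how the first feature vector is built. The sketch below assumes cosine similarity and assumes the first feature vector is the mean of the feature vectors of the target user's watched contents; both assumptions, the function names, and the sample vectors are for illustration only.

```python
import math


def mean_vector(vectors):
    """ASSUMED construction of the first feature vector: element-wise mean."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(v1, v2):
    """ASSUMED similarity measure between the first and second feature vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # a zero vector carries no direction, so treat it as dissimilar
    return dot / (norm1 * norm2)


# Hypothetical feature vectors of two watched contents and two candidates.
watched_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
first_vector = mean_vector(watched_vectors)
similar = cosine_similarity(first_vector, [1.0, 1.0, 0.0])
unrelated = cosine_similarity(first_vector, [0.0, 0.0, 1.0])
print(similar, unrelated)
```

In this example the first candidate points in the same direction as the user's averaged profile and scores close to 1, while the second is orthogonal to it and scores 0.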

An embodiment of the present invention provides an information pushing apparatus for executing the information pushing method above.

An embodiment of the present invention provides a computer program that can be invoked by a processor to cause a computer device to execute the information pushing method of any of the method embodiments above.

An embodiment of the present invention provides a computer program product. The computer program product includes a computer program stored on a computer-readable storage medium, and the computer program includes program instructions which, when run on a computer, cause the computer to execute the information pushing method of any of the method embodiments above.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the description above. Furthermore, embodiments of the present invention are not directed to any particular programming language. It should be understood that the content of the present invention described herein can be implemented in a variety of programming languages, and that the description above of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the embodiments of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim.

Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not denote any order; these words may be interpreted as names. Unless otherwise specified, the steps in the above embodiments should not be construed as limiting the order of execution.

Claims

1. An information pushing method, characterized in that the method comprises:

classifying the watched contents of all users to obtain a plurality of content classifications;

calculating a first probability that a target user watches each content to be recommended according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the content watched by the target user, wherein the content to be recommended is any content in the plurality of content classifications other than the content watched by the target user;

calculating a second probability that each content to be recommended is recommended to the target user according to comment information of the user category to which the target user belongs on the content to be recommended;

obtaining a popularity score of each content to be recommended according to the first probability and the second probability;

and recommending the content to be recommended to the target user according to the popularity score.

2. The method of claim 1, wherein the classifying the watched contents of all users to obtain a plurality of content classifications comprises:

selecting K contents from all watched contents as classification centers;

calculating the correlation of all the watched contents with each classification center;

obtaining a correlation matrix among the watched contents according to the correlation;

and iteratively updating the classification center and the correlation matrix until an iteration threshold is met, and obtaining a plurality of content classifications.

3. The method according to claim 1, wherein the calculating a first probability that the target user watches each content to be recommended according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the content watched by the target user comprises:

determining each watched content of the target user and a classification center of each content classification;

and calculating the preference degree of the target user to each content classification according to each watched content of the target user and the classification center of each content classification.

4. The method according to any one of claims 1 to 3, wherein the calculating a first probability that the target user watches each content to be recommended according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the content watched by the target user comprises:

determining a first feature vector of the watched content of the target user;

determining a second feature vector of the content to be recommended;

and determining the similarity between the content to be recommended and the watched content of the target user according to the first feature vector and the second feature vector.

5. The method according to any one of claims 1 to 3, wherein the calculating of the second probability that each content to be recommended is recommended to the target user according to the comment information of the user category to which the target user belongs for the content to be recommended comprises:

acquiring the watching information of all users;

obtaining a feature vector of each user according to the viewing information and the word vector model;

and clustering according to the feature vectors to obtain the user category to which each user belongs.

6. The method according to claim 5, wherein the calculating a second probability that each content to be recommended is recommended to the target user according to comment information of the content to be recommended of the user category to which the target user belongs comprises:

determining, according to the comment information on the content to be recommended from the user category to which the target user belongs, the sentiment classification of the content to be recommended by other users in the user category to which the target user belongs;

calculating Euclidean distances between the target user and other users in the user category to which the target user belongs;

and calculating a second probability that each content to be recommended is recommended to the target user according to the Euclidean distance and the sentiment classification.

7. The method according to claim 6, wherein the determining, according to the comment information on the content to be recommended from the user category to which the target user belongs, the sentiment classification of the content to be recommended by other users in the user category to which the target user belongs comprises:

obtaining the comment information on the content to be recommended from the user category to which the target user belongs;

inputting the comment information into a sentiment classification model to obtain the sentiment classification of the content to be recommended by other users in the user category to which the target user belongs, wherein the sentiment classification model is trained in advance on sentiment classification samples.

8. An information pushing apparatus, characterized in that the apparatus comprises:

the content classification module is used for classifying the watched contents of all the users to obtain a plurality of content classifications;

the first probability calculation module is used for calculating a first probability that a target user watches each content to be recommended according to the target user's degree of preference for each content classification and the similarity between the content to be recommended and the content watched by the target user, wherein the content to be recommended is any content in the plurality of content classifications other than the content watched by the target user;

the second probability calculation module is used for calculating a second probability that each content to be recommended is recommended to the target user according to comment information of the user category to which the target user belongs on the content to be recommended;

the popularity scoring module is used for obtaining popularity scores of the contents to be recommended according to the first probability and the second probability;

and the recommending module is used for recommending the content to be recommended to the target user according to the popularity score.

9. A computer device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation of the information pushing method according to any one of claims 1-7.

10. A computer-readable storage medium, having at least one executable instruction stored therein, which when executed on a computer device, causes the computer device to perform the operations of the information push method according to any one of claims 1 to 7.