WO2023065618A1 - Multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism - Google Patents

Multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism

Info

Publication number
WO2023065618A1
Authority
WO
WIPO (PCT)
Prior art keywords
news
data
information
feature
model
Prior art date
Application number
PCT/CN2022/087220
Other languages
English (en)
French (fr)
Inventor
欧中洪
刘沛航
韩宗志
宋美娜
钟茂华
梁昊光
Original Assignee
北京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京邮电大学
Publication of WO2023065618A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • the disclosure belongs to the field of artificial intelligence, and specifically relates to a multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism.
  • news recommendation aims to solve the problem of information overload by making personalized recommendations to users through the computer's powerful computing power and high-efficiency feature matching.
  • At present there are mainly two forms of news recommendation methods: (1) collaborative filtering; (2) content-based filtering.
  • Content-based filtering: this method uses news information (title, body text, category) to construct news features and builds user portraits by analyzing historical behavior information; when generating predictions, it places more emphasis on the analysis of item attributes.
  • Content-based filtering relies on user portraits when recommending; user portraits are derived from items the user has evaluated, and the items most relevant to the user's positive evaluations are recommended to the user.
  • Content-based filtering uses different models to find similarities between texts and to model the relationships between different texts in the corpus; a base model is then learned through statistical analysis or machine learning to generate recommendation results.
  • In content-based filtering, user portraits are independent of each other and track shifts in user interest relatively promptly, but a sufficient understanding of news features is required.
  • Scheme (1) mainly uses user behavior information and makes recommendations by analyzing the similarity between users and items. For news recommendation, this method ignores the importance of news text, so user and news information cannot be integrated effectively, which limits the method.
  • Scheme (2) adopts content-based filtering, the mainstream method in the current news recommendation field; it captures the feature information of news better, keeps user portraits independent, and can respond quickly to shifts in user interest caused by changes in user behavior.
  • However, this scheme requires a thorough understanding of item information, and its news modeling and user modeling are not accurate enough.
  • the present disclosure aims to solve one of the technical problems in the related art at least to a certain extent.
  • the first purpose of this disclosure is to propose a multi-modal news recommendation method based on a multi-head self-attention neural mechanism.
  • the second purpose of the present disclosure is to propose a multi-modal news recommendation device based on a multi-head self-attention neural mechanism.
  • To achieve the above purposes, the embodiment of the first aspect of the present disclosure proposes a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, including: collecting data information, including news data, feature data, and trace data; fusing the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and taking the unified news feature as model input and completing personalized and accurate recommendation through a user interest representation model combined with the highest future impact strategy.
  • Collecting the data information further includes collecting interest tags of users.
  • The multi-modal information fusion technology includes:
  • Image and audio data are converted into text data using image recognition and speech recognition technologies, respectively.
  • The user interest representation model includes:
  • In terms of user encoding, a multi-head self-attention neural mechanism is used to capture potential connections between news items.
  • The highest future impact strategy includes:
  • A timeliness weight is assigned to each news item according to its generation time, and an expiry threshold for news is set based on a large amount of experimental data.
  • To achieve the above purposes, the embodiment of the second aspect of the present disclosure proposes a multi-modal news recommendation device based on a multi-head self-attention neural mechanism, including the following modules: an information collection module, used to collect data information, including news data, feature data, and trace data; a feature construction module, used to fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and a personalized accurate recommendation module, used to take the unified news feature as model input and complete personalized and accurate recommendation through a user interest representation model combined with the highest future impact strategy.
  • The information collection module is also used to collect interest tags of users.
  • The multi-modal information fusion technology includes:
  • Image and audio data are converted into text data using image recognition and speech recognition technologies, respectively.
  • The user interest representation model includes:
  • In terms of user encoding, a multi-head self-attention neural mechanism is used to capture potential connections between news items.
  • The highest future impact strategy includes:
  • A timeliness weight is assigned to each news item according to its generation time, and an expiry threshold for news is set based on a large amount of experimental data.
  • FIG. 1 is a schematic flowchart of a multi-modal news recommendation method based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic structural diagram of a multi-modal news recommendation device based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of an overall solution architecture provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of news data information provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a feature intersection model provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a real-time hot news prediction technology provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of a multimodal information fusion route provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a personalized and precise recommendation architecture provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of new user recommendation provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of the multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
  • This disclosure proposes a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, which is used to solve the problems of how to extract and model high-order features, user cold start, and feature extraction of multi-modal data.
  • FIG. 1 is a schematic flowchart of a multi-modal news recommendation method based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
  • the multi-modal news recommendation method based on the multi-head self-attention neural mechanism includes the following steps S101 to S103.
  • S101 Collect data information, including news data, feature data, and trace data.
  • Trace data mainly refers to the historical behavior records left by users when browsing information on the news platform. These data are generated when users browse news and include browsing records, browsing duration, timestamps, and so on; they reflect the user's browsing interest and are necessary for making recommendations to users.
  • The feature data mainly includes the user's demographic information (gender, age, interest tags, etc.), which is generated when the user registers. As a result, when a new user browses news on the platform, the platform already holds some feature information about that user.
  • The cold start problem of new users is a key issue in news recommendation and in recommender systems generally. By collecting user registration information, an early portrait of the user can be built, which effectively alleviates the cold start problem.
  • News data collection is multi-dimensional.
  • Unlike data collection in ordinary news recommendation, this solution collects the user's interest tags on top of conventional data, to support personalized and accurate recommendation for each user.
  • S102: Fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction.
  • The feature construction part provides news features for the final personalized recommendation. Because news information is multi-modal and multi-component, these multi-dimensional data need to be fused into a unified news feature.
  • The feature construction part transforms the raw data into the input required by the model, and aims for objective and accurate features.
  • This solution designs a multi-component feature cross model based on the view-level attention mechanism, a real-time hot news prediction technology based on streaming data, and a multi-modal information fusion technology based on intelligent frame extraction.
  • The feature cross model is shown in FIG. 5. The model constructs features for the three components, learns an attention representation of the three components through the view-level attention mechanism, and establishes a feature cross system across the components, so as to make full use of the hidden information in each component and learn a more accurate news representation.
  • This technology is mainly divided into five parts: offline model training, streaming model conversion, streaming model training, streaming model evaluation, and streaming model prediction.
  • Offline model training converts the news data set into a binary classification problem through data preprocessing and trains a logistic regression model; after training, the model is converted into a streaming model.
  • The streaming model can be trained on streaming data in real time, with a configurable training interval; after conversion, it is trained on streaming training data with an online machine learning algorithm (FTRL), and a PMML model is generated.
  • The file exported in this part contains the parameter configuration of the model; after training, the performance of the model is measured on streaming evaluation data and fed back promptly.
  • If the evaluation result is good, hot-spot prediction can be performed on real-time news data.
  • For video data, the solution uses intelligent frame extraction technology to divide the video into images. This method quickly captures the key frames in a video and discards repeated, useless frame segments, which saves computing resources and improves the accuracy of multi-modal conversion.
  • For image and audio data, mature image recognition and speech recognition technologies are used, respectively, to convert the two modalities into text data and so unify the multi-modal data; the final text data is used as the model input in the form of word embeddings.
  • The multi-modal data of the news are thus integrated so that latent news features are preserved.
  • Personalized accurate recommendation takes the integrated news and user features as model input, mines high-order cross relationships in the data through deep learning models or other algorithms, and combines some advanced strategies to complete personalized and accurate recommendation for users. It is mainly divided into three parts: the tag-based new user portrait construction module, the user interest representation model, and the highest future impact strategy.
  • On the basis of the above technologies and models, the scheme determines a comprehensive user recommendation scheme: in terms of user groups, it covers new and existing users and effectively alleviates the user cold start problem; in terms of data modality, it adopts a multi-modal feature construction technology for video, audio, and images; in terms of user interest representation, a multi-head self-attention neural mechanism is used to mine high-order hidden features in the data; in terms of personalized recommendation, based on the highest future impact strategy, news features are given different weights according to timeliness.
  • The architecture of the model is shown in FIG. 8.
  • To solve the cold start problem, this module combines interest tags to design a basic technical solution for building new user portraits, as shown in FIG. 9.
  • For the user portrait, the scheme builds an early user portrait from the interest tags and demographic information collected at registration; for the news portrait, topics are extracted from the text data with a topic model and combined with category information to generate a news portrait; the click probability between a user and a news item is determined by comparing the similarity of the two portraits combined with an information value decay strategy, and top_k recommendations are finally generated for new users by ranking the probabilities. Even for new users, the scheme delivers recommendations that are personalized and timely.
  • The user's historical behavior records contain rich information about user preferences.
  • The user interest representation model fully perceives and mines the user's historical preference features from these records, so as to better locate the user's interest representation and improve the accuracy of news recommendation prediction.
  • This module is mainly based on the multi-head self-attention neural mechanism; the specific flow is shown in FIG. 10.
  • In terms of news encoding, the multi-head self-attention neural mechanism can capture the relationships between any words to better model news.
  • In terms of user encoding, the multi-head self-attention mechanism is likewise applied to the user history to obtain the potential connections between news items, so as to better perceive and mine the user's interest features.
  • After the multi-head self-attention layer, the model uses an attention mechanism to determine the weight of each word or each news item, so as to better distinguish how much distinctive elements contribute to the modeling.
  • Because news data is updated quickly and is highly time-sensitive, the scheme introduces the highest future impact strategy on top of the self-attention mechanism.
  • The highest future impact strategy assigns different weights to features based on time-series information: the newer the content, the higher the weight.
  • The scheme assigns a timeliness weight to each news item according to its generation time and sets a time threshold for news expiry based on a large amount of experimental data, so that expired news and news with low reading value are filtered out on top of personalized recommendation, further improving the user's reading experience.
  • In the multi-modal news recommendation method based on the multi-head self-attention neural mechanism proposed by the embodiments of the present disclosure, for new user recommendation, an early-stage user portrait is built from interest tags through a topic model and feature similarity technology, which effectively alleviates the user cold start problem; for personalized and accurate recommendation to users with behavior records, the multi-modal information in news is collected and fused through multi-modal information fusion, high-order cross features are mined and user interest representations are learned through the multi-head self-attention mechanism, and finally news items are given time-series weights through the highest future impact strategy and real-time hot news mining, and these weights take part in the final user recommendation.
  • The overall architecture of this solution is shown in FIG. 3.
  • Compared with the prior art, the advantages of this proposal are as follows: it adopts a multi-head self-attention neural mechanism, which can capture the latent features between any words and between any news items, and this helps more accurate news modeling and user modeling; for the user cold start problem, it proposes a tag-based scheme for creating early user portraits, which uses a topic model and feature similarity, combined with real-time hot news prediction, to make early personalized recommendations to new users; and it integrates multi-modal news data with a proposed multi-modal information fusion technology, integrates the multi-component information of news, and performs feature learning on titles, body text, categories, and so on, which further improves the accuracy of news modeling.
  • the present disclosure also proposes a multi-modal news recommendation device based on a multi-head self-attention neural mechanism.
  • FIG. 2 is a schematic structural diagram of a multi-modal news recommendation device based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
  • the multi-modal news recommendation device based on the multi-head self-attention neural mechanism includes: an information collection module 10 , a feature construction module 20 , and a personalized accurate recommendation module 30 .
  • The information collection module is used to collect data information, including news data, feature data, and trace data;
  • The feature construction module is used to fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction;
  • The personalized accurate recommendation module is used to take the unified news feature as model input and complete personalized and accurate recommendation through the user interest representation model combined with the highest future impact strategy.
  • the information collection module 10 is also used to collect interest tags of users.
  • the multimodal information fusion technology includes:
  • image and audio data are converted into text data using image recognition and speech recognition technologies, respectively.
  • the user interest representation model includes:
  • a multi-head self-attention neural mechanism is used to capture potential connections between news
  • The highest future impact strategy includes:
  • A timeliness weight is assigned to each news item according to its generation time, and an expiry threshold for news is set based on a large amount of experimental data.
  • In the multi-modal news recommendation device based on the multi-head self-attention neural mechanism proposed by the embodiments of the present disclosure, for new user recommendation, an early-stage user portrait is built from interest tags through a topic model and feature similarity technology, which effectively alleviates the user cold start problem; for personalized and accurate recommendation to users with behavior records, the multi-modal information in news is collected and fused through multi-modal information fusion, high-order cross features are mined and user interest representations are learned through the multi-head self-attention mechanism, and finally news items are given time-series weights through the highest future impact strategy and real-time hot news mining, and these weights take part in the final user recommendation.
  • The terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise specifically defined.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

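The present disclosure proposes a multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism. The method includes the steps of: collecting data information, including news data, feature data, and trace data; fusing the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and taking the unified news feature as model input and completing personalized and accurate recommendation through a user interest representation model combined with a highest future impact strategy.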

Description

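Multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202111227971.6 filed on October 21, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure belongs to the field of artificial intelligence, and specifically relates to a multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism.
Background
In today's fast-paced age of information explosion, more and more users acquire knowledge and information by reading online. News recommendation technology emerged to help users find correct and relevant content in limited time. News recommendation aims to solve the problem of information overload by making personalized recommendations to users through the computer's powerful computing capability and efficient feature matching. At present there are mainly two forms of news recommendation methods: (1) collaborative filtering; (2) content-based filtering.
(1) Collaborative filtering. This method uses the preferences of groups with similar interests and shared experience to recommend information of interest to a user. Individuals respond to information to some degree (for example, by rating it) through a cooperative mechanism, and the responses are recorded to achieve filtering and thereby help others screen information. The method mainly mines the relationship between users and items from historical user-item interaction information, so as to recommend items similar to those the user likes, that is, "birds of a feather flock together". Collaborative filtering takes personalization into account, is highly automated, can make effective use of feedback from other similar users, and speeds up personalized learning.
(2) Content-based filtering. This method uses news information (title, body text, category) to construct news features and builds user portraits by analyzing historical behavior information; when generating predictions, it places more emphasis on the analysis of item attributes. It works well when the recommended objects are of a text type such as news. Content-based filtering relies on user portraits when recommending; user portraits are derived from items the user has evaluated, and the items most relevant to the user's positive evaluations are recommended to the user. To generate meaningful recommendations, content-based filtering uses different models to find similarities between texts and to model the relationships between different texts in the corpus; a base model is then learned through statistical analysis or machine learning to generate recommendation results. In content-based filtering, user portraits are independent of each other and track shifts in user interest relatively promptly, but a sufficient understanding of news features is required.
Scheme (1) mainly uses user behavior information and makes recommendations by analyzing the similarity between users and items. For news recommendation, this method ignores the importance of news text, so user and news information cannot be integrated effectively, which limits the method. Scheme (2) adopts content-based filtering, the mainstream method in the current news recommendation field; it captures the feature information of news better, keeps user portraits independent, and can respond quickly to shifts in user interest caused by changes in user behavior. However, this scheme requires a thorough understanding of item information, and its news modeling and user modeling are not accurate enough.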
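Summary
The present disclosure aims to solve, at least to a certain extent, one of the technical problems in the related art.
To this end, the first purpose of the present disclosure is to propose a multi-modal news recommendation method based on a multi-head self-attention neural mechanism.
The second purpose of the present disclosure is to propose a multi-modal news recommendation device based on a multi-head self-attention neural mechanism.
To achieve the above purposes, the embodiment of the first aspect of the present disclosure proposes a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, including: collecting data information, including news data, feature data, and trace data; fusing the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and taking the unified news feature as model input and completing personalized and accurate recommendation through a user interest representation model combined with a highest future impact strategy.
Further, in an embodiment of the present disclosure, collecting the data information further includes collecting interest tags of users.
Further, in an embodiment of the present disclosure, the multi-modal information fusion technology includes:
for video data, dividing the video into images using intelligent frame extraction technology;
for image and audio data, converting the image and audio data into text data using image recognition and speech recognition technologies, respectively.
Further, in an embodiment of the present disclosure, the user interest representation model includes:
in terms of news encoding, using a multi-head self-attention neural mechanism to capture the relationships between any words;
in terms of user encoding, using a multi-head self-attention neural mechanism to capture the potential connections between news items;
then, using an attention mechanism to determine the weight of each word or each news item.
Further, in an embodiment of the present disclosure, the highest future impact strategy includes:
assigning a timeliness weight to each news item according to its generation time, and setting an expiry threshold for news based on a large amount of experimental data.
To achieve the above purposes, the embodiment of the second aspect of the present disclosure proposes a multi-modal news recommendation device based on a multi-head self-attention neural mechanism, including the following modules: an information collection module, used to collect data information, including news data, feature data, and trace data; a feature construction module, used to fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and a personalized accurate recommendation module, used to take the unified news feature as model input and complete personalized and accurate recommendation through a user interest representation model combined with a highest future impact strategy.
Further, in an embodiment of the present disclosure, the information collection module is also used to collect interest tags of users.
Further, in an embodiment of the present disclosure, the multi-modal information fusion technology includes:
for video data, dividing the video into images using intelligent frame extraction technology;
for image and audio data, converting the image and audio data into text data using image recognition and speech recognition technologies, respectively.
Further, in an embodiment of the present disclosure, the user interest representation model includes:
in terms of news encoding, using a multi-head self-attention neural mechanism to capture the relationships between any words;
in terms of user encoding, using a multi-head self-attention neural mechanism to capture the potential connections between news items;
then, using an attention mechanism to determine the weight of each word or each news item.
Further, in an embodiment of the present disclosure, the highest future impact strategy includes:
assigning a timeliness weight to each news item according to its generation time, and setting an expiry threshold for news based on a large amount of experimental data.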
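Brief description of the drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a multi-modal news recommendation method based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
FIG. 2 is a schematic structural diagram of a multi-modal news recommendation device based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of the overall solution architecture provided by an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of news data information provided by an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of the feature cross model provided by an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of the real-time hot news prediction technology provided by an embodiment of the present disclosure.
FIG. 7 is a schematic diagram of the multi-modal information fusion route provided by an embodiment of the present disclosure.
FIG. 8 is a schematic diagram of the personalized accurate recommendation architecture provided by an embodiment of the present disclosure.
FIG. 9 is a schematic diagram of new user recommendation provided by an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of the multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.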
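Detailed description
Embodiments of the present disclosure are described in detail below, examples of which are shown in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present disclosure, and should not be construed as limiting it.
The multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism according to the embodiments of the present disclosure are described below with reference to the accompanying drawings.
This application addresses three main technical problems: how to extract and model high-order features, so that news and users can be modeled more accurately; user cold start, so that new users can also receive personalized recommendations; and feature extraction from multi-modal data, for which a multi-modal information fusion technology is established to process the data of each modality in news.
The present disclosure proposes a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, which is used to solve the problems of how to extract and model high-order features, user cold start, and feature extraction from multi-modal data.
FIG. 1 is a schematic flowchart of a multi-modal news recommendation method based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
As shown in FIG. 1, the multi-modal news recommendation method based on the multi-head self-attention neural mechanism includes the following steps S101 to S103.
S101: Collect data information, including news data, feature data, and trace data.
Trace data mainly refers to the historical behavior records left by users when browsing information on the news platform. These data are generated when users browse news and include browsing records, browsing duration, timestamps, and so on; they reflect the user's browsing interest and are necessary for making recommendations to users. Feature data mainly includes the user's demographic information (gender, age, interest tags, etc.), which is generated when the user registers. As a result, when a new user browses news on the platform, the platform already holds some feature information about that user. The cold start problem of new users is a key issue in news recommendation and in recommender systems generally; by collecting user registration information, an early portrait of the user can be built, which effectively alleviates the cold start problem. News data collection is multi-dimensional: besides text, current news also contains modalities such as images and video, and these data help with the feature expression and modeling of news. For more accurate recommendation, this solution therefore collects the multi-modal information of news for joint construction; in addition, regarding the components of news information, it collects the body text, title, category, entities, and other information of the news. FIG. 4 shows the information collected for news data.
Further, unlike data collection in ordinary news recommendation, the information collection part of this solution also collects the user's interest tags and similar data on top of conventional data, to support personalized and accurate recommendation for each user.
S102: Fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction.
The feature construction part provides news features for the final personalized recommendation. Because news information is multi-modal and multi-component, these multi-dimensional data need to be fused into a unified news feature. The feature construction part transforms the raw data into the input required by the model, and aims for objective and accurate features. This solution designs a multi-component feature cross model based on the view-level attention mechanism, a real-time hot news prediction technology based on streaming data, and a multi-modal information fusion technology based on intelligent frame extraction.
News data has multiple components that all help news representation, and different components have different characteristics: for example, a news title is short and concise, while the body text is long and specific. Crossing the features of the multiple components of news can therefore further improve the accuracy of news modeling. The feature cross model is shown in FIG. 5. The model constructs features for the three components, learns an attention representation of the three components through the view-level attention mechanism, and establishes a feature cross system across the components, so as to make full use of the hidden information in each component and learn a more accurate news representation.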
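The view-level attention step can be illustrated with a minimal sketch. It assumes that each component has already been encoded into a vector of the same length and that a query vector scores each view by dot product. The component names (title, body, category), the function name `view_level_attention`, the dot-product scoring, and the example vectors are illustrative assumptions and are not specified in this disclosure; in a trained model the query would be learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def view_level_attention(views, query):
    """Fuse per-component vectors into one news vector.

    views: dict mapping a component name to its vector (all the same length).
    query: hypothetical attention query vector (learned in a real model).
    Returns (attention weight per view, fused news vector).
    """
    names = list(views)
    # One relevance score per view: dot product between the query and the view.
    scores = [sum(q * x for q, x in zip(query, views[n])) for n in names]
    weights = softmax(scores)
    dim = len(query)
    # The fused vector is the attention-weighted sum of the view vectors.
    fused = [sum(w * views[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return dict(zip(names, weights)), fused

views = {"title": [1.0, 0.0], "body": [0.0, 1.0], "category": [1.0, 1.0]}
weights, news_vector = view_level_attention(views, query=[0.0, 0.0])
```

With an all-zero query every view scores the same, so the fused vector is simply the average of the three views; a non-zero query shifts weight toward the views it aligns with.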
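News recommendation has a strong head effect: important news tends to be seen by most people, and how quickly hot news can be mined from a large volume of news often determines news quality. Finding hot news quickly lets it be pushed to users promptly, which improves user experience; in addition, real-time hot news prediction can alleviate the user-side cold start problem. The real-time hot news prediction technology of this solution is shown in FIG. 6.
This technology is mainly divided into five parts: offline model training, streaming model conversion, streaming model training, streaming model evaluation, and streaming model prediction. Offline model training converts the news data set into a binary classification problem through data preprocessing and trains a logistic regression model. After training, the model is converted into a streaming model, which can be trained on streaming data in real time with a configurable training interval. After conversion, the model is trained on streaming training data with an online machine learning algorithm (FTRL), and a PMML model is generated; the file exported in this part contains the parameter configuration of the model. After training, the performance of the model is measured on streaming evaluation data and fed back promptly; if the evaluation result is good, hot-spot prediction can be performed on real-time news data.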
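The streaming training step can be sketched as follows, assuming the standard per-coordinate FTRL-Proximal update for logistic regression over sparse features. The class name `FTRLLogistic`, the hyperparameter values, and the feature names `bias` and `click_burst` are hypothetical; the disclosure names FTRL and logistic regression but gives no parameters, and the offline-to-streaming conversion and PMML export are not shown.

```python
import math

class FTRLLogistic:
    """Per-coordinate FTRL-Proximal for logistic regression on sparse features."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.01, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # accumulated adjusted gradients per feature
        self.n = {}  # accumulated squared gradients per feature

    def _weight(self, name):
        z = self.z.get(name, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps weak coordinates at exactly zero
        sign = 1.0 if z > 0 else -1.0
        step = (self.beta + math.sqrt(self.n.get(name, 0.0))) / self.alpha + self.l2
        return -(z - sign * self.l1) / step

    def predict(self, x):
        """Probability that the news item described by sparse dict x is hot."""
        margin = sum(self._weight(k) * v for k, v in x.items())
        margin = max(-35.0, min(35.0, margin))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-margin))

    def update(self, x, y):
        """One online step on a single (features, label) event; y is 0 or 1."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v  # gradient of the log loss for this coordinate
            n_old = self.n.get(k, 0.0)
            sigma = (math.sqrt(n_old + g * g) - math.sqrt(n_old)) / self.alpha
            self.z[k] = self.z.get(k, 0.0) + g - sigma * self._weight(k)
            self.n[k] = n_old + g * g
        return p

# Toy stream: items with a click burst are labeled hot (1), the others not (0).
model = FTRLLogistic()
stream = [({"bias": 1.0, "click_burst": 1.0}, 1), ({"bias": 1.0}, 0)] * 200
for features, label in stream:
    model.update(features, label)
p_hot = model.predict({"bias": 1.0, "click_burst": 1.0})
p_cold = model.predict({"bias": 1.0})
```

Because each event updates only the coordinates present in its feature dict, the model can keep training as events arrive, which matches the streaming setting described above.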
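With the development of the news field, news is no longer limited to plain text. For the multi-modal data in news, such as audio, video, images, and text, this solution proposes a unified organization and expression of the multi-modal information mining results; the fusion technology is shown in FIG. 7.
For video data, the solution uses intelligent frame extraction technology to divide the video into images. This method quickly captures the key frames in a video and discards repeated, useless frame segments, which saves computing resources and improves the accuracy of multi-modal conversion. For image and audio data, mature image recognition and speech recognition technologies are used, respectively, to convert the two modalities into text data and so unify the multi-modal data. The final text data is used as the model input in the form of word embeddings, and the multi-modal data of the news are thus integrated so that latent news features are preserved.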
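One simple way to keep key frames and drop repeated ones is frame differencing, sketched below. The disclosure does not say how its intelligent frame extraction selects frames, so the mean-absolute-difference criterion, the function name `extract_key_frames`, and the threshold value are assumptions for illustration only. Frames are represented as flat lists of pixel intensities rather than decoded video.

```python
def extract_key_frames(frames, threshold):
    """Return indices of frames that differ enough from the last kept frame.

    frames: list of equally sized, non-empty lists of pixel intensities.
    threshold: mean absolute pixel difference above which a frame is new content.
    """
    kept = []
    reference = None
    for index, frame in enumerate(frames):
        if reference is None:
            # Always keep the first frame as the initial reference.
            kept.append(index)
            reference = frame
            continue
        diff = sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)
        if diff > threshold:
            kept.append(index)
            reference = frame  # later frames are compared against this one
    return kept

frames = [[0, 0], [0, 0], [10, 10], [10, 11], [0, 0]]
key_frames = extract_key_frames(frames, threshold=5)
```

Only the kept frames would then be passed to image recognition, which is where the saving in computing resources comes from.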
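S103: Take the unified news feature as model input, and complete personalized and accurate recommendation through the user interest representation model combined with the highest future impact strategy.
Personalized accurate recommendation takes the integrated news and user features as model input, mines high-order cross relationships in the data through deep learning models or other algorithms, and combines some advanced strategies to complete personalized and accurate recommendation for users. It is mainly divided into three parts: the tag-based new user portrait construction module, the user interest representation model, and the highest future impact strategy. On the basis of the above technologies and models, the scheme determines a comprehensive user recommendation scheme: in terms of user groups, it covers new and existing users and effectively alleviates the user cold start problem; in terms of data modality, it adopts a multi-modal feature construction technology for video, audio, and images; in terms of user interest representation, a multi-head self-attention neural mechanism is used to mine high-order hidden features in the data; in terms of personalized recommendation, based on the highest future impact strategy, news features are given different weights according to timeliness. The architecture of the model is shown in FIG. 8.
In a news recommendation system, user cold start is a problem that cannot be ignored. Every user on the platform starts out as a new user, so how to make personalized recommendations to new users with no historical behavior is a key indicator of the maturity of a personalized recommendation technology. To solve the cold start problem, this module combines interest tags to design a basic technical solution for building new user portraits, as shown in FIG. 9.
For the user portrait, the scheme builds an early user portrait from the interest tags and demographic information collected at registration. For the news portrait, topics are extracted from the text data with a topic model and combined with category information to generate a news portrait. The click probability between a user and a news item is determined by comparing the similarity of the two portraits combined with an information value decay strategy, and top_k recommendations are finally generated for new users by ranking the probabilities. Even for new users, the scheme delivers recommendations that are personalized and timely.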
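The cold-start ranking can be sketched under a few stated assumptions: user and news portraits are vectors over the same topic axes, similarity is cosine similarity, and information value decays exponentially with a half-life. The disclosure names similarity comparison and value decay but not their concrete forms, so the function names, the 24-hour half-life, and the example data are hypothetical, and the resulting score is a ranking score rather than a calibrated click probability.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_for_new_user(user_portrait, news_portraits, ages_hours, k,
                           half_life_hours=24.0):
    """Rank news by portrait similarity times an exponential value decay."""
    scored = []
    for news_id, portrait in news_portraits.items():
        decay = 0.5 ** (ages_hours[news_id] / half_life_hours)
        scored.append((cosine(user_portrait, portrait) * decay, news_id))
    # Highest score first; ties are broken by news id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [news_id for _, news_id in scored[:k]]

# Topic axes (hypothetical): [sports, technology, finance].
user = [1.0, 1.0, 0.0]  # built from interest tags chosen at registration
news = {"n1": [1.0, 0.0, 0.0], "n2": [0.0, 1.0, 0.0], "n3": [0.0, 0.0, 1.0]}
ages = {"n1": 0.0, "n2": 24.0, "n3": 0.0}
top_k = recommend_for_new_user(user, news, ages, k=2)
```

In this example `n1` and `n2` match the user's tags equally well, but `n2` is a day old, so its score is halved and it ranks second; `n3` shares no topic with the user and is left out.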
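The user's historical behavior records contain rich information about user preferences. The user interest representation model fully perceives and mines the user's historical preference features from these records, so as to better locate the user's interest representation and improve the accuracy of news recommendation prediction. This module is mainly based on the multi-head self-attention neural mechanism; the specific flow is shown in FIG. 10.
This solution adopts the currently popular multi-head self-attention neural mechanism. In terms of news encoding, the multi-head self-attention neural mechanism can capture the relationships between any words to better model news. In terms of user encoding, the multi-head self-attention mechanism is likewise applied to the user history to obtain the potential connections between news items, so as to better perceive and mine the user's interest features. After the multi-head self-attention layer, the model uses an attention mechanism to determine the weight of each word or each news item, so as to better distinguish how much distinctive elements contribute to the modeling.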
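The two stages described above, multi-head self-attention followed by attention pooling, can be sketched in a simplified form. Real multi-head attention applies learned query, key, and value projections per head; this sketch omits them and simply slices each vector into heads, which is a deliberate simplification. The function names, the pooling query, and the example vectors are hypothetical. The same code applies to word vectors (news encoding) and to news vectors from a user's history (user encoding).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = vectors."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(dim)])
    return outputs

def multi_head_self_attention(vectors, num_heads):
    """Slice each vector into num_heads parts, attend per part, concatenate."""
    dim = len(vectors[0])
    if dim % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = [self_attention([v[h * head_dim:(h + 1) * head_dim] for v in vectors])
             for h in range(num_heads)]
    return [[x for head in heads for x in head[i]] for i in range(len(vectors))]

def attention_pool(vectors, query):
    """Weight each position by softmax(query . vector) and sum into one vector."""
    weights = softmax([sum(q * x for q, x in zip(query, v)) for v in vectors])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(len(query))]
    return weights, pooled

word_vectors = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
contextual = multi_head_self_attention(word_vectors, num_heads=2)
word_weights, news_vector = attention_pool(contextual, query=[1.0, 0.0, 0.0, 0.0])
```

Each output row of `contextual` mixes information from every position, which is how the mechanism relates any pair of words or news items; `word_weights` then gives the contribution of each position to the pooled vector.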
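Because news data is updated quickly and is highly time-sensitive, the scheme introduces the highest future impact strategy on top of the self-attention mechanism. The highest future impact strategy assigns different weights to features based on time-series information: the newer the content, the higher the weight. In real business scenarios, the more recent a news item is, the higher its recommendation value generally is. On this basis, the scheme assigns a timeliness weight to each news item according to its generation time and sets a time threshold for news expiry based on a large amount of experimental data, so that expired news and news with low reading value are filtered out on top of personalized recommendation, further improving the user's reading experience.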
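A minimal sketch of the timeliness weighting follows. The disclosure states only that newer content gets a higher weight and that an expiry threshold is set from experimental data; the exponential form, the 24-hour half-life, the 72-hour expiry, and the function names here are assumptions chosen for illustration.

```python
def timeliness_weight(age_hours, half_life_hours=24.0, expiry_hours=72.0):
    """Weight in [0, 1] that halves every half_life_hours and is 0 after expiry."""
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    if age_hours > expiry_hours:
        return 0.0  # past the expiry threshold: the item is filtered out
    return 0.5 ** (age_hours / half_life_hours)

def rerank(candidates, now_hours):
    """Re-rank (news_id, model_score, published_hours) by score times weight."""
    ranked = []
    for news_id, score, published_hours in candidates:
        weight = timeliness_weight(now_hours - published_hours)
        if weight > 0.0:
            ranked.append((score * weight, news_id))
    # Highest weighted score first; ties are broken by news id.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [news_id for _, news_id in ranked]

candidates = [("yesterday", 0.9, 0.0), ("fresh", 0.6, 36.0), ("expired", 0.99, -30.0)]
order = rerank(candidates, now_hours=48.0)
```

Here the item with the highest model score is dropped because it is past the expiry threshold, and the fresher item overtakes the older one despite a lower model score.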
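In the multi-modal news recommendation method based on the multi-head self-attention neural mechanism proposed by the embodiments of the present disclosure, for new user recommendation, an early-stage user portrait is built from interest tags through a topic model and feature similarity technology, which effectively alleviates the user cold start problem; for personalized and accurate recommendation to users with behavior records, the multi-modal information in news is collected and fused through multi-modal information fusion, high-order cross features are mined and user interest representations are learned through the multi-head self-attention mechanism, and finally news items are given time-series weights through the highest future impact strategy and real-time hot news mining, and these weights take part in the final user recommendation. The overall architecture of this solution is shown in FIG. 3.
Compared with the prior art, the advantages of this proposal are as follows: it adopts a multi-head self-attention neural mechanism, which can capture the latent features between any words and between any news items, and this helps more accurate news modeling and user modeling; for the user cold start problem, it proposes a tag-based scheme for creating early user portraits, which uses a topic model and feature similarity, combined with real-time hot news prediction, to make early personalized recommendations to new users; and it integrates multi-modal news data with a proposed multi-modal information fusion technology, integrates the multi-component information of news, and performs feature learning on titles, body text, categories, and so on, which further improves the accuracy of news modeling.
To implement the above embodiments, the present disclosure also proposes a multi-modal news recommendation device based on a multi-head self-attention neural mechanism.
FIG. 2 is a schematic structural diagram of a multi-modal news recommendation device based on a multi-head self-attention neural mechanism provided by an embodiment of the present disclosure.
As shown in FIG. 2, the multi-modal news recommendation device based on the multi-head self-attention neural mechanism includes: an information collection module 10, a feature construction module 20, and a personalized accurate recommendation module 30.
The information collection module is used to collect data information, including news data, feature data, and trace data; the feature construction module is used to fuse the data information into a unified news feature based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; the personalized accurate recommendation module is used to take the unified news feature as model input and complete personalized and accurate recommendation through the user interest representation model combined with the highest future impact strategy.
Further, in an embodiment of the present disclosure, the information collection module 10 is also used to collect interest tags of users.
Further, in an embodiment of the present disclosure, the multi-modal information fusion technology includes:
for video data, dividing the video into images using intelligent frame extraction technology;
for image and audio data, converting the image and audio data into text data using image recognition and speech recognition technologies, respectively.
Further, in an embodiment of the present disclosure, the user interest representation model includes:
in terms of news encoding, using a multi-head self-attention neural mechanism to capture the relationships between any words;
in terms of user encoding, using a multi-head self-attention neural mechanism to capture the potential connections between news items;
then, using an attention mechanism to determine the weight of each word or each news item.
Further, in an embodiment of the present disclosure, the highest future impact strategy includes:
assigning a timeliness weight to each news item according to its generation time, and setting an expiry threshold for news based on a large amount of experimental data.
In the multi-modal news recommendation device based on the multi-head self-attention neural mechanism proposed by the embodiments of the present disclosure, for new user recommendation, an early-stage user portrait is built from interest tags through a topic model and feature similarity technology, which effectively alleviates the user cold start problem; for personalized and accurate recommendation to users with behavior records, the multi-modal information in news is collected and fused through multi-modal information fusion, high-order cross features are mined and user interest representations are learned through the multi-head self-attention mechanism, and finally news items are given time-series weights through the highest future impact strategy and real-time hot news mining, and these weights take part in the final user recommendation.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "plurality" means at least two, such as two or three, unless otherwise specifically defined.
Although the embodiments of the present disclosure have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.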

Claims (13)

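  1. A multi-modal news recommendation method based on a multi-head self-attention neural mechanism, characterized by comprising the following steps:
    collecting data information, including news data, feature data, and trace data;
    fusing the data information into a unified news feature based on a multi-component feature crossing model with a view-level attention mechanism, a real-time hot news prediction technique for streaming data, and a multi-modal information fusion technique with intelligent frame extraction;
    taking the unified news feature as model input, and performing personalized precise recommendation through a user interest representation model combined with a highest future impact strategy.
  2. The method according to claim 1, characterized in that collecting the data information further comprises collecting interest tags of a user.
  3. The method according to claim 1, characterized in that the multi-modal information fusion technique comprises:
    for video data, using an intelligent frame extraction technique to split the video into images;
    for image and audio data, using image recognition and speech recognition techniques, respectively, to convert the image and audio data into text data.
  4. The method according to claim 1, characterized in that the user interest representation model comprises:
    on the news-encoding side, using the multi-head self-attention neural mechanism to capture the relationship between any words;
    on the user-encoding side, using the multi-head self-attention neural mechanism to capture latent connections between news items;
    afterwards, using an attention mechanism to determine the weight of each word or each news item.
  5. The method according to claim 1, characterized in that the highest future impact strategy comprises:
    assigning a timeliness weight to each news item based on its creation time and, based on a large amount of experimental data, specifying an expiry threshold for news.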
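  6. A multi-modal news recommendation apparatus based on a multi-head self-attention neural mechanism, characterized by comprising the following modules:
    an information collection module, configured to collect data information, including news data, feature data, and trace data;
    a feature construction module, configured to fuse the data information into a unified news feature based on a multi-component feature crossing model with a view-level attention mechanism, a real-time hot news prediction technique for streaming data, and a multi-modal information fusion technique with intelligent frame extraction;
    a personalized precise recommendation module, configured to take the unified news feature as model input and to perform personalized precise recommendation through a user interest representation model combined with a highest future impact strategy.
  7. The apparatus according to claim 6, characterized in that the information collection module is further configured to collect interest tags of a user.
  8. The apparatus according to claim 6, characterized in that the multi-modal information fusion technique comprises:
    for video data, using an intelligent frame extraction technique to split the video into images;
    for image and audio data, using image recognition and speech recognition techniques, respectively, to convert the image and audio data into text data.
  9. The apparatus according to claim 6, characterized in that the user interest representation model comprises:
    on the news-encoding side, using the multi-head self-attention neural mechanism to capture the relationship between any words;
    on the user-encoding side, using the multi-head self-attention neural mechanism to capture latent connections between news items;
    afterwards, using an attention mechanism to determine the weight of each word or each news item.
  10. The apparatus according to claim 6, wherein the highest future impact strategy comprises:
    assigning a timeliness weight to each news item based on its creation time and, based on a large amount of experimental data, specifying an expiry threshold for news.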
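  11. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the following steps:
    collecting data information, including news data, feature data, and trace data;
    fusing the data information into a unified news feature based on a multi-component feature crossing model with a view-level attention mechanism, a real-time hot news prediction technique for streaming data, and a multi-modal information fusion technique with intelligent frame extraction;
    taking the unified news feature as model input, and performing personalized precise recommendation through a user interest representation model combined with a highest future impact strategy.
  12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the following steps:
    collecting data information, including news data, feature data, and trace data;
    fusing the data information into a unified news feature based on a multi-component feature crossing model with a view-level attention mechanism, a real-time hot news prediction technique for streaming data, and a multi-modal information fusion technique with intelligent frame extraction;
    taking the unified news feature as model input, and performing personalized precise recommendation through a user interest representation model combined with a highest future impact strategy.
  13. A computer program product, comprising a computer program which, when executed by a processor, implements the following steps:
    collecting data information, including news data, feature data, and trace data;
    fusing the data information into a unified news feature based on a multi-component feature crossing model with a view-level attention mechanism, a real-time hot news prediction technique for streaming data, and a multi-modal information fusion technique with intelligent frame extraction;
    taking the unified news feature as model input, and performing personalized precise recommendation through a user interest representation model combined with a highest future impact strategy.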
PCT/CN2022/087220 2021-10-21 2022-04-15 Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism WO2023065618A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111227971.6 2021-10-21
CN202111227971.6A CN114154054A (zh) 2021-10-21 2021-10-21 Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism

Publications (1)

Publication Number Publication Date
WO2023065618A1 true WO2023065618A1 (zh) 2023-04-27

Family

ID=80458451

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087220 WO2023065618A1 (zh) 2021-10-21 2022-04-15 Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism

Country Status (2)

Country Link
CN (1) CN114154054A (zh)
WO (1) WO2023065618A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312542A (zh) * 2023-11-29 2023-12-29 泰山学院 Reading recommendation method and system based on artificial intelligence

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
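CN114154054A (zh) * 2021-10-21 2022-03-08 北京邮电大学 Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism
CN115964560B (zh) * 2022-12-07 2023-10-27 南京擎盾信息科技有限公司 Information recommendation method and device based on a multi-modal pre-trained model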

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120330936A1 (en) * 2011-06-22 2012-12-27 International Business Machines Corporation Using a dynamically-generated content-level newsworthiness rating to provide content recommendations
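CN111444428A (zh) * 2020-03-27 2020-07-24 腾讯科技(深圳)有限公司 Artificial-intelligence-based information recommendation method and apparatus, electronic device, and storage medium
CN111488931A (zh) * 2020-04-10 2020-08-04 腾讯科技(深圳)有限公司 Article quality evaluation method, article recommendation method, and corresponding apparatuses
CN114154054A (zh) * 2021-10-21 2022-03-08 北京邮电大学 Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism
CN114693397A (zh) * 2022-03-16 2022-07-01 电子科技大学 Multi-view multi-modal product recommendation method based on an attention neural network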

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
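CN117312542A (zh) * 2023-11-29 2023-12-29 泰山学院 Reading recommendation method and system based on artificial intelligence
CN117312542B (zh) * 2023-11-29 2024-02-13 泰山学院 Reading recommendation method and system based on artificial intelligence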

Also Published As

Publication number Publication date
CN114154054A (zh) 2022-03-08

Similar Documents

Publication Publication Date Title
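WO2023065618A1 (zh) Multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism
CN112163122B (zh) Method and apparatus for determining tags of a target video, computing device, and storage medium
KR102444712B1 (ko) System for automatic re-creation of personal media through multi-modality feature fusion, and operating method thereof
CN116702737B (zh) Copywriting generation method, apparatus, device, storage medium, and product
CN115114395B (zh) Content retrieval and model training methods and apparatuses, electronic device, and storage medium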
Jain et al. Video captioning: a review of theory, techniques and practices
CN109684548B (zh) Data recommendation method based on a user graph
CN112307336B (zh) Hot news mining and preview method and apparatus, computer device, and storage medium
Maybury Multimedia information extraction: Advances in video, audio, and imagery analysis for search, data mining, surveillance and authoring
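CN112699295A (zh) Web page content recommendation method and apparatus, and computer-readable storage medium
CN105005616A (zh) Text illustration method and system based on interactive expansion of text and image features
CN116975615A (zh) Task prediction method and apparatus based on multi-modal video information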
Kovacs et al. Context-aware asset search for graphic design
US20160188595A1 (en) Semantic Network Establishing System and Establishing Method Thereof
Kofler et al. Uploader intent for online video: typology, inference, and applications
Yang Automatic recommendation system of college English teaching videos based on students’ personalized demands
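CN109885748A (zh) Optimized recommendation method based on semantic features
CN116010711A (zh) KGCN-model movie recommendation method fusing user information and interest changes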
Panchal et al. The social hashtag recommendation for image and video using deep learning approach
Wen et al. Visual background recommendation for dance performances using dancer-shared images
Liang et al. The research of video resource personalized recommendation system based on education website
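CN111223014A (zh) Method and system for generating segmented-scenario teaching courses online from a large amount of segmented teaching content
KR20230051995A (ko) Immersive extended reality content management platform
CN115130453A (zh) Interactive information generation method and apparatus
KR20220113221A (ko) Method and system for trading video source data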

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882243

Country of ref document: EP

Kind code of ref document: A1