CN114154054A

CN114154054A - Multi-modal news recommendation method and device based on multi-head self-attention neural mechanism

Info

Publication number: CN114154054A
Application number: CN202111227971.6A
Authority: CN
Inventors: 欧中洪; 刘沛航; 韩宗志; 宋美娜; 钟茂华; 梁昊光
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2022-03-08
Also published as: WO2023065618A1

Abstract

The invention provides a multi-modal news recommendation method and device based on a multi-head self-attention neural mechanism. The method comprises: collecting data information, including news data, characteristic data and trace data; fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy. The scheme effectively alleviates the user cold start problem. Multi-modal information fusion acquires and fuses features from the multi-modal information in news, the multi-head self-attention mechanism performs high-order cross feature mining and user interest representation learning, and the highest future influence strategy together with real-time hot news mining assigns time-sequence weights to news for use in the final user recommendation.

Description

Multi-modal news recommendation method and device based on multi-head self-attention neural mechanism

Technical Field

The invention belongs to the field of artificial intelligence.

Background

Today, in an era of information explosion and a fast pace of life, more and more users acquire knowledge and information by reading online. News recommendation technology has emerged to help users find correct and relevant content within a limited time. News recommendation aims to make personalized recommendations to users by means of the strong computing power and efficient feature matching of computers, thereby alleviating the problem of information overload. Current news recommendation methods mainly take two forms: (1) collaborative filtering; (2) content-based filtering.

(1) Collaborative filtering. Information of interest to a user is recommended by using the preferences of groups with similar interests and common experience. Through a cooperation mechanism, individuals give responses (such as ratings) to information, and these responses are recorded for filtering purposes, thereby helping others to filter information. The method mainly mines the relationship between users and items from the user-item interaction information in the behavior history, so as to recommend to a user items similar to those the user likes, following the principle that like attracts like. Collaborative filtering takes personalization into account, has a high degree of automation, can effectively use the feedback of other similar users, and speeds up personalized learning.

(2) Content-based filtering. The method uses news information (title, text and type) to construct news features and constructs a user portrait by analyzing historical behavior information; when generating predictions, it places more emphasis on analyzing item properties. It works better when the recommended objects are of a text type, such as news. Content-based filtering relies on user portraits derived from items that the user has rated, and recommends the items most relevant to the user's positive ratings. To generate meaningful recommendation results, content-based filtering uses different models to find similarities between texts and to model the relationships among different texts in a corpus; a basic model is then learned through statistical analysis or machine learning to generate the recommendation results. In content-based filtering, user portraits are independent of each other and follow shifts in user interest in a timely manner, but the method requires sufficient knowledge of news features.

Scheme (1) mainly uses the behavior information of users and makes recommendations by analyzing the similarity relationships between users and items. The method ignores the importance of news text information in news recommendation, so user information and news information cannot be effectively integrated, which limits the method. Scheme (2) adopts the content-based filtering method of the current news recommendation field; it can better capture the feature information of news, keeps user portraits independent of each other, and responds quickly to shifts in user interest caused by changes in user behavior. However, the scheme places high demands on the understanding of item information and suffers from inaccurate news modeling and user modeling.

The present application mainly addresses three technical problems: how to extract and model high-order features so as to model news and users more accurately; how to make personalized recommendations to new users in view of the user cold start problem; and how to extract features from multi-modal data, for which a multi-modal information fusion technology is established to process the data of each modality in the news data.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first purpose of the present invention is to provide a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, which is used to solve the problems of high-order feature extraction and modeling, user cold start, and feature extraction from multi-modal data.

The second purpose of the invention is to provide a multi-modal news recommending device based on a multi-head self-attention neural mechanism.

In order to achieve the above object, an embodiment of a first aspect of the present invention provides a multi-modal news recommendation method based on a multi-head self-attention neural mechanism, including: collecting data information including news data, characteristic data and trace data; fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy.

According to the multi-modal news recommendation method based on the multi-head self-attention neural mechanism, for new-user recommendation, an early-stage user portrait is established from interest tags by means of a topic model and a feature similarity technology, which effectively alleviates the user cold start problem; for personalized, accurate recommendation to users with behavior records, features are acquired and fused from the multi-modal information in news through multi-modal information fusion, high-order cross feature mining and user interest representation learning are performed through the multi-head self-attention mechanism, and finally time-sequence weights are assigned to news through the highest future influence strategy and real-time hot news mining for use in the final user recommendation.

In addition, the multi-modal news recommendation method based on the multi-head self-attention neural mechanism according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, the collecting data information further includes collecting interest tags of the user.

Further, in one embodiment of the present invention, the multi-modal information fusion technique includes:

for video data, an intelligent frame extraction technology is adopted to segment a video into images;

for image and audio data, image recognition and voice recognition techniques are employed, respectively, to convert the image and audio data into text data.

Further, in an embodiment of the present invention, the user interest characterization model includes:

in the aspect of news encoding, a multi-head self-attention neural mechanism is adopted to capture the relations between any words;

in the aspect of user encoding, a multi-head self-attention neural mechanism is adopted to capture the latent relations among news items;

thereafter, an attention mechanism is employed to determine a weight for each word or for each news item.

Further, in one embodiment of the present invention, the highest future influence strategy includes:

assigning a timeliness weight to each news item according to the generation time of the news, and defining the expiry threshold of news according to a large amount of experimental data.

In order to achieve the above object, an embodiment of a second aspect of the present invention provides a multi-modal news recommendation apparatus based on a multi-head self-attention neural mechanism, including: an information acquisition module, used for collecting data information including news data, characteristic data and trace data; a feature construction module, used for fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; and a personalized accurate recommendation module, used for taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy.

According to the multi-modal news recommendation apparatus based on the multi-head self-attention neural mechanism, for new-user recommendation, an early-stage user portrait is established from interest tags by means of a topic model and a feature similarity technology, which effectively alleviates the user cold start problem; for personalized, accurate recommendation to users with behavior records, features are acquired and fused from the multi-modal information in news through multi-modal information fusion, high-order cross feature mining and user interest representation learning are performed through the multi-head self-attention mechanism, and finally time-sequence weights are assigned to news through the highest future influence strategy and real-time hot news mining for use in the final user recommendation.

Further, in an embodiment of the present invention, the information collecting module is further configured to collect an interest tag of the user.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

Fig. 1 is a schematic flow chart of a multi-modal news recommendation method based on a multi-head self-attention neural mechanism according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a multi-modal news recommendation apparatus based on a multi-head self-attention neural mechanism according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of a general scheme architecture provided by the embodiment of the present invention.

Fig. 4 is a schematic diagram of news data information provided in an embodiment of the present invention.

Fig. 5 is a schematic diagram of a feature intersection model according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of a real-time hot news prediction technology provided in an embodiment of the present invention.

Fig. 7 is a schematic diagram of a multi-modal information fusion route provided by the embodiment of the present invention.

Fig. 8 is a schematic diagram of a personalized accurate recommendation architecture according to an embodiment of the present invention.

Fig. 9 is a schematic diagram of new user recommendation provided in the embodiment of the present invention.

Fig. 10 is a schematic diagram of a multi-head self-attention neural mechanism according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

The following describes a multi-modal news recommendation method and apparatus based on a multi-head self-attention neural mechanism according to an embodiment of the present invention with reference to the drawings.

As shown in fig. 1, the multi-modal news recommendation method based on the multi-head self-attention neural mechanism includes the following steps:

S101: collecting data information including news data, characteristic data and trace data.

The trace data mainly refers to the historical behavior records left when a user browses information on the news platform. These data are generated while the user browses news and include the user's browsing records, browsing durations, timestamps and the like; they reflect the user's browsing interests and are necessary for making recommendations to the user. The characteristic data mainly comprises statistical information about the user (gender, age, interest tags, etc.), which is generated when the user registers. This allows the platform to learn some feature information about a new user as soon as the new user begins browsing news on the platform. The cold start of new users is a key problem in news recommendation and in recommendation systems in general; collecting user registration information makes it possible to construct an early portrait of the user, which effectively alleviates the cold start problem. News data collection is multidimensional: besides text, current news contains information in other modalities such as images and videos, and these data help with the feature representation and modeling of news. Therefore, to recommend news more accurately, the scheme collects the multi-modal information of news and constructs features from it jointly; in addition, with regard to the components of news, information such as the text, title, category and entities of the news is collected. Fig. 4 shows the news data information that is collected.

Further, unlike data collection in general news recommendation, in this scheme the information collection part collects, on top of the conventional data, the interest tags and the like of users, so as to support personalized, accurate recommendation for each user.
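As an illustration only, the three kinds of collected data could be held in records such as the following. This is a minimal sketch: the class names, field names and the `is_cold_start` helper are hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NewsItem:
    """News data: text components plus references to other modalities (hypothetical fields)."""
    news_id: str
    title: str
    body: str
    category: str
    entities: List[str] = field(default_factory=list)
    image_paths: List[str] = field(default_factory=list)
    video_path: Optional[str] = None


@dataclass
class UserProfile:
    """Characteristic data: statistical information supplied at registration."""
    user_id: str
    gender: Optional[str] = None
    age: Optional[int] = None
    interest_tags: List[str] = field(default_factory=list)


@dataclass
class TraceEvent:
    """Trace data: one browsing record left by a user."""
    user_id: str
    news_id: str
    timestamp: float
    dwell_seconds: float


def is_cold_start(user: UserProfile, traces: List[TraceEvent]) -> bool:
    """A user with no browsing records has to be served from registration data alone."""
    return not any(t.user_id == user.user_id for t in traces)


alice = UserProfile(user_id="u1", interest_tags=["sports"])
bob = UserProfile(user_id="u2")
traces = [TraceEvent("u2", "n1", timestamp=1700000000.0, dwell_seconds=42.0)]
```

In this sketch, a user such as `alice`, who has interest tags but no trace events, would be routed to the tag-based new-user path.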

S102: fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction.

The feature construction part provides the news features for the final personalized recommendation. Because news information is multi-modal and multi-component, these multidimensional data need to be fused into unified news features. The feature construction part converts the raw data into the input required by the model and aims for objective and accurate features. The scheme designs a multi-component feature cross model based on a view-level attention mechanism, a real-time hot news prediction technology based on streaming data, and a multi-modal information fusion technology based on intelligent frame extraction.

News data has multiple components, all of which help represent the news, and different components have different characteristics: for example, news headlines are short and concise, while body texts are long and detailed. Performing feature crossing on the multiple components of news can therefore further improve the accuracy of news modeling. The feature cross model is shown in fig. 5. The model constructs features for the three components, learns an attention representation of each through a view-level attention mechanism, and establishes a feature cross system across the components, so that the hidden information of each component is fully used and a more accurate news representation is learned.
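The view-level attention step can be sketched as follows. The text does not give the scoring function, so this sketch assumes a simple form: each component vector is scored against a query vector, and the softmax of the scores weights the components. The component names (`title`, `body`, `category`), the `query` vector and all numbers are hypothetical, and the feature crossing itself is not covered.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def view_level_attention(views, query):
    """Fuse per-component vectors into a single news vector.

    views: dict mapping component name -> vector (list of floats)
    query: assumed learned query vector of the same length
    """
    names = list(views)
    # Score each view by the dot product of tanh(view) with the query.
    scores = [
        sum(math.tanh(x) * q for x, q in zip(views[name], query))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(query)
    # The fused vector is the attention-weighted sum of the view vectors.
    fused = [
        sum(w * views[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


views = {
    "title": [0.9, 0.1, 0.0],
    "body": [0.2, 0.7, 0.1],
    "category": [0.0, 0.2, 0.8],
}
query = [1.0, 0.5, 0.0]
weights, news_vector = view_level_attention(views, query)
```

With these toy numbers the title view receives the largest weight, because it aligns most closely with the assumed query vector.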

The news recommendation field has a strong head effect: important news is usually seen by most people, and how quickly hot news can be mined from a large volume of news often determines recommendation quality. On the one hand, finding hot news quickly allows it to be pushed to users promptly, which improves user experience; on the other hand, real-time hot news prediction can alleviate the cold start problem on the user side. The real-time hot news prediction technology of the scheme is shown in fig. 6.

The technology is mainly divided into five parts: offline model training, streaming model conversion, streaming model training, streaming model evaluation and streaming model prediction. Offline model training converts the news dataset into a binary classification problem through data preprocessing and trains a logistic regression model. After training, the model is converted into a streaming model, which can be trained on streaming data in real time and for which a training interval can be set. The streaming model is then trained on streaming training data with an online machine learning algorithm (FTRL), producing a PMML model; the exported file contains the configuration of each model parameter. After training, model performance is measured on streaming evaluation data and fed back in time, and if the evaluation result is good, hot spot prediction is performed on real-time news data.
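The streaming training step can be illustrated with a minimal sketch of FTRL-Proximal updates for logistic regression. The feature names, hyperparameters and toy stream are assumptions made for illustration; the offline warm start, the model conversion and the PMML export described above are omitted.

```python
import math
from collections import defaultdict


class FTRLLogistic:
    """Per-coordinate FTRL-Proximal for binary logistic regression."""

    def __init__(self, alpha=0.5, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, key):
        z = self.z[key]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight at zero
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[key])) / self.alpha + self.l2
        return -(z - sign * self.l1) / denom

    def predict(self, x):
        """Probability that the item described by sparse features x is hot."""
        s = sum(self._weight(k) * v for k, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """Consume one streaming example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for k, v in x.items():
            g = (p - y) * v  # gradient of the log loss for this coordinate
            w = self._weight(k)
            sigma = (math.sqrt(self.n[k] + g * g) - math.sqrt(self.n[k])) / self.alpha
            self.z[k] += g - sigma * w
            self.n[k] += g * g
        return p


# Hypothetical stream: label 1 marks a news item that became hot.
stream = [
    ({"bias": 1.0, "clicks_per_min": 0.9, "age": 0.1}, 1),
    ({"bias": 1.0, "clicks_per_min": 0.1, "age": 0.9}, 0),
    ({"bias": 1.0, "clicks_per_min": 0.8, "age": 0.2}, 1),
    ({"bias": 1.0, "clicks_per_min": 0.2, "age": 0.7}, 0),
] * 50

model = FTRLLogistic()
for features, label in stream:
    model.update(features, label)

hot = model.predict({"bias": 1.0, "clicks_per_min": 0.95, "age": 0.1})
cold = model.predict({"bias": 1.0, "clicks_per_min": 0.05, "age": 0.9})
```

Because each example is consumed once and only the per-feature accumulators are kept, updates of this kind can run at whatever training interval the streaming system is configured with.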

As the news field develops, news is no longer presented only as plain text. Because news data contains multi-modal data such as audio, video, images and text, the scheme provides a unified organization and representation of multi-modal information mining results; the fusion technology is shown in fig. 7.

For video data, the scheme adopts an intelligent frame extraction technology to segment a video into images. The technology quickly captures the key frames in the video and discards repeated, useless frame segments, thereby saving computing resources and improving the accuracy of multi-modal conversion. For image data and audio data, mature image recognition and speech recognition technologies are adopted respectively to convert the two modalities into text data, so that the multi-modal data are unified. The final text data are fed to the model as word embeddings, integrating the multi-modal data of the news so that the latent features of the news are preserved.
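The text does not specify how key frames are selected. As one possible illustration, the sketch below keeps a frame only when its mean absolute pixel difference from the last kept frame exceeds a threshold. The function name, the threshold and the toy frames (flat lists of pixel intensities) are assumptions, and the subsequent image and speech recognition steps are not shown.

```python
def extract_key_frames(frames, threshold=10.0):
    """Return indices of frames that differ enough from the last kept frame.

    frames: list of equally sized flat lists of pixel intensities
    threshold: assumed minimum mean absolute difference for a new key frame
    """
    kept = []
    last = None
    for index, frame in enumerate(frames):
        if last is None:
            # Always keep the first frame as a reference.
            kept.append(index)
            last = frame
            continue
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            kept.append(index)
            last = frame
    return kept


frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],      # nearly identical to frame 0, discarded
    [50, 50, 50, 50],  # scene change, kept
    [52, 49, 50, 51],  # nearly identical to frame 2, discarded
    [200, 200, 0, 0],  # scene change, kept
]
key_frames = extract_key_frames(frames)  # [0, 2, 4]
```

Only the kept frames would then be passed to image recognition, which is where the saving in computing resources comes from.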

S103: taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy.

Personalized accurate recommendation takes the integrated news features and user features as model input, mines the high-order cross relationships in the data through a deep learning model or other algorithms, and combines several advanced strategies to complete personalized, accurate recommendation for users. It mainly comprises three parts: a tag-based new user portrait construction module, a user interest characterization model, and a highest future influence strategy. With the support of these technologies and models, the scheme defines a comprehensive user recommendation scheme. In terms of user groups, the scheme covers both new and existing users and effectively alleviates the user cold start problem; in terms of data modality, the scheme adopts a multi-modal feature construction technology for video, audio and images; in terms of user interest characterization, a multi-head self-attention neural mechanism is adopted to mine the high-order hidden features in the data; in terms of personalized recommendation, news features are given different weights from the perspective of timeliness based on the highest future influence strategy. The architecture of the model is shown in fig. 8.

In a news recommendation system, user cold start is a problem that cannot be ignored. Every user on the platform starts out as a new user, so how to make personalized recommendations to new users without historical behavior is a key indicator of the maturity of a personalized recommendation technology. To solve the cold start problem, the module designs a basic technical scheme that constructs a new user portrait from interest tags, as shown in fig. 9.

In the user portrait part, the scheme establishes an early user portrait based on the interest tags and statistical information collected at user registration; in the news portrait part, topics are extracted from the text data through a topic model, and the news portrait is then generated in combination with the category information. The click probability between a user and a news item is determined by comparing the similarity of the two portraits and applying an information value attenuation strategy, and top_k recommendations are finally generated for the new user by ranking the probabilities. Even for new users, the scheme delivers personalized and timely recommendations with good effect.
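The new-user path can be sketched as follows, under stated assumptions: the user portrait is a bag of interest tags, each news portrait is a topic distribution that a topic model is presumed to have produced already, similarity is cosine similarity, and the information value attenuation is an exponential decay with a hypothetical half-life. None of these specific choices is given in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_for_new_user(user_tags, news_items, now_hours,
                           half_life_hours=24.0, top_k=2):
    """Rank news for a user without history by similarity times time decay."""
    user_portrait = {tag: 1.0 for tag in user_tags}
    scored = []
    for item in news_items:
        age = now_hours - item["published_hours"]
        decay = 0.5 ** (age / half_life_hours)  # assumed attenuation form
        score = cosine(user_portrait, item["topics"]) * decay
        scored.append((score, item["id"]))
    scored.sort(reverse=True)
    return [news_id for _, news_id in scored[:top_k]]


news_items = [
    {"id": "n1", "topics": {"sports": 0.9, "finance": 0.1}, "published_hours": 90.0},
    {"id": "n2", "topics": {"tech": 0.8, "finance": 0.2}, "published_hours": 96.0},
    {"id": "n3", "topics": {"sports": 0.7, "tech": 0.3}, "published_hours": 48.0},
    {"id": "n4", "topics": {"finance": 1.0}, "published_hours": 99.0},
]
top = recommend_for_new_user(["sports", "tech"], news_items, now_hours=100.0)
```

In this toy run, item `n3` matches the tags best but is more than two days old, so the fresher items `n2` and `n1` rank above it.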

The historical behavior records of a user contain rich information about the user's preferences. Based on these records, the user interest characterization model fully perceives and mines the user's historical preference features, so that the user's interest is represented more precisely and the accuracy of news recommendation prediction is improved. The module is mainly based on a multi-head self-attention neural mechanism. The specific flow is shown in fig. 10.

The scheme adopts the currently popular multi-head self-attention neural mechanism. In news encoding, the multi-head self-attention mechanism captures the relations between any words, enabling better news modeling; in user encoding, a multi-head self-attention mechanism is likewise applied to the user's history records to capture the latent relations among news items, so that the user's interest features are better perceived and mined. After the multi-head self-attention layer, the model employs an attention mechanism to determine the weight of each word or each news item, thereby better distinguishing the contributions of different elements to the modeling.
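The encoder described above can be sketched in plain Python as scaled dot-product self-attention over several heads, followed by attention pooling. The weights are random rather than trained, the dimensions are toy values, the output projection found in standard implementations is omitted, and all names are hypothetical. The same two functions would serve the news encoder (rows are word vectors) and the user encoder (rows are news vectors).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, heads):
    """x: n x d matrix; heads: list of (Wq, Wk, Wv), each d x d_head."""
    outputs = []
    for wq, wk, wv in heads:
        q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
        scale = math.sqrt(len(wq[0]))
        # Every position attends to every other position.
        attn = [
            softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
            for qi in q
        ]
        outputs.append(matmul(attn, v))
    # Concatenate the outputs of all heads for each position.
    return [sum((head[i] for head in outputs), []) for i in range(len(x))]


def attention_pool(h, query):
    """Weight each row of h by its relevance to a query vector and sum the rows."""
    weights = softmax([sum(math.tanh(a) * b for a, b in zip(row, query)) for row in h])
    pooled = [sum(w * row[i] for w, row in zip(weights, h)) for i in range(len(h[0]))]
    return weights, pooled


rng = random.Random(0)
d_model, d_head, n_heads = 4, 2, 2
word_vectors = random_matrix(5, d_model, rng)  # five words of one news item
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(n_heads)
]
contextual = multi_head_self_attention(word_vectors, heads)
pool_query = [rng.uniform(-0.5, 0.5) for _ in range(d_head * n_heads)]
word_weights, news_vector = attention_pool(contextual, pool_query)
```

The pooling weights sum to one, so `news_vector` is a weighted average of the contextual word vectors in which more informative words would count for more once the weights are trained.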

Because news data is updated rapidly and is highly time-sensitive, the scheme introduces a highest future influence strategy on top of the self-attention mechanism. The highest future influence strategy assigns different weights to features based on time-sequence information, with newer content receiving higher weights. In actual business scenarios, the more recent a news item generally is, the higher its recommendation value. On this basis, the scheme assigns a timeliness weight to each news item according to its generation time and specifies a time threshold for news expiry according to a large amount of experimental data, so that expired news with low reading value is filtered out on top of personalized recommendation, further improving the user's reading experience.
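A minimal sketch of the timeliness weighting and expiry filter follows. The text states only that newer news receives a higher weight and that the expiry threshold is set from experimental data; the exponential form, the half-life and the threshold value used here are hypothetical placeholders.

```python
def timeliness_weight(age_hours, half_life_hours=12.0):
    """Assumed decay: the weight halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def rerank(candidates, now_hours, expiry_hours=72.0):
    """Drop expired news, then sort the rest by relevance times timeliness.

    candidates: list of (news_id, relevance, published_hours) tuples,
    where relevance is the score produced by the personalized model.
    """
    ranked = []
    for news_id, relevance, published_hours in candidates:
        age = now_hours - published_hours
        if age > expiry_hours:
            continue  # past the expiry threshold, filtered out
        ranked.append((relevance * timeliness_weight(age), news_id))
    ranked.sort(reverse=True)
    return [news_id for _, news_id in ranked]


candidates = [("a", 0.9, 0.0), ("b", 0.6, 90.0), ("c", 0.8, 76.0)]
order = rerank(candidates, now_hours=96.0)  # ["b", "c"]
```

Here item `a` has the highest relevance but is 96 hours old and is filtered out, while the fresher item `b` overtakes the more relevant item `c`.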

According to the multi-modal news recommendation method based on the multi-head self-attention neural mechanism, for new-user recommendation, an early-stage user portrait is established from interest tags by means of a topic model and a feature similarity technology, which effectively alleviates the user cold start problem; for personalized, accurate recommendation to users with behavior records, features are acquired and fused from the multi-modal information in news through multi-modal information fusion, high-order cross feature mining and user interest representation learning are performed through the multi-head self-attention mechanism, and finally time-sequence weights are assigned to news through the highest future influence strategy and real-time hot news mining for use in the final user recommendation. The overall architecture of the scheme is shown in fig. 3.

Compared with the prior art, the scheme has the following advantages: by adopting a multi-head self-attention neural mechanism, latent features at the level of any word and any news item can be captured, which facilitates more accurate news modeling and user modeling; for the user cold start problem, a scheme for creating an early user portrait based on tags is provided, and early personalized recommendation is made to new users by means of a topic model and feature similarity in combination with real-time hot news prediction; and multi-modal news data is integrated through an innovative multi-modal information fusion technology, the multi-component information of news is fused, and feature learning is performed on titles, texts, categories and the like, which further improves the accuracy of news modeling.

In order to implement the above embodiment, the invention further provides a multi-modal news recommendation device based on the multi-head self-attention neural mechanism.

Fig. 2 is a schematic structural diagram of a multi-modal news recommendation device based on a multi-head self-attention neural mechanism according to an embodiment of the present invention.

As shown in fig. 2, the multi-modal news recommendation apparatus based on the multi-head self-attention neural mechanism includes an information acquisition module 10, a feature construction module 20 and a personalized accurate recommendation module 30.

The information acquisition module is used for collecting data information including news data, characteristic data and trace data; the feature construction module is used for fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction; the personalized accurate recommendation module is used for taking the unified news features as model input and completing personalized, accurate recommendation in combination with the user interest characterization model and the highest future influence strategy.

Further, in an embodiment of the present invention, the information collecting module 10 is further configured to collect interest tags of the users.

Further, in one embodiment of the present invention, a multimodal information fusion technique includes:

Further, in one embodiment of the present invention, a user interest characterization model includes:

Further, in one embodiment of the present invention, the highest future influence strategy includes:

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict each other.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A multi-modal news recommendation method based on a multi-head self-attention neural mechanism is characterized by comprising the following steps:

collecting data information including news data, characteristic data and trace data;

fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction;

and taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy.

2. The method of claim 1, wherein collecting data information further comprises collecting interest tags for a user.

3. The method of claim 1, wherein the multimodal information fusion technique comprises:

4. The method of claim 1, wherein the user interest characterization model comprises:

5. The method of claim 1, wherein the highest future influence strategy comprises:

6. A multi-modal news recommendation device based on a multi-head self-attention neural mechanism is characterized by comprising the following modules:

an information acquisition module, used for collecting data information including news data, characteristic data and trace data;

a feature construction module, used for fusing the data information into unified news features based on a multi-component feature cross model with a view-level attention mechanism, a real-time hot news prediction technology for streaming data, and a multi-modal information fusion technology with intelligent frame extraction;

and a personalized accurate recommendation module, used for taking the unified news features as model input and completing personalized, accurate recommendation in combination with a user interest characterization model and a highest future influence strategy.

7. The apparatus of claim 6, wherein the information collecting module is further configured to collect interest tags of the users.

8. The apparatus of claim 6, wherein the multimodal information fusion technique comprises:

9. The apparatus of claim 6, wherein the user interest characterization model comprises:

10. The apparatus of claim 6, wherein the highest future influence strategy comprises: