CN116561446B

CN116561446B - Multi-mode project recommendation method, system and device and storage medium

Info

Publication number: CN116561446B
Application number: CN202310834248.7A
Authority: CN
Inventors: 蔡娟娟; 任亚琦; 李传珍; 张洋; 杜怀昌
Original assignee: Communication University of China
Current assignee: Communication University of China
Priority date: 2023-07-10
Filing date: 2023-07-10
Publication date: 2023-10-20
Anticipated expiration: 2043-07-10
Also published as: CN116561446A

Abstract

The invention provides a multi-modal item recommendation method, system, device, and storage medium, belonging to the field of computer technology. Personalized item recommendation is realized by constructing a recommendation model based on the fusion of multi-relation item heterogeneous graphs and multi-modal preferences. For the multi-relation item heterogeneous graph, multi-relation heterogeneous features of items in different modalities are obtained on the basis of a graph attention network. For the user-item bipartite graph, user modal preference information injected with high-order interaction information is captured from the user-item bipartite graphs of the different modalities, and the user's multiple modal preferences are fused by attention, which enhances the learning of the user's multi-modal preference representation and ultimately improves the accuracy of the recommendation results. By using multi-modal information and personalized recommendation algorithms, the invention addresses the data sparsity and the insufficient use of multi-modal information in existing recommendation systems, so that recommendation results are more accurate and interpretable, and both the operating effect and the user experience are improved.

Description

Multi-mode project recommendation method, system and device and storage medium

Technical Field

The invention belongs to the field of computer technology, and in particular relates to a multi-modal item recommendation method, system, device, and storage medium.

Background

With the development of the Internet, the age of information explosion has arrived, completing the transition from information scarcity to information overload. Personalized recommendation systems relieve the pressure of information overload and help users find genuinely useful information in massive data. Introducing multi-modal information into a recommendation system as auxiliary information can effectively alleviate the data sparsity problem suffered by collaborative filtering algorithms and improve the recommendation effect, and it has become a research hotspot in the recommendation system field. In addition, multi-modal recommendation scenarios contain various graph structures, and researchers can use graph neural network techniques to capture and learn the complex relations in them, thereby improving the accuracy and interpretability of the recommendation system.

Thanks to their excellent nonlinear fitting capability, deep neural networks can learn not only deep latent feature representations of users and items but also the complex nonlinear interaction features between users and items, from which user preferences can be analyzed; they have therefore attracted wide attention from researchers. Traditional deep learning methods such as convolutional neural networks (CNN) and recurrent neural networks (RNN) perform very well at extracting features from Euclidean data. In multi-modal recommendation based on conventional neural networks, the network is therefore generally used as a feature extractor to obtain the multi-modal features of users and items, and the user preference representation is then constructed from the users' historical interaction information, the item attribute information, and the multi-modal features. However, conventional neural networks cannot fully mine the rich non-Euclidean graph data present in multi-modal recommendation scenarios.

The emergence of graph neural networks provides a new approach to extracting features from graph data, solves the problem that features of non-Euclidean data could not be extracted, and has been widely applied in multi-modal recommendation. A graph neural network is essentially a connectionist model that captures the dependencies between nodes in a graph through message passing between those nodes. With a graph neural network, a recommendation model can use the interaction behavior of users and items more flexibly to model their vector representations and can fully mine the graph data in multi-modal recommendation scenarios, which improves the expressive power and interpretability of the model.

Existing research on graph-neural-network-based multi-modal recommendation can be divided into two stages. Early research focused on designing complex user-item aggregation strategies for the user-item bipartite graphs of different modalities to help model user-item interactions. Building on that work, later research constructs various kinds of graph-structured data to further mine relations and model features for items and users in different modalities.

The embedded representations of items and users are decisive for the recommendation effect of a multi-modal recommendation model. Early graph-neural-network-based multi-modal recommendation algorithms captured high-order interaction information between users and items only through the user-item bipartite graphs of different modalities, and ignored other graph relations in the multi-modal recommendation scenario, such as item-item relations and user-user relations, so the multi-modal feature representations of items and users were incomplete. To address this, existing research performs multi-modal recommendation based on multi-graph neural networks by using different types of graph data simultaneously, such as item graphs, user-item bipartite graphs, and user graphs, and uses the multi-graph data to mine latent information between items and users and to improve multi-modal embedding representation learning for items and users.

An analysis of existing graph-neural-network-based multi-modal recommendation algorithms reveals two problems. First, in item representation modeling, existing research, wary of the noise introduced by complex edge relations, uses only a single item relation when mining item relations, so the multi-edge heterogeneous information among items is not fully mined or used. Second, in user representation modeling, most existing research fuses the user's multi-modal feature representations by concatenation or summation in order to reduce information loss, and ignores the weights of the different modal feature representations. Because of these two problems, the recommendation results of prior-art multi-modal recommendation algorithms are inaccurate.

Disclosure of Invention

Based on the current situation of the conventional multi-mode recommendation algorithm based on the graph neural network, the invention provides a multi-mode item recommendation method, a system, equipment and a storage medium, which are used for overcoming at least one technical problem in the prior art.

In order to achieve the above object, the present invention provides a multi-modal item recommendation method, including:

constructing graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs in different modalities;

extracting graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs in the different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set;

fusing the user single-modal preference feature set using an attention mechanism to obtain a user multi-modal preference feature set;

inputting the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and performing Top-K recommendation for users through calculation of a prediction function, to obtain a preliminary MRIH-MPF model;

training the preliminary MRIH-MPF model with a pre-acquired user-item interaction data training set and an item multi-modal training set to obtain the MRIH-MPF model;

and generating, through the MRIH-MPF model, a target item recommendation result for a target user according to acquired target user-item interaction data and target item multi-modal data.

In order to solve the above problems, the present invention further provides a multi-modal item recommendation system, the system comprising:

a graph construction module, configured to construct graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs in different modalities;

a feature extraction module, configured to extract graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs in the different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set;

a feature fusion module, configured to fuse the user single-modal preference feature set using an attention mechanism to obtain a user multi-modal preference feature set;

a prediction module, configured to input the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and to perform Top-K recommendation for users through calculation of a prediction function, to obtain a preliminary MRIH-MPF model;

a model training module, configured to train the preliminary MRIH-MPF model with a pre-acquired user-item interaction data training set and an item multi-modal training set to obtain the MRIH-MPF model;

and a recommendation module, configured to input acquired target user-item interaction data and target item multi-modal data into the MRIH-MPF model to generate a target item recommendation result for a target user.

In order to solve the above problems, the present invention also provides an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, so that the at least one processor can perform the steps of the multi-modal item recommendation method described above.

In order to solve the above-mentioned problems, the present invention further provides a computer readable storage medium, in which at least one instruction is stored, which when executed by a processor in an electronic device, implements the above-mentioned multi-modal item recommendation method.

According to the multi-modal item recommendation method, system, device, and storage medium provided by the invention, personalized item recommendation is realized by constructing an MRIH-MPF model, that is, a recommendation model based on the fusion of multi-relation item heterogeneous graphs and multi-modal preferences (MRIH-MPF is short for Multi-relation Item Heterogeneous Graph and Multi-modal Preference Fusion Recommendation). Based on the constructed multi-relation item heterogeneous graphs and user-item bipartite graphs, multi-relation heterogeneous features of items in different modalities are obtained from the multi-relation item heterogeneous graphs. From the user-item bipartite graphs, a multi-layer graph convolution network captures the user modal preference information injected with high-order interaction information in each modality's user-item bipartite graph; the user's multi-modal preferences are then fused by attention, which enhances the learning of the user's multi-modal preference representation and ultimately improves the accuracy of the recommendation results. By using multi-modal information and personalized recommendation algorithms, the invention addresses the data sparsity and the insufficient use of multi-modal information in existing recommendation systems, so that recommendation results are more accurate and interpretable, and both the operating effect and the user experience are improved.

In addition, the MRIH-MPF model constructed by the invention can be used for recommending IPTV (interactive network television) personalized multi-mode movies or short videos by combining with Web development technology, thereby improving the recommendation accuracy of IPTV movies or short videos.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart illustrating a multi-modal project recommendation method according to an embodiment of the present invention;

FIG. 2 is a block diagram of an MRIH-MPF model according to an embodiment of the present invention;

FIG. 3 is a flow chart of the construction of a multi-relation item heterogeneous graph according to an embodiment of the present invention;

FIG. 4 is a diagram of a user-item bipartite graph and its high-order connectivity according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of item heterogeneous graph feature extraction in the text modality according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a dual-layer attention module according to an embodiment of the present invention;

FIG. 7 is a flowchart of a movie poster crawling process according to an embodiment of the present invention;

FIG. 8 is a block diagram of a recommendation system according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a multi-modal project recommendation system according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of the internal structure of an electronic device implementing the multi-modal item recommendation method according to an embodiment of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In view of the problems in the prior art, the invention provides a multi-modal item recommendation method, system, device, and storage medium, whose main purpose is to solve the data sparsity and the insufficient use of multi-modal information in prior-art recommendation systems.

Fig. 1 is a flowchart of a multi-modal item recommendation method according to an embodiment of the invention. The method may be performed by a system, which may be implemented in software and/or hardware.

FIG. 1 depicts a multi-modal item recommendation method in its entirety. As shown in FIG. 1, in the present embodiment, the multi-modal item recommendation method includes steps S110 to S160.

S110, constructing graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs in different modalities.

Specifically, in the embodiment of the present invention, the items in the user-item interaction data and the item multi-modal features may be movies, short videos, or any other items that can be recommended on the basis of acquired user interaction data and user preferences. The invention mainly performs deep learning modeling of the multi-modal representation of items and the multi-modal preference representation of users by mining the edge relations in various kinds of graph-structured data; once trained, the model is used in the item recommendation process. It is therefore necessary to construct various graph-structured data, specifically the multi-relation item heterogeneous graphs and the user-item bipartite graphs in different modalities.

As an optional embodiment of the invention, constructing graphs from the user-item interaction data and the item multi-modal features to obtain the multi-relation item heterogeneous graphs and the user-item bipartite graphs in different modalities comprises:

extracting interaction information from the user-item interaction data, extracting multi-modal information from the item multi-modal features, and performing item edge relation calculation and neighbor node sampling on the extracted interaction information and multi-modal information, to obtain an item graph under each relation;

for the item graphs under each relation, fusing the similar-semantic item graph of each modality with the co-occurrence collaborative item graph to obtain the multi-relation item heterogeneous graphs in the different modalities;

taking the interest preference embedding of user nodes as the modeling target, and constructing a corresponding modality-level user-item bipartite graph for each modality according to the user-item interaction data.

Specifically, the invention takes as input the user-item interaction information contained in the user-item interaction data together with the multi-modal information of the items, and constructs, for each modality, a multi-relation item heterogeneous graph containing both modality-specific similar-semantic information and co-occurrence collaborative information. The construction flow is shown in FIG. 3. First, data processing extracts the user-item interaction information and the item multi-modal features. Next, item edge relations are calculated and neighbor nodes are sampled from the extracted interaction information and multi-modal information, giving an item graph under each relation. Finally, the similar-semantic item graph of each modality is fused with the co-occurrence collaborative item graph, giving the item heterogeneous graph for each modality.

Let $\mathcal{G}_M = (\mathcal{N}_M, \mathcal{E}_M)$ be the item heterogeneous graph in a single modality $M$, where $\mathcal{N}_M$ denotes all item nodes in that modality, $M \in \{V, T\}$ covers the image modality $V$ and the text modality $T$, and $\mathcal{E}_M$ denotes the edge relations between pairs of nodes in that modality. The item heterogeneous graph in a single modality mainly contains two types of edges: the co-occurrence collaborative edge relation between items, and the similar-semantic edge relation of that modality, denoted $\mathcal{E}_{co}$ and $\mathcal{E}_{sem}^{M}$ respectively. The whole item heterogeneous graph is obtained by co-occurrence and similarity calculation followed by Top-K sampling of neighbor item nodes, and the construction algorithm is shown in Table 1. Table 1 shows the multi-relation item heterogeneous graph construction algorithm.

TABLE 1

The co-occurrence collaborative edge relation between items is obtained by counting the co-occurrences between items in the user-item interaction data and then applying Top-K sampling. If two items frequently appear together in the interaction lists of users, their co-occurrence also carries some modality-level collaborative information, and when one of the items is clicked, the probability that the other is clicked increases considerably. In the embodiment of the invention, the number of co-occurrences between every pair of items is computed from the user-item interaction data; Top-K sampling by co-occurrence count is then performed for each item, which removes noisy items with low co-occurrence counts and retains the K items with the highest co-occurrence as collaboratively similar items. Finally, each neighbor item forms a collaborative edge with the target item, directed from the neighbor item node to the target item node.
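The co-occurrence sampling described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm of Table 1: the function name `cooccurrence_topk_edges`, the dictionary input format, and the tie-breaking by item identifier are assumptions introduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_topk_edges(user_items, k):
    """Return directed co-occurrence edges (neighbor -> target) for each item.

    user_items: dict mapping a user id to an iterable of interacted item ids
                (hypothetical input format).
    k: number of highest co-occurrence neighbors kept per target item.
    """
    counts = defaultdict(Counter)
    for items in user_items.values():
        # Count each unordered item pair once per user interaction list.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1

    edges = []
    for target in sorted(counts):
        # Rank by descending count; ties are broken by item id (an assumption).
        ranked = sorted(counts[target].items(), key=lambda kv: (-kv[1], kv[0]))
        for neighbor, _ in ranked[:k]:
            edges.append((neighbor, target))
    return edges


interactions = {
    "u1": ["i1", "i2", "i3"],
    "u2": ["i1", "i2"],
    "u3": ["i2", "i3"],
}
edges = cooccurrence_topk_edges(interactions, k=1)
```

Each returned pair is ordered as (neighbor item, target item), matching the edge direction stated in the text.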

The similar-semantic edge relation between items is obtained by computing the similarity between the single-modality features of items, using the item features in each modality, and then applying Top-K sampling. (These item features are obtained through pre-training and may also be called item pre-training features; the item multi-modal features are likewise obtained through pre-training and may specifically comprise the item features in the different modalities, for example the item image pre-training features and the item text pre-training features in FIG. 2.) Items with high similarity in a given modality share some similar semantic information, and the similar-semantic edge relations are derived from such items. Items with semantically similar modal content are more likely to interest users with similar modal preferences, which helps extract, for that modality, the preference information of users toward items with similar modal content. In the embodiment of the invention, the text portions of an item are merged and segmented into words, and a Sentence-BERT pre-trained model (Sentence-BERT is a Siamese network built on pre-trained BERT that yields semantically meaningful text-level vectors) is used to obtain the item's text-modality pre-training features; the item's image file is input into a ResNet-50 model pre-trained on an image dataset to obtain the visual-modality pre-training features. The feature similarity between every pair of items in a modality is then computed from these modal pre-training features, Top-K similarity sampling is performed for each item, and the K items with the highest semantic similarity to the current item in that modality are retained. Finally, each sampled item forms a single-modality semantic-similarity edge with the current item, directed from the neighbor item node to the current item node.
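As a rough illustration of the similarity sampling above, the following sketch builds Top-K semantic edges from a matrix of pre-trained features for one modality. The text does not state which similarity measure is used; cosine similarity is assumed here, and the function name `semantic_topk_edges` and the toy feature matrix are hypothetical.

```python
import numpy as np


def semantic_topk_edges(features, k):
    """Return directed edges (neighbor -> target) from top-k cosine similarity.

    features: array of shape (num_items, dim) holding one modality's
              pre-trained item features (e.g. text or image embeddings).
    """
    feats = np.asarray(features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T              # cosine similarity (assumed measure)
    np.fill_diagonal(sim, -np.inf)   # an item is never its own neighbor

    edges = []
    for target in range(sim.shape[0]):
        # argsort on the negated row lists indices by descending similarity.
        neighbors = np.argsort(-sim[target], kind="stable")[:k]
        edges.extend((int(n), target) for n in neighbors)
    return edges


toy_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
sem_edges = semantic_topk_edges(toy_features, k=1)
```

The same routine would be run once per modality, so the text and visual graphs generally end up with different neighbor sets for the same item.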

The user-item bipartite graph is constructed from the historical interaction data between users and items (obtained from the user-item interaction data), as shown in the left sub-graph of FIG. 4, where $u$ and $i$ denote user nodes and the item nodes they have interacted with, respectively, and an edge $(u, i)$ indicates that an interaction exists between that user-item pair. Taking the interest preference embedding of a user node as the modeling target, the left sub-graph of FIG. 4 is unfolded into a tree-structured graph rooted at the target user $u$. High-order connectivity then refers to any path that reaches $u$ from another node of the tree with a path length greater than 1; such high-order paths carry rich high-order collaborative interaction information. For example, a second-order path $u \leftarrow i \leftarrow u'$ indicates behavioral similarity between $u$ and $u'$, because both users have interacted with item $i$. A third-order path $u \leftarrow i \leftarrow u' \leftarrow i'$ indicates that $u$ is likely to prefer $i'$, because the similar user $u'$ has interacted with item $i'$. Viewed over the whole tree rooted at $u$, the target user's interest in one item may be greater than in another when more paths connect it to $u$; in the example of FIG. 4, two paths connect the first such item to the target user, whereas only one path connects the second. By introducing a graph neural network into the recommendation system and using its iterative propagation mechanism, the high-order interaction information in the historical interaction data can be effectively extracted and used to improve the recommendation performance of the model.

In multi-modal recommendation based on graph neural networks, the preference information of the same user differs to some extent across modalities. To inject high-order connectivity information into the extraction of user preference information in each modality, the embodiment of the invention constructs a corresponding modality-level user-item bipartite graph for each modality from the user's historical interaction information (that is, the user-item historical interaction information, taken from the user-item interaction data). In the embodiment of the invention, the modalities $M$ include the visual modality $V$ and the text modality $T$. In the bipartite graphs of the two modalities, the same item node carries different modal content information, and the same user node carries different modal preference information.

In the embodiment of the invention, the user's historical interaction information is organized as two-tuples $\langle u, i \rangle$, where $u$ and $i$ are nodes of the user-item bipartite graph and each tuple represents an edge between them; because the relation between the two is bidirectional, the graph is constructed as an undirected graph. The two graphs built for the different modalities are fed separately into graph convolution networks. Using the connectivity of the graph neural network, the receptive field of the nodes in the graph is enlarged by increasing the number of graph convolution layers, so that the high-order interaction information between users and items in each modality is effectively captured to model the user's single-modal preference representation.
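The undirected bipartite construction can be pictured with a small adjacency matrix. The sketch below is an assumption-laden illustration rather than the patented implementation: it assumes users and items are already mapped to consecutive integer indices, stores the graph as a dense NumPy array, and uses the hypothetical name `build_bipartite_adjacency`.

```python
import numpy as np


def build_bipartite_adjacency(pairs, num_users, num_items):
    """Build a symmetric adjacency matrix from (user, item) interaction tuples.

    Users occupy rows/columns 0..num_users-1 and items occupy the
    following num_items rows/columns (an indexing convention assumed here).
    """
    n = num_users + num_items
    adj = np.zeros((n, n))
    for u, i in pairs:
        # Each tuple yields one undirected edge, stored in both directions.
        adj[u, num_users + i] = 1.0
        adj[num_users + i, u] = 1.0
    return adj


pairs = [(0, 0), (0, 1), (1, 2)]
adj = build_bipartite_adjacency(pairs, num_users=2, num_items=3)
```

Under this reading, the bipartite graphs of the two modalities share the same edge structure and differ only in the node features attached to them.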

S120, extracting graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs in the different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set.

Specifically, a multi-relation item heterogeneous graph module may be constructed. Using this module, the text and image pre-training features of the items are input into the multi-relation item heterogeneous graph of the corresponding modality constructed in step S110, and the multi-relation features of the items are extracted, so that the information hidden among the different relations of the items is fully mined. In this embodiment, the multi-relation item heterogeneous graph module consists mainly of two parts: node feature aggregation within each relation of the multi-relation item heterogeneous graph of the corresponding modality, and node feature aggregation across the different relations of that graph. Intra-relation aggregation and inter-relation aggregation together complete the information propagation and aggregation on the heterogeneous graph, yielding the item heterogeneous graph features for each modality and hence the item multi-relation feature set across the modalities. A user single-modal preference extraction module may also be created. Its inputs are the multi-relation item features of the different modalities obtained by the multi-relation item heterogeneous graph feature extraction module and the user ID information (which comes from the user-item interaction data and can be obtained from the user-item bipartite graph). The item multi-relation features are input into the user-item bipartite graph of the corresponding modality as the initial representations of the item nodes; the user ID information is initialized from a normal distribution to obtain the initial representations of the user nodes; and the user's single-modal preferences are extracted by graph convolution, which also yields single-modal item representations injected with high-order connectivity information. The user single-modal preference feature set is obtained in this way.

As an optional embodiment of the invention, extracting graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs in the different modalities to obtain the item multi-modal representation feature set and the user single-modal preference feature set comprises:

performing intra-relation aggregation and then inter-relation aggregation on the multi-relation item heterogeneous graph of each modality, to complete the information propagation and aggregation on the heterogeneous graphs, obtain the item heterogeneous graph features for the different modalities, and obtain the item multi-relation feature set across the modalities;

inputting the item multi-relation features from that feature set into the user-item bipartite graph of the corresponding modality as the initial representations of the item nodes, initializing the user ID information from a normal distribution to obtain the initial representations of the user nodes, and extracting user single-modal preferences by graph convolution, to obtain a user single-modal preference feature set and an item single-modal representation feature set that carry high-order interaction information;

performing element-wise summation on the item single-modal representation feature set to obtain the item multi-modal representation feature set; wherein the user ID information is derived from the user-item interaction data.

Specifically, taking feature extraction from the text-modality multi-relation heterogeneous graph as an example, the aggregation process is as shown in FIGS. 1-4; the visual modality and other modalities are handled in the same way. Intra-relation node aggregation aggregates the information of neighbor item nodes under a single relation in the modality into the current node. To reduce the influence of noisy node information introduced during heterogeneous graph sampling on item feature modeling, this embodiment introduces a gated attention mechanism into the item graph of each edge type, which soft-prunes the edge relation between the current node and each neighbor node and aggregates the neighbor information with attention-based emphasis. Specifically, the node features of the current node and a neighbor node are first concatenated and linearly transformed to compute a gating score, which realizes soft pruning of the edge between the two nodes. Next, the importance of each neighbor's information to the current node is obtained by computing the inner products between the current node and all of its neighbor nodes. The gating score and the attention score are multiplied to give an aggregation weight coefficient, and the final weight scores of all neighbor nodes are normalized to obtain the aggregated feature of the current node under a single relation. Taking the text semantic relation as an example, the specific aggregation formula is as follows:

$$\tilde{h}_{r} = \sum_{i \in \mathcal{N}_{r}} \alpha_{i}\, h_{r,i}$$

where $\tilde{h}_{r}$ denotes the central item node feature obtained by aggregating the neighbor node vectors under the current relation $r$, $h_{r}$ denotes the central item node feature before aggregation under the current relation, $h_{r,i}$ denotes the feature vector of the $i$-th neighbor node of the central node under the current relation, and $\alpha_{i}$ denotes the score assigned to that neighbor in the final aggregation, computed as follows:

$$\alpha_{i} = \operatorname{normalize}_{i}\left(g_{i} \cdot s_{i}\right), \qquad g_{i} = \sigma\left(W\left[h_{r} \,\|\, h_{r,i}\right]\right), \qquad s_{i} = h_{r}^{\top} h_{r,i}$$

where $\alpha_{i}$ is obtained by multiplying the gating score $g_{i}$ and the inter-node attention score $s_{i}$ and normalizing over all neighbor nodes. The gating score $g_{i}$ is obtained by concatenating the current item node and the neighbor node, applying a linear transformation $W$, and passing the result through an activation function $\sigma$; it softly clips the edge relations between different nodes to a certain extent. The inter-node attention score $s_{i}$ is obtained by computing the inner product between the current item node and the neighbor node, and captures the neighbor information that is important to the current node.
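The intra-relation aggregation described above can be sketched in NumPy as follows. This is a minimal illustration rather than the patented implementation: the function name, the gate parameter vector `w_gate`, the sigmoid activation and the use of softmax as the normalization step are all assumptions, since the source only states that the product of the two scores is normalized.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def intra_relation_aggregate(center, neighbors, w_gate):
    """Aggregate neighbor features into the center node under one relation.

    center:    (d,)   feature of the current item node
    neighbors: (n, d) features of its n neighbors under this relation
    w_gate:    (2d,)  hypothetical gate parameters (learned in a real model)
    """
    n = neighbors.shape[0]
    # Gating score: concatenate center and neighbor, transform linearly, squash.
    pairs = np.concatenate([np.tile(center, (n, 1)), neighbors], axis=1)  # (n, 2d)
    gate = sigmoid(pairs @ w_gate)               # (n,), soft clipping of each edge
    # Attention score: inner product between the center and each neighbor.
    attn = neighbors @ center                    # (n,)
    # Aggregation weight: product of both scores, normalized over neighbors
    # (softmax is an assumed choice of normalization).
    raw = gate * attn
    weights = np.exp(raw - raw.max())
    weights /= weights.sum()
    return weights @ neighbors, weights

rng = np.random.default_rng(0)
center = rng.normal(size=4)
neighbors = rng.normal(size=(3, 4))
aggregated, weights = intra_relation_aggregate(center, neighbors, rng.normal(size=8))
print(aggregated.shape, weights.sum())
```

A gate close to zero shrinks the contribution of a noisy edge even when the inner product with that neighbor is large, which is the soft-clipping effect the text describes.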

Inter-relation node aggregation combines the item node embeddings obtained under the different edge relations by introducing an attention mechanism. Let the aggregation weight of each edge relation be $\beta_{r}$ and the importance score of each edge relation be $w_{r}$; the final item multi-relation feature $h^{m}$ is expressed as follows:

$$\beta_{r} = \operatorname{normalize}_{r}\left(w_{r}\right), \qquad h^{m} = \sum_{r} \beta_{r}\, h^{m}_{r}$$

where $h^{m}$ is the finally acquired item multi-relation feature under the corresponding modality $m$, and $h^{m}_{r}$ is the item node feature corresponding to relation $r$ in the current modality.
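As a rough illustration of the inter-relation step, the sketch below fuses per-relation item features with attention weights. The importance scores are passed in as plain numbers because the source does not say how they are computed, and softmax is an assumed normalization.

```python
import numpy as np

def inter_relation_aggregate(relation_feats, importance_scores):
    """Fuse item features obtained under different edge relations.

    relation_feats:    (R, d) one feature row per edge relation
    importance_scores: (R,)   hypothetical per-relation importance scores
    """
    scores = np.asarray(importance_scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # aggregation weight per relation
    return weights @ np.asarray(relation_feats, dtype=float), weights

fused, weights = inter_relation_aggregate([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
print(fused, weights)
```

With equal scores the fused feature is simply the mean of the per-relation features; a larger score shifts the result toward that relation.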

The created user single-modality preference extraction module may mainly use Light-GCN as the graph convolution layer to aggregate high-order interaction information from neighbor nodes on the modality-level user-item bipartite graph. Light-GCN consists of a lightweight graph convolution layer and a layer embedding combination. The lightweight graph convolution layer discards the two standard operations of traditional graph convolution, feature transformation and nonlinear activation, which greatly reduces the number of parameters. The layer embedding combination drops the node self-connection operation of traditional graph convolution and instead forms a weighted combination of the embeddings of every layer as the final graph embedding output, which effectively alleviates the over-smoothing problem. The number of graph convolution layers determines the size of the aggregation field of view of each vertex: through iterative propagation over multiple layers, first-order to multi-order neighborhood information of the target node is aggregated into the current node. First-order interaction information represents the user's historical interacted items, which are most relevant to the user's interest preference; second-order and higher interaction information represents high-order connectivity in the bipartite graph. Learning the single-modality preference representation of the user nodes is divided into three parts: embedding initialization of the user and item nodes in the bipartite graph, information propagation in a single convolution layer, and information aggregation across multiple convolution layers.

First, the multi-relation item heterogeneous graph feature $h^{m}$ is used to initialize the item initial-layer feature $e_{i}^{(0)}$ in the user-item bipartite graph, and the user ID information is initialized from a normal distribution to obtain the user preference initial feature $e_{u}^{(0)}$ in the user-item bipartite graph of each modality. Then, the initialized node embeddings are fed into the user-item bipartite graph $G^{m}$ and propagated through a single convolution layer.

And finally, aggregating the high-order interaction information in the graph into preference representation embedding of the target user node by utilizing iterative aggregation of Light-GCN to obtain single-mode user interest preference representations output by different convolution layers. The specific single-mode preference feature extraction formula is as follows:

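$$e_{u}^{(k+1)} = \sum_{i \in \mathcal{N}_{u}} \frac{1}{\sqrt{|\mathcal{N}_{u}|}\sqrt{|\mathcal{N}_{i}|}}\, e_{i}^{(k)}, \qquad e_{i}^{(k+1)} = \sum_{u \in \mathcal{N}_{i}} \frac{1}{\sqrt{|\mathcal{N}_{i}|}\sqrt{|\mathcal{N}_{u}|}}\, e_{u}^{(k)}$$

where $e_{u}^{(k)}$ and $e_{i}^{(k)}$ respectively denote the user node representation and the item node representation at the $k$-th convolution layer, $\frac{1}{\sqrt{|\mathcal{N}_{u}|}\sqrt{|\mathcal{N}_{i}|}}$ is the standard normalization term, and $|\mathcal{N}_{u}|$ and $|\mathcal{N}_{i}|$ are the numbers of adjacent nodes of user node $u$ and of its neighbor item node $i$. Because of the normalization term introduced in the graph convolution, the longer the path from a node to the target node, the more the information it propagates to the target node is attenuated.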
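One propagation step of this kind can be sketched with a dense interaction matrix. The function names and the dense-matrix representation are illustrative assumptions; a practical system would normally use sparse matrices.

```python
import numpy as np

def normalized_adjacency(interactions):
    """Entries 1 / (sqrt(|N_u|) * sqrt(|N_i|)) on observed user-item edges.

    interactions: (n_users, n_items) binary user-item interaction matrix.
    """
    r = np.asarray(interactions, dtype=float)
    deg_u = r.sum(axis=1)                        # |N_u| for every user
    deg_i = r.sum(axis=0)                        # |N_i| for every item
    # Isolated nodes get a zero scale so they neither send nor receive messages.
    scale_u = np.divide(1.0, np.sqrt(deg_u), out=np.zeros_like(deg_u), where=deg_u > 0)
    scale_i = np.divide(1.0, np.sqrt(deg_i), out=np.zeros_like(deg_i), where=deg_i > 0)
    return scale_u[:, None] * r * scale_i[None, :]

def propagate(user_emb, item_emb, norm_adj):
    """One light graph convolution: no feature transform, no nonlinearity."""
    new_user = norm_adj @ item_emb               # users collect from their items
    new_item = norm_adj.T @ user_emb             # items collect from their users
    return new_user, new_item

adj = normalized_adjacency([[1, 1], [0, 1]])
users, items = propagate(np.array([[1.0], [2.0]]), np.array([[10.0], [20.0]]), adj)
print(adj)
print(users.ravel(), items.ravel())
```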

In the multi-layer information aggregation, the node representations of every convolution layer obtained through the iterative graph convolution are aggregated to suppress the over-smoothing problem, finally yielding the user single-modality preference representation and the item single-modality node representation. Inspired by Light-GCN, in the embodiment of the invention the per-layer features of each node are combined through weighting coefficients to form the final single-modality representation of each user and item. The specific formula is as follows:

$$e_{u} = \sum_{k=0}^{L} \gamma_{k}\, e_{u}^{(k)}, \qquad e_{i} = \sum_{k=0}^{L} \gamma_{k}\, e_{i}^{(k)}$$

where $L$ denotes the number of layers of the graph convolution operation, and the layer combination weight $\gamma_{k}$ denotes the importance of a node's $k$-th layer feature. Through the weighted summation, the high-order interaction information is encoded into the user and item representations of a single modality. The same operation is applied to the bipartite graph of each modality; after propagation on the bipartite graphs of the different modalities, the user and item representations of each modality, $e_{u}^{m}$ and $e_{i}^{m}$, are finally obtained, where $m$ covers the two modalities $v$ and $t$.
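The layer combination can be sketched as a weighted sum over the stacked per-layer embeddings. The uniform default weight 1/(L+1) follows the setting used in the LightGCN paper and is an assumption here; the source does not state how the layer weights are chosen.

```python
import numpy as np

def combine_layers(layer_embeddings, layer_weights=None):
    """Weighted sum of the embeddings produced by layers 0..L.

    layer_embeddings: list of (n_nodes, d) arrays, one per layer
    layer_weights:    optional (L+1,) weights; uniform if omitted (assumption)
    """
    layers = np.stack(layer_embeddings)          # (L+1, n_nodes, d)
    if layer_weights is None:
        layer_weights = np.full(len(layers), 1.0 / len(layers))
    weights = np.asarray(layer_weights, dtype=float)
    return np.tensordot(weights, layers, axes=1) # (n_nodes, d)

final = combine_layers([np.ones((2, 3)), 3.0 * np.ones((2, 3))])
print(final)
```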

S130, fusing the user single-mode preference feature sets by adopting an attention mechanism technology to obtain the user multi-mode preference feature set.

Specifically, a user multimodal preference attention fusion module can be created to take as input the extracted user single modality preference representation. And fusing the user single-mode preference feature sets through the attention mechanism technology, so as to obtain the user multi-mode preference feature sets.

As an optional embodiment of the present invention, fusing the user single-mode preference feature set to obtain the user multi-mode preference feature set includes:

constructing a double-layer attention mechanism, wherein the double-layer attention mechanism comprises a single-user mode preference fusion attention layer and a similar-user multi-mode consistency preference fusion attention layer;

Performing feature cross processing on user single-mode preference features in the user single-mode preference feature set, inputting multi-mode cross preference features of each user obtained by feature cross and the user single-mode preference features into a single-user mode preference fusion attention layer to perform personalized single-user multi-mode preference fusion, and obtaining a first user multi-mode preference feature set;

taking each user in the user single-mode preference feature set as a current user respectively, acquiring K similar users most similar to the current user through a co-occurrence method of sampling Top-K co-occurrence times, distributing different attention weights to vectors of the first user multi-mode preference features corresponding to the similar users through a soft attention mechanism from the user first multi-mode preference feature set, obtaining an aggregate attention coefficient of the preference features of each similar user through a weight normalization technology, and carrying out element level multiplication on a value vector obtained by each similar user and the aggregate attention coefficient to obtain a second user multi-mode preference feature set;

and carrying out element summation on the first user multi-mode preference feature set and the second user multi-mode preference feature set for each user to finish residual linking so as to obtain the user multi-mode preference feature set.

Specifically, in order to model the difference of different modal preferences of the user, in this embodiment, the user multi-modal preference modeling process is divided into two parts of operation of single-user modal preference fusion and similar user multi-modal consistency preference fusion, and the two parts are combined by constructing double-layer attention, so as to complete the multi-modal preference attention fusion of the user. The user multimodal preference attention fusion module can be created as a dual layer attention module as shown in fig. 6.

In terms of single-user modality preference fusion, this embodiment multiplies the user single-modality preference features in the user single-modality preference feature set to achieve feature crossing, that is, $e_{u}^{c} = e_{u}^{v} \odot e_{u}^{t}$. Through feature crossing, the preference features of the user's different modalities are combined through a nonlinear transformation, yielding more complete multi-modal cross preference information for each user. After the cross preference features are extracted, the user single-modality preference features and the cross preference features are input into the first attention layer (namely the single-user modality preference fusion attention layer) for personalized single-user multi-modal preference fusion. Because element-level summation retains most of the information of the different modality features, this embodiment integrates the user's different modality preferences in a personalized way through element-level attention-weighted summation. The integration formula is as follows:

$$\tilde{p}_{u} = \sum_{m \in \{v,\, t,\, c\}} a_{u}^{m}\, e_{u}^{m}$$

where $\tilde{p}_{u}$ is the initial multi-modal preference representation of the current target user obtained through the first attention layer; $a_{u}^{m}$ is the attention coefficient corresponding to each single-modality preference representation of user $u$; $m$ ranges over $v$, $t$ and $c$, i.e., the visual modality, the text modality and the cross preference modality; $e_{u}^{v}$ is the current user's preference representation in the visual modality, $e_{u}^{t}$ is the current user's preference representation in the text modality, and $e_{u}^{c}$ is the cross preference representation of the visual and text modalities. After this operation is performed for every user, the initial multi-modal preference representations of all users are obtained.

In terms of similar-user consistency preference fusion, according to the principle of user preference consistency, users who frequently interact with the same items have closer multi-modal preferences. The personalized multi-modal preference fusion pattern of each user is therefore hidden in the co-occurrence relationships among such users. The similar-user consistency preference fusion part first finds the users with high co-occurrence counts with the target user, and then combines a soft attention mechanism with a residual connection to fuse the multi-modal preferences of the similar users in a personalized way. Because the co-occurrence counts between a user and its similar users vary, this embodiment takes each user in turn as the current user, samples the users with the Top-K co-occurrence counts, and performs personalized multi-modal preference aggregation over the K users most similar to the current user. Specifically, the aggregation uses a soft attention mechanism to assign different attention weights to the multi-modal preference vectors of the similar users, so as to explore the influence of similar users' preferences on the target user's preference. The similar users are weighted by their similarity, so that users whose preferences are closer to the target user's receive larger weights and influence the target user's preference representation vector more. The specific aggregation formula is as follows:

$$c_{n} = \frac{\exp\left(q^{\top} k_{n}\right)}{\sum_{n'=1}^{K} \exp\left(q^{\top} k_{n'}\right)}, \qquad p_{u} = \tilde{p}_{u} + \sum_{n=1}^{K} c_{n}\, v_{n}$$

Before aggregation with the soft attention mechanism, the key, value and query vectors are initialized by assignment: the query vector $q$ is the initial multi-modal preference vector $\tilde{p}_{u}$ obtained for the current target user by single-user multi-modal preference fusion, and the key vector $k_{n}$ and the value vector $v_{n}$ are the initial multi-modal preference vector $\tilde{p}_{u_{n}}$ obtained by single-user multi-modal preference fusion for the $n$-th similar user of the current user. Then, the inner product of the initialized query vector $q$ with the key vector $k_{n}$ of each similar user's preference vector gives the weighting score, and the scores are normalized by the softmax operation to obtain the aggregation attention coefficient $c_{n}$ of each similar user's preference representation. Finally, the value vector $v_{n}$ of each similar user is multiplied element-wise by $c_{n}$, and the result is summed element-wise with the multi-modal preference $\tilde{p}_{u}$ of the current target user to complete the residual connection, outputting the final multi-modal preference representation vector $p_{u}$ of the current user for the preset prediction model.
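The two attention layers can be sketched together as below. Several details are assumptions made only for illustration: the first-layer attention logits are passed in as a plain vector because the source does not specify how the coefficients are produced, co-occurrence is counted as the number of shared interacted items, and all function names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def top_k_cooccurring(interactions, user, k):
    """Indices of the k users sharing the most interacted items with `user`."""
    r = np.asarray(interactions, dtype=float)
    counts = r @ r[user]                         # shared-item counts, (n_users,)
    counts[user] = -np.inf                       # never pick the user itself
    return np.argsort(-counts, kind="stable")[:k]

def fuse_single_user(pref_v, pref_t, logits):
    """First layer: fuse visual, text and cross preferences of one user."""
    pref_c = pref_v * pref_t                     # feature crossing (element-wise product)
    a = softmax(np.asarray(logits, dtype=float)) # assumed attention coefficients for v, t, c
    return a[0] * pref_v + a[1] * pref_t + a[2] * pref_c

def fuse_similar_users(query, similar_prefs):
    """Second layer: soft attention over similar users plus a residual connection."""
    keys = values = np.asarray(similar_prefs, dtype=float)   # (K, d)
    coeff = softmax(keys @ query)                # one coefficient per similar user
    return query + coeff @ values                # residual: add the user's own preference

interactions = np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1]])
initial = np.stack([
    fuse_single_user(np.array([1.0, 2.0]), np.array([3.0, 4.0]), [0.0, 0.0, 0.0])
    for _ in range(3)
])
similar = top_k_cooccurring(interactions, user=0, k=1)
print(similar, fuse_similar_users(initial[0], initial[similar]))
```

Keeping the residual term means a user with few or weakly related neighbors still retains its own fused preference instead of being replaced by the neighborhood average.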

S140, inputting the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and performing Top-K recommendation on the user through prediction function calculation to obtain a primary model of the MRIH-MPF model.

Specifically, the framework of the primary model of the MRIH-MPF model is shown in FIG. 2. The model first constructs graphs from the user-item interaction data and the item multi-modal features (step S110) to obtain the multi-relation item heterogeneous graphs and the user-item bipartite graphs under different modalities. Then, on the constructed multi-graph network, the final user multi-modal preference representation and item multi-modal representation used for recommendation are obtained through multi-relation item heterogeneous graph feature extraction and user single-modality preference feature extraction (step S120) and user multi-modal preference attention fusion (step S130), giving the model framework shown in FIG. 2.

As an optional embodiment of the present invention, inputting the item multi-relation feature set and the user multi-mode preference feature set under the multi-mode into a preset prediction model, and performing Top-K recommendation on the user through a prediction function calculation, the obtaining a primary model of the MRIH-MPF model includes:

the item and user are represented as follows according to the item multi-relation feature set and the user multi-mode preference feature set under the multi-mode:

$$e_{u} = p_{u}, \qquad e_{i} = e_{i}^{v} + e_{i}^{t}$$

wherein $e_{u}$ and $e_{i}$ are respectively the final representation of user $u$ and the final representation of item $i$, $p_{u}$ is the user multi-modal preference embodied by the user multi-modal preference feature set, and $e_{i}^{v}$ and $e_{i}^{t}$ are respectively the item single-modality features output by the user-item bipartite graph under the corresponding modality;

inputting the final representation of user $u$ and the final representation of item $i$ into the preset prediction model, and calculating a prediction score with the calculation formula of a preset inner product function to obtain the degree-of-interest value of user $u$ for item $i$; wherein the calculation formula is:

$$\hat{y}_{ui} = e_{u}^{\top} e_{i}$$

wherein $\hat{y}_{ui}$ is the degree-of-interest value of user $u$ for item $i$ output by the preset prediction model;

sorting the items in descending order of the degree-of-interest value of user $u$ for item $i$ to complete Top-K recommendation to the user and obtain the primary model of the MRIH-MPF model.

Specifically, with the item multi-relation feature set obtained through the above steps and the user multi-modal preference feature set obtained through user multi-modal preference fusion, the final item and user representations are as follows:

$$e_{u} = p_{u}, \qquad e_{i} = e_{i}^{v} + e_{i}^{t}$$

where $e_{u}$ and $e_{i}$ are respectively the final representation of user $u$ and the final representation of item $i$, $p_{u}$ is the user multi-modal preference embodied by the user multi-modal preference feature set, and $e_{i}^{v}$ and $e_{i}^{t}$ are respectively the item single-modality features output by the user-item bipartite graph under the corresponding modality. Finally, the final representations of the user and the item are input into the preset prediction model to compute the prediction score. The specific calculation formula is as follows:

$$\hat{y}_{ui} = e_{u}^{\top} e_{i}$$

In this embodiment, an inner product function is used as the prediction function. The inner product is a common way to compute similarity; here the inner product of the user multi-modal preference representation and the item multi-modal representation measures the correlation between the two. Finally, $\hat{y}_{ui}$ is taken as the model output for user $u$ and item $i$, the items are sorted in descending order of this value, and Top-K recommendation is made to the user.
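The scoring and ranking step can be sketched as follows. The function and argument names are hypothetical; the item representation is formed by the element-wise sum of the two single-modality item features, as stated in the text.

```python
import numpy as np

def recommend_top_k(user_pref, item_visual, item_text, k):
    """Score every item by inner product and return the k best item indices."""
    item_final = item_visual + item_text         # element-wise sum of modality features
    scores = item_final @ user_pref              # inner-product degree of interest
    top = np.argsort(-scores, kind="stable")[:k] # descending order, ties keep index order
    return top, scores[top]

user_pref = np.array([1.0, 0.0])
item_visual = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
item_text = np.zeros_like(item_visual)
print(recommend_top_k(user_pref, item_visual, item_text, k=2))
```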

And S150, performing model training on the primary model of the MRIH-MPF model through a pre-acquired user project interaction data training set and a project multi-mode training set to obtain the MRIH-MPF model.

Specifically, the items in the user item interaction data training set and the item multimodal training set are the same kind of items as the items in the user item interaction data and the item multimodal features in step S110. For example, when the MRIH-MPF model is used for movie recommendation, the training data adopted is user movie interaction data.

In the invention, a multi-relation item heterogeneous graph, a user-item bipartite graph and a two-layer attention mechanism are introduced into graph-neural-network-based multi-modal recommendation, inter-item edge relation mining and user multi-modal preference representation modeling are studied, and the MRIH-MPF model is designed and implemented. An IPTV multi-modal movie dataset can be constructed from the program data of an IPTV platform fused with multi-modal information from third-party network platforms; then, based on the MRIH-MPF model and combined with Web development technology on the Django framework (an open-source Web application framework), IPTV personalized multi-modal movie recommendation is realized.

Model training is the core of the model construction of the invention. First, the parameters required by the model are determined and initialized in a specified manner; the loss function is then chosen according to the nature of the problem, and a suitable optimizer is selected to optimize the model. In the embodiment of the invention, BPR loss is used as the loss function. The BPR (Bayesian personalized ranking) algorithm is commonly used to handle implicit feedback in collaborative filtering recommendation. Recommendation scenarios contain a large amount of implicit feedback data, that is, data in which the user gives no explicit rating to an item. This raises the negative sampling problem: a sampled negative item is not necessarily one the user dislikes. To address it, BPR, unlike algorithms such as SVD (singular value decomposition) that optimize the recommendation model with point-wise learning, models user-item triplets $(u, i, j)$, where $u$ denotes the user, and $i$ and $j$ denote an item the user has interacted with and an item the user has not interacted with, respectively. BPR optimizes the model toward convergence on the assumption that the preference of user $u$ for the interacted item $i$ is greater than that for the non-interacted item $j$. The loss function of the present invention is therefore defined as:

$$\mathcal{L} = -\sum_{(u,i,j)} \ln \sigma\left(\hat{y}_{ui} - \hat{y}_{uj}\right) + \lambda\left(\lVert \Theta_{u} \rVert^{2} + \lVert \Theta_{i} \rVert^{2}\right)$$

where $\sigma$ is the activation function (the sigmoid function in standard BPR) and $\lambda$ is the regularization parameter. $\hat{y}_{ui}$ and $\hat{y}_{uj}$ denote the computed matching degree of user $u$ with item $i$ and with item $j$, and $\Theta_{u}$ and $\Theta_{i}$ denote the parameters of the user embeddings and the item embeddings in the model, respectively.
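A BPR-style loss over a batch of triplets can be sketched as follows. It assumes the sigmoid activation of standard BPR and applies the L2 penalty to the embeddings in the batch; the default regularization weight is an arbitrary illustrative value.

```python
import numpy as np

def bpr_loss(user_emb, pos_emb, neg_emb, reg=1e-4):
    """BPR loss for a batch of (user, interacted item, non-interacted item) triplets.

    All inputs have shape (batch, d); row b of each array forms one triplet.
    """
    pos_scores = np.sum(user_emb * pos_emb, axis=1)   # matching degree with item i
    neg_scores = np.sum(user_emb * neg_emb, axis=1)   # matching degree with item j
    # -ln(sigmoid(x)) equals log(1 + exp(-x)); logaddexp keeps it numerically stable.
    ranking = np.logaddexp(0.0, -(pos_scores - neg_scores)).sum()
    l2 = reg * (np.sum(user_emb ** 2) + np.sum(pos_emb ** 2) + np.sum(neg_emb ** 2))
    return ranking + l2

u = np.array([[1.0, 0.0]])
print(bpr_loss(u, np.array([[2.0, 0.0]]), np.array([[0.0, 0.0]]), reg=0.0))
```

The ranking term shrinks as the interacted item outscores the non-interacted one, so minimizing it pushes observed items above sampled negatives without requiring explicit ratings.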

The parameters are optimized during model training with the Adam algorithm, an efficient stochastic optimization method that requires only first-order gradients. The algorithm uses estimates of the first and second moments of the gradient to dynamically compute individual adaptive learning rates for different parameters.
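A single Adam update can be sketched as follows. The hyperparameter defaults are the ones commonly used with Adam; the source does not state which values were used, so they are assumptions here.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `t` is the 1-based step count; returns (param, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad         # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)               # bias correction for early steps
    v_hat = v / (1.0 - beta2 ** t)
    # The per-parameter step size adapts to the gradient magnitude seen so far.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param, m, v = np.array([1.0, -1.0]), np.zeros(2), np.zeros(2)
param, m, v = adam_step(param, np.array([2.0, -0.5]), m, v, t=1)
print(param)
```

On the first step the bias-corrected update is close to `lr` times the sign of the gradient, regardless of the gradient's scale, which illustrates the per-parameter adaptive learning rate.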

In order to prevent overfitting during training, the embodiment of the invention controls the number of training epochs with an early-stopping method. First, the total number of training epochs $T$ is set; to prevent the model training time from being too long, the early-stopping step count $s$ is also set. In each epoch, the recommendation performance of the model is verified on the validation set with the corresponding metrics, and the best-performing model parameters are saved while the performance keeps improving. When the metrics on the validation set decline for $s$ consecutive epochs, or the total number of training epochs $T$ is reached, training stops, and the optimal parameters are applied to the test set to obtain the model's test-set recommendation results.
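The early-stopping control can be sketched as a small loop. To stay self-contained, the sketch consumes a sequence of validation scores in place of real training, and it counts epochs without improvement over the best score so far, which is one common reading of "declining for consecutive epochs"; both choices are assumptions.

```python
def train_with_early_stopping(val_scores, max_epochs, patience):
    """Return (best_epoch, best_score) under an early-stopping rule.

    val_scores: iterable of validation metrics, one per epoch (stand-in for training)
    max_epochs: total number of training epochs allowed
    patience:   number of consecutive non-improving epochs tolerated
    """
    best_score, best_epoch, stale = float("-inf"), -1, 0
    for epoch, score in enumerate(val_scores):
        if epoch >= max_epochs:
            break                                # total epoch budget reached
        if score > best_score:
            best_score, best_epoch, stale = score, epoch, 0   # save parameters here
        else:
            stale += 1
            if stale >= patience:
                break                            # no improvement for `patience` epochs
    return best_epoch, best_score

print(train_with_early_stopping([0.1, 0.2, 0.3, 0.25, 0.24, 0.5], max_epochs=100, patience=2))
```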

As an optional embodiment of the invention, in the process of performing model training on a primary model of the MRIH-MPF model through a pre-acquired user project interaction data training set and a project multi-modal training set to obtain the MRIH-MPF model:

the MRIH-MPF model is used as an IPTV film multi-mode recommendation model;

the user project interaction data training set is user behavior data, program data and user data acquired through an IPTV system;

the project multi-modal training set is an IPTV film multi-modal data set constructed by available multi-modal resources obtained through crawling on the public network platform.

Specifically, in this embodiment, the IPTV data set mainly includes two parts, that is, an IPTV movie interaction data set and an IPTV movie multi-mode data set. The interactive data set is user behavior data, program data, user data and the like acquired by the IPTV system. In the aspect of the IPTV movie multi-modal dataset, the embodiment is used for constructing the IPTV movie multi-modal dataset by crawling available multi-modal resources on a public network platform. A specific acquisition flow of the user project interaction data training set is shown in fig. 7.

In this embodiment, first, a search keyword is generated for each movie from the IPTV movie text data, in the form [movie title, movie director, release time]. Then, a crawler tool enters each movie's search keyword into the Douban Movie search bar, opens the corresponding movie in the search results, and saves the first movie cover on the poster page as the poster picture. Some movies are not found because their release times on the IPTV platform differ slightly from those on Douban; for these, this embodiment shortens the keyword by removing the release year and crawls again, finally obtaining the poster information of all movies for subsequent use.
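The keyword fallback can be sketched without committing to any particular crawler. In the sketch below, `search` is a hypothetical callable that maps a query string to a poster reference or `None`; the dictionary keys and function names are likewise illustrative, and no real site API is assumed.

```python
def build_queries(title, director, year):
    """Full keyword first, then a relaxed keyword without the release year."""
    full = " ".join(str(part) for part in (title, director, year) if part)
    relaxed = " ".join(str(part) for part in (title, director) if part)
    return [full, relaxed]

def find_poster(movie, search):
    """Try each query in order and return the first poster found, else None."""
    for query in build_queries(movie["title"], movie["director"], movie["year"]):
        poster = search(query)                   # hypothetical search callable
        if poster is not None:
            return poster
    return None

# A stand-in search index that only knows the movie without its year.
index = {"Movie A Director A": "poster_a.jpg"}
movie = {"title": "Movie A", "director": "Director A", "year": 2020}
print(find_poster(movie, index.get))
```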

The original multi-modal data are fed into the pre-trained model of the corresponding modality to extract the pre-trained features of that modality. For text data, this embodiment merges and segments the text of movie titles, movie synopses and movie tags, and obtains text-modality pre-trained features through a Sentence-BERT pre-trained model; for image data, visual-modality pre-trained features are obtained by inputting the movie posters into a ResNet-50 model pre-trained on an image dataset. Finally, the extracted pre-trained features of each modality are stored to obtain the final IPTV multi-modal data for model training.

Because the IPTV source data involve a large number of users and the data scale is huge, this embodiment selects the historical data of the IPTV platform within a certain time range to reduce the time spent constructing the dataset, and then determines the range of IPTV programs involved. The data selected in this embodiment are single-episode on-demand programs on the IPTV platform. When screening the data source, this embodiment uses the Hive data warehouse from big data technology and filters out, through Hive SQL, the on-demand data of the specified program types within the specified time period. Construction of the IPTV interaction dataset mainly involves two techniques: data cleaning and negative sampling. Data cleaning mainly cleans the interaction data and the program text data. Negative sampling is the process of drawing negative samples from the IPTV user interaction data.
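The source does not describe the negative sampling strategy, so the sketch below shows only one simple possibility: uniformly sampling items a user has not interacted with. The function name, the data layout and the fixed seed are assumptions for illustration.

```python
import random

def sample_negatives(user_items, all_items, n_neg, seed=0):
    """Draw up to n_neg non-interacted items per user, uniformly at random.

    user_items: dict mapping user id -> set of interacted item ids
    all_items:  iterable of every item id in the dataset
    """
    rng = random.Random(seed)                    # fixed seed for reproducible samples
    universe = set(all_items)
    negatives = {}
    for user, positives in user_items.items():
        candidates = sorted(universe - set(positives))
        negatives[user] = rng.sample(candidates, min(n_neg, len(candidates)))
    return negatives

history = {"u1": {1, 2}, "u2": {3}}
print(sample_negatives(history, range(1, 6), n_neg=2))
```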

As an optional embodiment of the invention, after model training is performed on the primary model of the MRIH-MPF model through the pre-acquired user project interaction data training set and the project multi-mode training set, the method further comprises the steps of:

performing accuracy test on the MRIH-MPF model through a preset test set;

after the MRIH-MPF model passes the accuracy test, the MRIH-MPF model is stored into a callable interface, a recommended system instance is developed by utilizing a Web development technology, and the MRIH-MPF model is used as a main algorithm for realizing recommendation at the back end of a Web site.

And specifically, testing the accuracy of the model through a test set. When the MRIH-MPF model is used for movie recommendation, a recommendation system as shown in FIG. 8 can be constructed in conjunction with Web development technology. In order to ensure the accuracy of recommendation, the system evaluates the effect of the model in a Top-K recommendation scene, selects proper evaluation indexes respectively, and performs comparison evaluation from the following two angles:

(1) Under the public data set, comparing the recommendation model with a baseline model to prove the effectiveness of a recommendation system;

(2) And under the IPTV data set, comparing the recommendation model with a baseline model to prove the practicability of the recommendation system.

In the system, a visual display module can be additionally arranged, the module stores the evaluated model into a callable interface, a recommendation system example is developed by utilizing a Web development technology, and an MRIH-MPF model is used as a main algorithm for realizing recommendation at the back end of a Web site, so that a personalized multi-mode recommendation system based on IPTV movie programs is realized, and the visualization is completed.

S160, generating a target item recommendation result for the target user through an MRIH-MPF model according to the acquired target user item interaction data and the target item multi-modal data.

Specifically, the target user is a locked recommended object, the target item is an item recommended to the target user, for example, a movie is recommended to the user a, and then the user a is the target user, and the movie is the target item.

As shown in fig. 9, the present invention provides a multi-modal item recommendation system 200, which may be installed in an electronic device. Depending on the functionality implemented, the multimodal item recommendation system 200 may include a composition module 210, a feature extraction module 220, a feature fusion module 230, a prediction module 240, a model training module 250, and a recommendation module 260. The inventive unit, which may also be referred to as a module, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.

In the present embodiment, the functions concerning the respective modules/units are as follows:

the composition module 210 is configured to perform graph construction from user-item interaction data and item multi-modal features to obtain a multi-relation item heterogeneous graph and user-item bipartite graphs under different modalities;

the feature extraction module 220 is configured to perform graph feature extraction on the multi-relation item heterogeneous graph and the user-item bipartite graphs under different modalities to obtain an item multi-modal representation feature set and a user single-modality preference feature set;

The feature fusion module 230 is configured to fuse the user single-mode preference feature set by adopting an attention mechanism technology, so as to obtain a user multi-mode preference feature set;

the prediction module 240 is configured to input the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and perform Top-K recommendation on the user through prediction function calculation to obtain a primary model of the MRIH-MPF model;

the model training module 250 is configured to perform model training on a primary model of the MRIH-MPF model through a pre-acquired user project interaction data training set and a project multi-mode training set, so as to obtain the MRIH-MPF model;

and the recommendation module 260 is used for inputting the acquired target user item interaction data and the target item multi-mode data into the MRIH-MPF model to generate a target item recommendation result for the target user.

According to the multi-modal item recommendation system 200, personalized item recommendation is realized by constructing an MRIH-MPF model (that is, a recommendation model based on the fusion of a multi-relation item heterogeneous graph and multi-modal preferences, short for Multi-relation Item Heterogeneous Graph and Multi-modal Preference Fusion Recommendation). Based on the constructed multi-relation item heterogeneous graph and user-item bipartite graph, item multi-relation heterogeneous features under different modalities are obtained from the multi-relation item heterogeneous graph. From the user-item bipartite graph, user modality preference information injected with high-order interaction information can be captured in the user-item bipartite graphs of different modalities through a multi-layer graph convolution network; the user's multiple modality preferences are then fused by attention, which enhances multi-modal preference representation learning and ultimately improves the accuracy of the recommendation results. By using multi-modal information and personalized recommendation algorithms, the recommendation system and method address the data sparsity and insufficient use of multi-modal information in existing recommendation systems, so that recommendation results are more accurate and interpretable, and the operating effect and user experience are improved.

As shown in fig. 10, the present invention provides an electronic device 3 of a multi-modal item recommendation method.

The electronic device 3 may comprise a processor 30, a memory 31 and a bus, and may further comprise a computer program stored in the memory 31 and executable on said processor 30, such as a multimodal item recommendation program 32. The memory 31 may also include both internal storage units and external storage devices of the multimodal item recommendation system. The memory 31 may be used not only for storing code installed in application software and various types of data such as a multi-modal item recommendation program, but also for temporarily storing data that has been output or is to be output.

The memory 31 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 31 may in some embodiments be an internal storage unit of the electronic device 3, such as a removable hard disk of the electronic device 3. The memory 31 may in other embodiments also be an external storage device of the electronic device 3, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 may be used not only for storing application software installed in the electronic device 3 and various types of data, such as multi-modal item recommendation method codes, but also for temporarily storing data that has been output or is to be output.

In some embodiments, the processor 30 may consist of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 30 is the control unit (Control Unit) of the electronic device 3: it connects the components of the entire electronic device through various interfaces and lines, and it executes the functions of the electronic device 3 and processes data by running or executing programs or modules stored in the memory 31 (e.g., the multi-modal item recommendation program) and by calling data stored in the memory 31.

The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 31 and at least one processor 30 or the like.

Fig. 10 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in Fig. 10 does not limit the electronic device 3, which may comprise fewer or more components than shown, may combine certain components, or may use a different arrangement of components.

For example, although not shown, the electronic device 3 may further include a power source (such as a battery) for supplying power to the respective components. Preferably, the power source is logically connected to the at least one processor 30 through a power management system, so that charge management, discharge management and power consumption management are implemented through the power management system. The power source may also include any one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 3 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which are not described here.

Further, the electronic device 3 may also comprise a network interface, which optionally comprises a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.) and is typically used for establishing a communication connection between the electronic device 3 and other electronic devices.

The electronic device 3 may optionally further comprise a user interface, which may be a display (Display) or an input unit such as a keyboard (Keyboard); optionally, the user interface may also be a standard wired interface or a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the electronic device 3 and for displaying a visual user interface.

It should be understood that the described embodiments are for illustrative purposes only, and that the scope of the application is not limited by this configuration.

The multimodal item recommendation program 32 stored in the memory 31 in the electronic device 3 is a combination of instructions that, when executed in the processor 30, may implement:

S110, constructing graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs under different modalities;

S120, extracting graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs under different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set;

S130, fusing the user single-modal preference feature set by using an attention mechanism to obtain a user multi-modal preference feature set;

S140, inputting the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and performing Top-K recommendation for the user through prediction function calculation to obtain a primary model of the MRIH-MPF model;

S150, training the primary model of the MRIH-MPF model on a pre-acquired user-item interaction data training set and an item multi-modal training set to obtain the MRIH-MPF model;

S160, generating a target item recommendation result for a target user through the MRIH-MPF model according to acquired target user-item interaction data and target item multi-modal data.
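The Top-K recommendation of step S140 can be sketched as follows, assuming that the prediction function is an inner product between the user's multi-modal preference vector and each item's multi-modal representation vector. The function name `recommend_top_k` and the optional `seen` filter for already-consumed items are hypothetical additions for illustration only.

```python
def recommend_top_k(user_vec, item_vecs, k, seen=()):
    """Return the indices of the k items with the highest inner-product score.

    user_vec: the user's multi-modal preference vector.
    item_vecs: list of item multi-modal representation vectors.
    seen: item indices to exclude (an illustrative extra, not from the text).
    """
    scored = []
    for idx, item_vec in enumerate(item_vecs):
        if idx in seen:
            continue
        score = sum(u * v for u, v in zip(user_vec, item_vec))
        scored.append((score, idx))
    # Descending by score; ties are broken by the lower item index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    print(recommend_top_k([1.0, 2.0], items, k=2))
```

Sorting all scores is the simplest choice for a sketch; for a large item catalogue a partial selection such as a heap would avoid the full sort.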

Specifically, the specific implementation method of the above instructions by the processor 30 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
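To make the attention fusion of step S130 more concrete, the following minimal sketch fuses one user's single-modal preference vectors into a single multi-modal preference vector. It assumes a plain softmax attention whose scores are dot products with a learned `query` vector; that scoring function, the name `fuse_modal_preferences` and the modality names in the usage below are illustrative assumptions, and the feature-crossing stage of the embodiment is omitted.

```python
from math import exp


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modal_preferences(modal_prefs, query):
    """Attention-weighted fusion of one user's per-modality preference vectors.

    modal_prefs: dict mapping a modality name to that user's preference vector.
    query: hypothetical learned attention vector of the same dimension.
    Returns (fused_vector, weights_by_modality).
    """
    names = sorted(modal_prefs)
    logits = [sum(q * x for q, x in zip(query, modal_prefs[n])) for n in names]
    weights = softmax(logits)
    dim = len(query)
    fused = [
        sum(w * modal_prefs[n][d] for w, n in zip(weights, names))
        for d in range(dim)
    ]
    return fused, dict(zip(names, weights))
```

For example, calling `fuse_modal_preferences({"visual": [1.0, 0.0], "text": [0.0, 1.0]}, [0.0, 0.0])` weights both modalities equally, while a query aligned with one modality shifts the weight toward it.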

Further, if the modules/units integrated in the electronic device 3 are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. The computer readable medium may include any entity or system capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM, Read-Only Memory).

Embodiments of the present invention also provide a computer readable storage medium, which may be non-volatile or volatile, storing a computer program which, when executed by a processor, implements the steps of the multi-modal item recommendation method described above.

In particular, the specific implementation method of the computer program when executed by the processor may refer to descriptions of related steps in the multi-modal item recommendation method of the embodiment, which are not described herein in detail.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, system and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and other manners of division may be implemented in practice.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in the form of hardware, or in the form of hardware combined with software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. Multiple units or systems recited in the system claims may also be implemented by one unit or system in software or hardware. Terms such as first and second are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A multi-modal item recommendation method, characterized by comprising the following steps:

constructing graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs under different modalities; wherein this step comprises: extracting interaction information from the user-item interaction data, extracting multi-modal information from the item multi-modal features, and computing the edge relationships of the items and sampling neighbor nodes using the extracted interaction information and multi-modal information, so as to obtain an item graph under each relation; for the item graphs under each relation, fusing the modal-similarity semantic item graph and the co-occurrence collaborative item graph under each modality to obtain the multi-relation item heterogeneous graphs under different modalities; and, taking the embedding of the interest preferences of user nodes as the construction target, constructing from the user-item interaction data a corresponding user-item bipartite graph carrying high-order connectivity information for each modality;

extracting graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs under different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set; wherein this step comprises: for the multi-relation item heterogeneous graph under each modality, performing intra-relation aggregation and then inter-relation aggregation to complete the information propagation and aggregation of the heterogeneous graph, and obtaining the item heterogeneous graph features corresponding to the different modalities, thereby obtaining an item multi-relation feature set under multiple modalities; inputting the item relation features in the item multi-relation feature set under multiple modalities into the user-item bipartite graph of the corresponding modality as the initial representations of the item nodes in the bipartite graph, initializing the user ID information with a normal distribution to obtain the initial representations of the user nodes in the bipartite graph, and extracting the user single-modal preferences through graph convolution operations to obtain a user single-modal preference feature set and an item single-modal representation feature set carrying high-order interaction information; and performing element-wise summation over the item single-modal representation feature set to obtain the item multi-modal representation feature set; wherein the user ID information comes from the user-item interaction data;

2. The multi-modal item recommendation method according to claim 1, wherein the fusing of the user single-modal preference feature set by using an attention mechanism to obtain the user multi-modal preference feature set comprises:

performing feature crossing on the user single-modal preference features in the user single-modal preference feature set, and inputting the multi-modal cross preference features of each user obtained by the feature crossing, together with the user single-modal preference features, into a single-user modal preference fusion attention layer to perform personalized single-user multi-modal preference fusion, thereby obtaining a first user multi-modal preference feature set;

taking each user in the user single-modal preference feature set in turn as the current user, obtaining the K users most similar to the current user by sampling the Top-K users by co-occurrence count, assigning, through a soft attention mechanism, different attention weights to the vectors of the first user multi-modal preference features that correspond to the similar users in the first user multi-modal preference feature set, obtaining an aggregation attention coefficient for the preference features of each similar user through weight normalization, and multiplying the value vector obtained from each similar user element-wise by its aggregation attention coefficient to obtain a second user multi-modal preference feature set;

and for each user, performing element-wise summation of the first user multi-modal preference feature set and the second user multi-modal preference feature set to complete a residual connection, thereby obtaining the user multi-modal preference feature set.

3. The multi-modal item recommendation method according to claim 1, wherein the inputting of the item multi-modal representation feature set and the user multi-modal preference feature set into a preset prediction model, and the performing of Top-K recommendation for the user through prediction function calculation to obtain a primary model of the MRIH-MPF model, comprises:

obtaining the final representations of the item and the user from the item multi-relation feature set under multiple modalities and the user multi-modal preference feature set as follows:

[formula not reproduced in the source text]

inputting the final representation $e_u$ of user $u$ and the final representation $e_i$ of item $i$ into the preset prediction model, and calculating a predicted value with a preset inner product function to obtain the degree-of-interest value of user $u$ in item $i$; wherein the calculation formula is:

$$\hat{y}_{ui} = e_u^{\top} e_i$$

wherein $\hat{y}_{ui}$ is the degree-of-interest value of user $u$ in item $i$;

sorting the degree-of-interest values of user $u$ for the items in descending order to complete the Top-K recommendation for the user, thereby obtaining the primary model of the MRIH-MPF model.

4. The multi-modal item recommendation method according to claim 1, wherein, in the process of training the primary model of the MRIH-MPF model on the pre-acquired user-item interaction data training set and item multi-modal training set to obtain the MRIH-MPF model:

the MRIH-MPF model is used as an IPTV film multi-modal recommendation model;

the item multi-modal training set is an IPTV film multi-modal data set constructed from available multi-modal resources crawled from public network platforms.

5. The multi-modal item recommendation method according to claim 4, further comprising, after the training of the primary model of the MRIH-MPF model on the pre-acquired user-item interaction data training set and item multi-modal training set to obtain the MRIH-MPF model:

performing an accuracy test on the MRIH-MPF model using a preset test set;

6. A multimodal item recommendation system, the system comprising:

a graph construction module, configured to construct graphs from user-item interaction data and item multi-modal features to obtain multi-relation item heterogeneous graphs and user-item bipartite graphs under different modalities; wherein the graph construction module comprises:

an information extraction unit, configured to extract interaction information from the user-item interaction data and multi-modal information from the item multi-modal features, and to compute the edge relationships of the items and sample neighbor nodes using the extracted interaction information and multi-modal information, so as to obtain an item graph under each relation;

a fusion unit, configured to fuse, for the item graphs under each relation, the modal-similarity semantic item graph and the co-occurrence collaborative item graph under each modality to obtain the multi-relation item heterogeneous graphs under different modalities;

a graph construction unit, configured to construct from the user-item interaction data, taking the embedding of the interest preferences of user nodes as the construction target, a corresponding modality-level user-item bipartite graph carrying high-order connectivity information for each modality;

a feature extraction module, configured to extract graph features from the multi-relation item heterogeneous graphs and the user-item bipartite graphs under different modalities to obtain an item multi-modal representation feature set and a user single-modal preference feature set; wherein the feature extraction module comprises:

an aggregation unit, configured to perform, for the multi-relation item heterogeneous graph under each modality, intra-relation aggregation and then inter-relation aggregation to complete the information propagation and aggregation of the heterogeneous graph, and to obtain the item heterogeneous graph features corresponding to the different modalities, thereby obtaining an item multi-relation feature set under multiple modalities;

a preference extraction unit, configured to input the item relation features in the item multi-relation feature set under multiple modalities into the user-item bipartite graph of the corresponding modality as the initial representations of the item nodes in the bipartite graph, to initialize the user ID information with a normal distribution to obtain the initial representations of the user nodes in the bipartite graph, and to extract the user single-modal preferences through graph convolution operations to obtain a user single-modal preference feature set and an item single-modal representation feature set carrying high-order interaction information;

a summing unit, configured to perform element-wise summation over the item single-modal representation feature set to obtain the item multi-modal representation feature set; wherein the user ID information comes from the user-item interaction data;

7. An electronic device, the electronic device comprising:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps in the multimodal item recommendation method of any of claims 1 to 5.

8. A computer readable storage medium storing at least one instruction, wherein the at least one instruction when executed by a processor in an electronic device implements the multimodal item recommendation method of any of claims 1 to 5.
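By way of non-limiting illustration of the similar-user aggregation recited in claim 2, the following Python sketch selects the Top-K users by co-occurrence count, weights their first multi-modal preference vectors with a softmax-normalized soft attention, and adds the result back to the current user's own vector as a residual connection. Several points are simplifying assumptions rather than the claimed implementation: attention scores are plain dot products, each neighbor's first preference vector is used directly as its value vector, and the names `top_k_similar_users` and `aggregate_with_residual` are hypothetical.

```python
from collections import Counter
from math import exp


def top_k_similar_users(interactions, user, k):
    """Return up to k users sharing the most items with `user`.

    interactions: dict mapping a user id to the set of item ids interacted with.
    """
    counts = Counter()
    for other, items in interactions.items():
        if other != user:
            shared = len(interactions[user] & items)
            if shared:
                counts[other] = shared
    # Descending by co-occurrence count, then by user id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [u for u, _ in ranked[:k]]


def aggregate_with_residual(prefs, interactions, user, k):
    """First preference + attention-weighted sum of similar users' preferences."""
    own = prefs[user]
    neighbours = top_k_similar_users(interactions, user, k)
    if not neighbours:
        return list(own)  # nothing to aggregate; keep the first preference
    # Assumed scoring: dot product between the user and each neighbour.
    logits = [sum(a * b for a, b in zip(own, prefs[n])) for n in neighbours]
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    second = [
        sum((e / total) * prefs[n][d] for e, n in zip(exps, neighbours))
        for d in range(len(own))
    ]
    # Residual connection: element-wise sum of first and second preferences.
    return [a + b for a, b in zip(own, second)]
```

A user with no co-occurring neighbors simply keeps the first preference vector, which is one reasonable fallback among several.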