CN115952307A - Recommendation method based on multimodal graph contrast learning, electronic device and storage medium - Google Patents

Recommendation method based on multimodal graph contrast learning, electronic device and storage medium Download PDF

Info

Publication number
CN115952307A
CN115952307A CN202211742093.6A CN202211742093A CN115952307A CN 115952307 A CN115952307 A CN 115952307A CN 202211742093 A CN202211742093 A CN 202211742093A CN 115952307 A CN115952307 A CN 115952307A
Authority
CN
China
Prior art keywords
user
modal
item
layer
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211742093.6A
Other languages
Chinese (zh)
Inventor
薛峰
桑胜
张研
徐江凤
叶向晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Aisino Corp
Hefei University of Technology
Original Assignee
Anhui Aisino Corp
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Aisino Corp, Hefei University of Technology filed Critical Anhui Aisino Corp
Priority to CN202211742093.6A priority Critical patent/CN115952307A/en
Publication of CN115952307A publication Critical patent/CN115952307A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a recommendation method based on multimodal graph contrast learning, which comprises the following steps: 1. data acquisition and pretreatment; 2. a graph volume layer; 3. constructing a comparison learning layer; 4. constructing a loss function; 5. and training the graph comparison learning model. When the method is used for processing the recommendation task of the multi-modal data, the representation of the user and the article can be enhanced through the separated graph learning mode and the contrast learning, and the problem of multi-modal noise pollution is relieved.

Description

Recommendation method based on multimodal graph contrast learning, electronic device and storage medium
Technical Field
The invention relates to a multi-modal graph contrast learning-based multimedia recommendation method, electronic equipment and a storage medium, and belongs to the field of recommendation systems.
Background
Multimedia-based recommendations are a challenging task that requires not only learning collaboration signals from user-item interactions, but also capturing modality-specific user cues of interest from complex multimedia content. Despite significant advances in current solutions for multimedia-based recommendation algorithms, they are still limited by multi-modal noise pollution. In particular, a substantial portion of the multimedia content of the item is independent of user preferences such as background, overall layout, image brightness, word order in the title, and semantic-free words. In addition, most recent studies are performed by image learning. This means that as the message propagates into the user and item representations, the polluting effects will be further amplified.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a multi-modal graph contrast learning-based multimedia recommendation method, electronic equipment and a storage medium, so that the problem of multi-modal noise pollution is relieved when a recommendation task of multi-modal data is processed, and the representation of a user and an article is enhanced through a separated graph learning mode and contrast learning, so that the recommendation accuracy and precision can be improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a multimedia recommendation method based on multimodal graph contrast learning, which is characterized by comprising the following steps of:
step 1, data acquisition and pretreatment;
step 1.1, building a project set of commodities, and marking as I = { I = { (I) } 1 ,i 2 ,…,in,…i |I| In, where in represents the nth item, | I | represents the total number of items;
constructing a user set, and recording as U = { U = 1 ,u 2 ,…,um,…,u |U| H, wherein um represents the mth user; | U | represents the total number of users;
constructing user item bipartite graphs
Figure BDA0004030814520000011
Wherein it is present>
Figure BDA0004030814520000012
Represents the mth user u m And the nth item in, if so, then let @>
Figure BDA0004030814520000013
Otherwise, make->
Figure BDA0004030814520000014
Respectively mapping the m < th > user um and the n < th > item in into user embedding
Figure BDA0004030814520000015
And item embedding->
Figure BDA0004030814520000016
The m-th user um is respectively corresponding to embedding vectors in an image modality V and a text modality T and is->
Figure BDA0004030814520000017
And &>
Figure BDA0004030814520000018
Step 1.2, depth feature extraction:
the image v corresponding to the nth commodity item in n Inputting the data into a pre-trained VGG16 model for processing to obtain image characteristics
Figure BDA0004030814520000019
The image feature matrix ^ of the image modality V is thus constructed using equation (1)>
Figure BDA00040308145200000110
dV is the dimension of the image feature:
Figure BDA0004030814520000021
corresponding text t of the nth commodity item in n Inputting the text into a pre-trained Sennce 2Vec model for processing to obtain text characteristics
Figure BDA0004030814520000022
The text feature matrix ≥ of the text modality T is thus constructed using equation (2)>
Figure BDA0004030814520000023
dT is the dimension of the text feature:
Figure BDA0004030814520000024
step 2, constructing a multimodal map contrast learning model, comprising the following steps: a graph volume layer, a comparison learning layer and a prediction layer;
step 2.1, processing the graph volume layer:
step 2.2.1, respectively obtaining the embedding of the mth user um and the nth item in the ith layer graph convolution layer by using the formulas (3) and (4):
Figure BDA0004030814520000025
Figure BDA0004030814520000026
in the formulae (3) and (4),
Figure BDA0004030814520000027
and &>
Figure BDA0004030814520000028
Respectively representing the neighbor sets of the mth user um and the nth item in,
Figure BDA0004030814520000029
and &>
Figure BDA00040308145200000210
Respectively representing the neighbor number of the mth user um and the neighbor number of the nth item in; />
Figure BDA00040308145200000211
Is the embedding of the nth item in the l-1 th map convolutional layer, and makes ^ er/be when l =1>
Figure BDA00040308145200000212
Is the embedding of the mth user um in the l-1 th map convolutional layer, and makes ÷ greater than or equal to 1>
Figure BDA00040308145200000213
Step 2.2.2, respectively inIn the image mode V or the text mode T, the mth user u is obtained through the formulas (5) and (6) respectively m And embedding of the nth item in into the first layer graph convolution layer in the multi-modal model
Figure BDA00040308145200000214
Figure BDA00040308145200000215
Figure BDA00040308145200000216
In the formulas (5) and (6), alpha is a hyper-parameter, and TR represents transposition; modal represents a multi-modal, and modal = V or T,
Figure BDA00040308145200000217
Figure BDA00040308145200000218
is a weight transformation matrix for a multimodal modal, d moda l is the dimension of the multimodal modal feature, d is the embedding size; />
Figure BDA00040308145200000219
Characteristic of the multimodal modal, representing the nth item, is @>
Figure BDA00040308145200000220
Represents the embedding of the mth user um in the l-1 th level map convolutional layer under the multimodal modal, and->
Figure BDA00040308145200000221
Figure BDA00040308145200000222
Represents the mth user u m An embedded vector at level l-1 in the image modality V>
Figure BDA00040308145200000223
Represents the mth user u m Embedding vectors in the l-1 level of the text modality T let ^ er when l =1>
Figure BDA00040308145200000224
Figure BDA00040308145200000225
Indicating that the nth item in is embedded into the graph convolution layer of the l-1 layer under the multi-mode modal; and->
Figure BDA00040308145200000226
Figure BDA00040308145200000227
An image feature representing the nth item in->
Figure BDA0004030814520000031
A text feature representing the nth item in, when l =1, causes ≧>
Figure BDA0004030814520000032
Step 2.2.3, obtaining the mth user u by using the formula (7) and the formula (8) m And embedding of the nth item in into the l +1 th graph convolution layer under the multi-modal model
Figure BDA0004030814520000033
/>
Figure BDA0004030814520000034
Figure BDA0004030814520000035
Step 2.2.4, processing is carried out according to the process from step 2.2.2 to step 2.2.3, so that the characteristics of the mth user um are output by the L-th layer
Figure BDA0004030814520000036
Characteristic ^ s of mth user um under multimodal modal>
Figure BDA0004030814520000037
Characteristic ^ of the nth item in>
Figure BDA0004030814520000038
Step 2.3, processing of the comparison learning layer:
step 2.3.1, constructing a user contrast loss function through the formula (9)
Figure BDA0004030814520000039
Figure BDA00040308145200000310
In the formula (9), the reaction mixture is,
Figure BDA00040308145200000311
representing the characteristics of the jth user uj in a multi-modal model at the L level, wherein tau is a hyper-parameter; step 2.3.2 construction of the term comparison loss function ^ by equation (10)>
Figure BDA00040308145200000312
Figure BDA00040308145200000313
In the formula (10), the compound represented by the formula (10),
Figure BDA00040308145200000314
representing the characteristics of the kth user ik in a multimodal modal;
step 2.3.3, constructing a contrast loss function by the formula (11)
Figure BDA00040308145200000315
Figure BDA00040308145200000316
Step 2.4, processing the prediction layer:
calculating a preference score between the mth user um and the nth item in using equation (12)
Figure BDA00040308145200000317
Figure BDA00040308145200000318
In the formula (12), λ is a hyperparameter;
and 3, constructing a loss function of the multimodal map contrast learning model:
step 3.1, constructing a first loss function by using the formula (13)
Figure BDA00040308145200000319
Figure BDA00040308145200000320
Step 3.2, constructing a second loss function by using the formula (14)
Figure BDA00040308145200000321
Figure BDA00040308145200000322
Step 3.3, constructing a total loss function by using the formula (15)
Figure BDA0004030814520000041
Figure BDA0004030814520000042
In the formulae (13) to (15),
Figure BDA0004030814520000043
is training data, ix denotes the xth item, <' > is>
Figure BDA0004030814520000044
Representing the neighbor set of the mth user um, wherein sigma is a sigmoid function;
step 4, training the multi-modal graph contrast learning model by utilizing a gradient descent method based on the training data O, and calculating a total loss function
Figure BDA0004030814520000045
When the training iteration times reach the set times or the loss error is smaller than the set threshold value, the training is stopped, so that an optimal multi-modal graph contrast learning model is obtained and is used for judging the image characteristic matrix of the image modality>
Figure BDA0004030814520000046
Text feature matrix of text modality>
Figure BDA0004030814520000047
User embedding &>
Figure BDA0004030814520000048
Item embedding +>
Figure BDA0004030814520000049
Dense vector representation->
Figure BDA00040308145200000410
And &>
Figure BDA00040308145200000411
And processing and outputting the score of each user for each item, thereby selecting top items and recommending each user.
The invention relates to an electronic device, comprising a memory and a processor, characterized in that the memory is used for storing programs for supporting the processor to execute the multimedia recommendation method, and the processor is configured to execute the programs stored in the memory.
The present invention is a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when being executed by a processor, performs the steps of the multimedia recommendation method.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention constructs an embedded graph volume network module which is specially used for spreading users and items, and is specially used for spreading the embedding of the users and the items, thereby lightening the problem of multi-mode noise pollution.
2. The present invention enhances the representation of users and items through separate graph learning modes and contrast learning to better capture the collaborative signal and multimodal preferences and attenuate the effects of multimodal noise.
Drawings
FIG. 1 is a schematic diagram of the recommendation method based on multi-modal graph contrast learning according to the present invention.
Detailed Description
In this embodiment, a recommendation method based on multimodal graph contrast learning is to construct a graph convolution module to capture a collaborative signal and multimodal user preferences, then eliminate noise pollution in modeling of the multimodal user preferences by adopting contrast learning, and finally, in order to ensure sufficient learning of a model, in this embodiment, an alternating training strategy is used to optimize the collaborative signal and the multimodal user preferences, specifically, as shown in fig. 1, the following steps are performed:
step 1, data acquisition and pretreatment;
step 1.1, building a project set of commodities, and marking as I = { I = { (I) } 1 ,i 2 ,…,in,…i |I| In, where in represents the nth item, | I | represents the total number of items;
constructing a user set, and recording the user set as U = { U = { (U) } 1 ,u 2 ,…,um,…,u |U| H, wherein um represents the mth user; | U | represents the total number of users;
constructing user item interaction graph using implicit feedback data in dataset
Figure BDA0004030814520000051
Wherein +>
Figure BDA0004030814520000052
Indicates whether an interaction exists between the mth user um and the nth item in, and if so, causes &>
Figure BDA0004030814520000053
Otherwise, make +>
Figure BDA0004030814520000054
Respectively mapping the m < th > user um and the n < th > item in into user embedding
Figure BDA0004030814520000055
And item embedding->
Figure BDA0004030814520000056
The m-th user um is respectively corresponding to embedding vectors in an image modality V and a text modality T and is->
Figure BDA0004030814520000057
And &>
Figure BDA0004030814520000058
Step 1.2, depth feature extraction:
the image v corresponding to the nth commodity item in n Inputting the data into a pre-trained VGG16 model for processing to obtain image characteristics
Figure BDA0004030814520000059
In order to construct an image feature matrix { (1) } of the image modality V>
Figure BDA00040308145200000510
dV is the dimension of an image feature:
Figure BDA00040308145200000511
Corresponding text t of the nth commodity item in n Inputting the text into a pre-trained Sennce 2Vec model for processing to obtain text characteristics
Figure BDA00040308145200000512
In order to construct a text feature matrix { (2) } of the text modality T>
Figure BDA00040308145200000513
dT is the dimension of the text feature:
Figure BDA00040308145200000514
step 2, constructing a multimodal map contrast learning model, comprising the following steps: a graph volume layer, a comparison learning layer and a prediction layer;
step 2.1, processing the graph volume layer:
step 2.2.1, in order to model clean high-order collaborative signals, the invention does not incorporate multimodal features into the user item interaction graph
Figure BDA00040308145200000515
And executing the graph convolution operation. The embedding of the mth user um and the nth item in the l layer graph convolution layer is obtained by using the formula (3) and the formula (4), respectively:
Figure BDA00040308145200000516
Figure BDA00040308145200000517
in the formulae (3) and (4),
Figure BDA00040308145200000518
and &>
Figure BDA00040308145200000526
Respectively representing the neighbor sets of the mth user um and the nth item in,
Figure BDA00040308145200000519
and &>
Figure BDA00040308145200000520
Respectively representing the neighbor number of the mth user um and the neighbor number of the nth item in; />
Figure BDA00040308145200000521
Is the embedding of the nth item in the convolution layer of the l-1 layer graph, and makes ^ greater or less than 1 when l =1>
Figure BDA00040308145200000522
Figure BDA00040308145200000523
Is the embedding of the mth user um in the graph convolution layer of the l-1 layer, and makes ^ greater or less than 1 when l =1>
Figure BDA00040308145200000524
Step 2.2.2, respectively obtaining the mth user u through the formula (5) and the formula (6) under the image mode V or the text mode T m And embedding of the nth item in into the first layer graph convolution layer in the multi-modal model
Figure BDA00040308145200000525
To incorporate multimodal information of historical interactions into the node representation:
Figure BDA0004030814520000061
Figure BDA0004030814520000062
in the formulas (5) and (6), alpha is a hyper-parameter, and TR represents transposition; modal represents a multi-modal, and modal = V or T,
Figure BDA0004030814520000063
Figure BDA0004030814520000064
is a weight transformation matrix for a multimodal modal, d moda l is the dimension of the multimodal modal feature, d is the embedding size; />
Figure BDA0004030814520000065
Characteristic of the multimodal modal, representing the nth item, is @>
Figure BDA0004030814520000066
Represents the embedding of the mth user um in the l-1 th level map convolutional layer under the multimodal modal, and->
Figure BDA0004030814520000067
Figure BDA0004030814520000068
Represents the mth user u m An embedded vector at level l-1 in the image modality V>
Figure BDA0004030814520000069
Represents the mth user u m Embedding vectors in the l-1 level of the text modality T let ^ er when l =1>
Figure BDA00040308145200000610
Figure BDA00040308145200000611
Represents the embedding of the nth item in into the graph volume layer of the layer l-1 under the multi-mode modal; and->
Figure BDA00040308145200000612
Figure BDA00040308145200000613
An image feature representing the nth item in->
Figure BDA00040308145200000614
A text feature representing the nth item in, when l =1, causes ≧>
Figure BDA00040308145200000615
Step 2.2.3, obtaining the mth user u by using the formula (7) and the formula (8) m And embedding of the nth item in into the l +1 th graph convolution layer under the multi-modal model
Figure BDA00040308145200000616
Figure BDA00040308145200000617
Figure BDA00040308145200000618
Step 2.2.4, processing according to the process from step 2.2.2 to step 2.2.3, thereby outputting the characteristics of the mth user um by the L-th layer
Figure BDA00040308145200000619
Characteristic @ of mth user um under multimodal modal>
Figure BDA00040308145200000620
Characteristic ^ of the nth item in>
Figure BDA00040308145200000621
Step 2.3, processing of the comparison learning layer:
step 2.3.1, constructing a user contrast loss function through the formula (9)
Figure BDA00040308145200000622
Figure BDA00040308145200000623
In the formula (9), the reaction mixture is,
Figure BDA00040308145200000624
representing the characteristics of the jth user uj in a multi-modal model at the L level, wherein tau is a hyper-parameter; step 2.3.2 construction of the term comparison loss function ^ by the equation (10)>
Figure BDA00040308145200000625
Figure BDA00040308145200000626
In the formula (10), the reaction mixture is,
Figure BDA00040308145200000627
representing the characteristics of the kth user ik in a multimodal modal;
step 2.3.3, in order to combine the node characteristics of users and projects in visual and text modes, constructing a contrast loss function by the formula (11)
Figure BDA00040308145200000628
Figure BDA0004030814520000071
Step 2.4, processing the prediction layer:
calculating a preference score between the mth user um and the nth item in using equation (12)
Figure BDA0004030814520000072
Figure BDA0004030814520000073
In the formula (12), λ is a hyperparameter;
and 3, in order to optimize the multi-modal graph contrast learning model, updating the representations of the user and the items by adopting a widely used Bayesian personalized ranking loss as a basic optimization target, wherein the Bayesian personalized ranking loss assumes that the user prefers the history interactive items rather than the untouched items. Constructing a loss function of the multi-modal graph learning model:
step 3.1, constructing a first loss function by using the formula (13)
Figure BDA0004030814520000074
Figure BDA0004030814520000075
Step 3.2, constructing a second loss function by using the formula (14)
Figure BDA0004030814520000076
Figure BDA0004030814520000077
Step 3.3, constructing a total loss function by using the formula (15)
Figure BDA0004030814520000078
Figure BDA0004030814520000079
In the formulae (13) to (15),
Figure BDA00040308145200000710
is training data, ix denotes the xth item, <' > is>
Figure BDA00040308145200000711
Representing a neighbor set of the mth user um, wherein sigma is a sigmoid function;
step 4, training the multi-modal graph contrast learning model by utilizing a gradient descent method based on the training data O, and calculating a total loss function
Figure BDA00040308145200000718
When the training iteration times reach the set times or the loss error is smaller than the set threshold value, the training is stopped, so that an optimal multi-modal graph contrast learning model is obtained and is used for judging the image characteristic matrix of the image modality>
Figure BDA00040308145200000712
Text feature matrix of text modality>
Figure BDA00040308145200000713
User embedding->
Figure BDA00040308145200000714
Item embedding->
Figure BDA00040308145200000715
Dense vector representation +>
Figure BDA00040308145200000716
And &>
Figure BDA00040308145200000717
And processing and outputting the score of each user for each item, thereby selecting top items and recommending each user.
In this embodiment, an electronic device includes a memory for storing a program that supports a processor to execute the multimedia recommendation method, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program, and the computer program is executed by a processor to perform the steps of the multimedia recommendation method.

Claims (3)

1. A multi-modal graph contrast learning-based multimedia recommendation method is characterized by comprising the following steps of:
step 1, data acquisition and pretreatment;
step 1.1, building a project set of commodities, and marking as I = { I = { (I) } 1 ,i 2 ,…,i n ,…i |I| In which i n Represents the nth item, | I | represents the total number of items;
constructing a user set, and recording the user set as U = { U = { (U) } 1 ,u 2 ,…,u m ,…,u |U| In which u m Represents the mth user; | U | represents the total number of users;
constructing a user item bipartite graph
Figure FDA0004030814510000011
Wherein it is present>
Figure FDA0004030814510000012
Represents the mth user u m And the nth item i n Whether there is an interaction between them or not, if yes, make->
Figure FDA0004030814510000013
Otherwise, make +>
Figure FDA0004030814510000014
The mth user u m And the nth item i n Mapping separately to user embedding
Figure FDA0004030814510000015
And item embedding->
Figure FDA0004030814510000016
Mth user u m In image modalities V and text, respectivelyThe corresponding embedded vector in the present mode T is->
Figure FDA0004030814510000017
And &>
Figure FDA0004030814510000018
Step 1.2, depth feature extraction:
the nth commodity item i n Corresponding image v n Inputting the data into a pre-trained VGG16 model for processing to obtain image characteristics
Figure FDA0004030814510000019
The image feature matrix ^ of the image modality V is thus constructed using equation (1)>
Figure FDA00040308145100000110
d V Is the dimension of the image feature:
Figure FDA00040308145100000111
the nth commodity item i n Corresponding text t n Inputting the text into a pre-trained Sennce 2Vec model for processing to obtain text characteristics
Figure FDA00040308145100000112
The text feature matrix ≥ of the text modality T is thus constructed using equation (2)>
Figure FDA00040308145100000113
d T Is the dimension of the text feature:
Figure FDA00040308145100000114
step 2, constructing a multimodal graph comparison learning model, which comprises the following steps: a graph volume layer, a comparison learning layer and a prediction layer;
step 2.1, processing the graph convolution layer:
step 2.2.1, respectively obtaining the mth user u by using the formula (3) and the formula (4) m And the nth item i n Embedding of the convolutional layer in the l-th layer:
Figure FDA00040308145100000115
Figure FDA00040308145100000116
in the formulae (3) and (4),
Figure FDA00040308145100000117
and &>
Figure FDA00040308145100000118
Respectively represent the m-th user u m And the nth item i n Is selected, based on the number of neighbor sets in the neighbor set, is greater than>
Figure FDA00040308145100000119
And
Figure FDA00040308145100000120
respectively represent the m-th user u m And the nth item i n The number of neighbors of (a); />
Figure FDA00040308145100000121
Is the nth item i n Embedding in the convolution layer of the l-1 st layer, when l =1, make ^ greater or lesser than>
Figure FDA00040308145100000122
Figure FDA00040308145100000123
Is the mth user u m Embedding in the convolution layer of the l-1 st layer, when l =1, make ^ greater or lesser than>
Figure FDA00040308145100000124
Step 2.2.2, respectively obtaining the mth user u through the formula (5) and the formula (6) under the image mode V or the text mode T m And the nth item i n Embedding of layer I graph convolution layer in multimodal modal
Figure FDA0004030814510000021
/>
Figure FDA0004030814510000022
Figure FDA0004030814510000023
In the formulas (5) and (6), alpha is a hyper-parameter, and TR represents transposition; modal represents a multi-modal, and modal = V or T,
Figure FDA0004030814510000024
Figure FDA0004030814510000025
is a weight transformation matrix for a multimodal modal, d modal Is the dimension of the multi-modal feature, d is the embedding size; />
Figure FDA0004030814510000026
Characteristic of the multimodal modal, representing the nth item, is @>
Figure FDA0004030814510000027
Represents the mth user u m Embedding of layer l-1 map convolutional layer under multimodal modal, and ^ h>
Figure FDA0004030814510000028
Figure FDA0004030814510000029
Represents the mth user u m An embedded vector at level l-1 in the image modality V>
Figure FDA00040308145100000210
Represents the mth user u m Embedding vectors in the l-1 level of the text modality T let ^ er when l =1>
Figure FDA00040308145100000211
Figure FDA00040308145100000212
Represents the nth item i n Embedding a layer l-1 graph convolution layer under a multi-modal model; and->
Figure FDA00040308145100000213
Figure FDA00040308145100000214
Represents the nth item i n The characteristics of the image of (a) are,
Figure FDA00040308145100000215
represents the nth item i n When l =1, let £ be £ £ 5>
Figure FDA00040308145100000216
Step 2.2.3, obtaining the mth user u by using the formula (7) and the formula (8) m And the nth item i n Embedding of layer 1 + graph convolution layer in multimodal modal
Figure FDA00040308145100000217
Figure FDA00040308145100000218
Figure FDA00040308145100000219
Step 2.2.4, processing according to the process from step 2.2.2 to step 2.2.3, thereby outputting the mth user u by the L layer m Is characterized by
Figure FDA00040308145100000220
Mth user u m Feature @ under a multimodal modal>
Figure FDA00040308145100000221
The nth item i n Is characterized by>
Figure FDA00040308145100000222
Step 2.3, processing of the comparison learning layer:
step 2.3.1, constructing a user contrast loss function through the formula (9)
Figure FDA00040308145100000223
Figure FDA00040308145100000224
In the formula (9), the reaction mixture is,
Figure FDA00040308145100000225
represents the jth user u j Features under multimodal modal at level L, τ is a hyper-parameter;
step 2.3.2, construction of item contrast loss function by the formula (10)
Figure FDA00040308145100000226
Figure FDA00040308145100000227
In the formula (10), the reaction mixture is,
Figure FDA0004030814510000031
represents the k-th user i k Features under a multimodal modal;
step 2.3.3, constructing a contrast loss function by the formula (11)
Figure FDA0004030814510000032
/>
Figure FDA0004030814510000033
Step 2.4, processing the prediction layer:
the mth user u is calculated using equation (12) m And the nth item u n Preference score between
Figure FDA0004030814510000034
Figure FDA0004030814510000035
In the formula (12), λ is a hyperparameter;
and 3, constructing a loss function of the multimodal map contrast learning model:
step 3.1, constructing a first loss function by using the formula (13)
Figure FDA0004030814510000036
Figure FDA0004030814510000037
Step 3.2, constructing a second loss function by using the formula (14)
Figure FDA0004030814510000038
Figure FDA0004030814510000039
And 3.3, constructing a total loss function L by using the formula (15):
Figure FDA00040308145100000310
in the formulae (13) to (15),
Figure FDA00040308145100000311
is training data, i represents the xth item,
Figure FDA00040308145100000312
represents the mth user u m σ is a sigmoid function;
step 4, training the multi-modal graph contrast learning model by utilizing a gradient descent method based on the training data O, and calculating a total loss function
Figure FDA00040308145100000319
When the training iteration times reach the set times or the loss error is smaller than the set threshold value, the training is stopped, so that an optimal multi-modal graph contrast learning model is obtained and is used for judging the image characteristic matrix of the image modality>
Figure FDA00040308145100000313
Text feature matrix of text modality>
Figure FDA00040308145100000314
User embedding &>
Figure FDA00040308145100000315
Item embedding +>
Figure FDA00040308145100000316
Dense vector representation +>
Figure FDA00040308145100000317
And &>
Figure FDA00040308145100000318
And processing and outputting the score of each user for each item, thereby selecting top items and recommending each user.
2. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that enables the processor to perform the multimedia recommendation method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the multimedia recommendation method of claim 1.
CN202211742093.6A 2022-12-30 2022-12-30 Recommendation method based on multimodal graph contrast learning, electronic device and storage medium Pending CN115952307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211742093.6A CN115952307A (en) 2022-12-30 2022-12-30 Recommendation method based on multimodal graph contrast learning, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211742093.6A CN115952307A (en) 2022-12-30 2022-12-30 Recommendation method based on multimodal graph contrast learning, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115952307A true CN115952307A (en) 2023-04-11

Family

ID=87285822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211742093.6A Pending CN115952307A (en) 2022-12-30 2022-12-30 Recommendation method based on multimodal graph contrast learning, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115952307A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932887A (en) * 2023-06-07 2023-10-24 哈尔滨工业大学(威海) Image recommendation system and method based on multi-modal image convolution
CN117786234A (en) * 2024-02-28 2024-03-29 云南师范大学 Multimode resource recommendation method based on two-stage comparison learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116932887A (en) * 2023-06-07 2023-10-24 哈尔滨工业大学(威海) Image recommendation system and method based on multi-modal image convolution
CN117786234A (en) * 2024-02-28 2024-03-29 云南师范大学 Multimode resource recommendation method based on two-stage comparison learning
CN117786234B (en) * 2024-02-28 2024-04-26 云南师范大学 Multimode resource recommendation method based on two-stage comparison learning

Similar Documents

Publication Publication Date Title
US11314806B2 (en) Method for making music recommendations and related computing device, and medium thereof
US11593612B2 (en) Intelligent image captioning
CN107836000B (en) Improved artificial neural network method and electronic device for language modeling and prediction
CN108509573B (en) Book recommendation method and system based on matrix decomposition collaborative filtering algorithm
US10489688B2 (en) Personalized digital image aesthetics in a digital medium environment
CN106776673B (en) Multimedia document summarization
CN107273438B (en) Recommendation method, device, equipment and storage medium
CN111339415B (en) Click rate prediction method and device based on multi-interactive attention network
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN110362723B (en) Topic feature representation method, device and storage medium
CN115952307A (en) Recommendation method based on multimodal graph contrast learning, electronic device and storage medium
CN110046221A (en) A kind of machine dialogue method, device, computer equipment and storage medium
KR20160144384A (en) Context-sensitive search using a deep learning model
US20230316379A1 (en) Deep learning based visual compatibility prediction for bundle recommendations
CN106708929B (en) Video program searching method and device
CN111309878B (en) Search type question-answering method, model training method, server and storage medium
CN114358203A (en) Training method and device for image description sentence generation module and electronic equipment
KR20190075277A (en) Method for searching content and electronic device thereof
CN111985548A (en) Label-guided cross-modal deep hashing method
CN115455228A (en) Multi-mode data mutual detection method, device, equipment and readable storage medium
CN106570196B (en) Video program searching method and device
CN114298783A (en) Commodity recommendation method and system based on matrix decomposition and fusion of user social information
CN112069404A (en) Commodity information display method, device, equipment and storage medium
CN116186301A (en) Multi-mode hierarchical graph-based multimedia recommendation method, electronic equipment and storage medium
CN112966513B (en) Method and apparatus for entity linking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination