CN113850656A

CN113850656A - Personalized clothing recommendation method and system based on attention perception and integrating multi-mode data

Info

Publication number: CN113850656A
Application number: CN202111348060.9A
Authority: CN
Inventors: 田保军; 康萌; 房建东
Original assignee: Inner Mongolia University of Technology
Current assignee: Inner Mongolia University of Technology
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2021-12-28
Anticipated expiration: 2041-11-15
Also published as: CN113850656B

Abstract

The invention discloses an attention-aware personalized clothing recommendation method and system fusing multi-modal data. The scoring data of a user is used to guide the generation of the user's comment features, so that the comment features processed by the attention mechanism contain the comment information most relevant to the user. The processed comment features then guide the generation of fine-grained feature vectors of the clothing images, so as to obtain the key image features of the clothing that the user cares about; after these two attention stages, more accurate clothing feature vectors that better match the user's personalized preferences are obtained. The attention mechanism filters the noise in the user comment and clothing image data, so that multidimensional comment and clothing image features more relevant to the user are obtained, the user's personalized preferences are reflected more accurately, and problems such as insufficient mining of user interest and insufficient recommendation accuracy are solved.

Description

Personalized clothing recommendation method and system based on attention perception and integrating multi-mode data

Technical Field

The invention belongs to the technical field of intelligent recommendation, and particularly relates to an attention-perception-based personalized clothing recommendation method and system fusing multi-modal data.

Background

With the rapid development of science and technology and the wide application of electronic commerce, large e-commerce platforms have emerged one after another. At the same time, the volume of information keeps increasing and internet data is growing explosively. Recommendation systems are widely applied as an effective way to address 'information overload', and their personalized services have penetrated people's lives and play an increasingly important role.

A recommendation system aims to analyze and mine factors such as a user's characteristics and interests from the user's historical behavior information, and then to match, from massive information, the information or services that the user may be interested in. Its most important features are that it can cope with ambiguous user requirements and can build a model from the user's historical data to capture the user's interests. Personalized recommendation means recommending to a target user the specific commodities that the user is relatively more interested in, according to different personalized requirements. The most widely used algorithm in personalized clothing recommendation is collaborative filtering: a preference function relating users and items is mined from the historical interaction records between users and items and from the information of similar users, the user's preference for items not yet interacted with is predicted, and items that may interest each user are then recommended according to the prediction. However, collaborative filtering suffers from problems such as data sparsity and an inability to sufficiently mine the potential interests of users. Moreover, user-based collaborative filtering uses only the interaction information between users and commodities and ignores the characteristics of the clothing products themselves, such as the visual features of the clothing. As a result, the recommendations are not accurate and the user experience is poor.

In summary, currently, the existing methods for recommending clothes are mainly divided into two categories:

1) Recommendation based on collaborative filtering. Although this method can produce a good recommendation effect, it mainly has the following defects: (1) clothing recommendation is performed using only sparse scoring data, which causes problems such as data sparsity and insufficient mining of the potential interests of users; (2) relevant information about the clothing item itself, for example the image features of the garment, is ignored. Both defects make the recommendation less accurate.

2) Recommendation based on clothing image features, which mainly has the following defects: (1) the global features of the clothing image are used as the feature representation of the image, so a fine-grained clothing feature representation is lacking; (2) the fact that different users pay attention to different characteristics of a garment is ignored, so the recommendation results give a poor personalized experience.

Disclosure of Invention

Aiming at the problems that existing clothing recommendation results are inaccurate and the user experience is poor, the invention provides an attention-aware personalized clothing recommendation method and system fusing multi-modal data.

In order to achieve the purpose, the invention adopts the following technical scheme:

the invention provides an attention perception-based personalized clothing recommendation method fusing multi-modal data, which comprises the following steps:

Step 1: extracting a hidden factor vector matrix of the user from the user scoring data by using the LFM;

Step 2: mining feature information of the user comments by using a BiGRU and an attention mechanism, including using the hidden factor vector matrix of the user to guide the user comments to generate word-level attention vectors;

Step 3: applying a BiGRU to the word-level attention vectors and splicing the forward and backward hidden states to obtain contextualized user comment feature vectors, and then obtaining a user comment preference feature vector based on the contextualized user comment feature vectors;

Step 4: dividing the clothing image most recently purchased by the user into m regions, extracting the clothing image feature vector of each region, and performing attention guidance on the clothing image features by using the contextualized user comment vectors to obtain the clothing image feature vector generated by the user comment guidance;

Step 5: splicing the obtained user comment preference feature vector and the clothing image feature vector generated by the user comment guidance to generate a user preference feature vector, which serves as the output of the encoder part;

Step 6: inputting the user preference feature vector to the decoder part, calculating the probability distribution over the set of candidate clothing items, and selecting the clothing item with the highest probability as the next recommendation.

Further, the step 1 comprises:

constructing a user-clothing item scoring matrix with the users as the rows of the matrix and the clothing items as its columns:

wherein r_{u,i} is the score given by user u to clothing item i; p_{u,k} represents the k-th dimension of the hidden factor vector of user u; q_{i,k} represents the k-th dimension of the hidden factor vector of clothing item i; F represents the dimension of the vectors; p_u represents the hidden factor vector matrix of the user.

Further, the step 2 comprises:

letting the historical comment set of the user be S = {S₁, S₂, …, S_n, …, S_N}, where every comment in S is represented as a combination of words t₁, t₂, …, t_|S|;

using pre-trained BERT to produce the embedded vector representation, and using a BiGRU to process the word vector sequence of each comment;

splicing the forward and backward hidden states of each word to obtain a context word vector, thereby obtaining the word sequence h_t;

guiding the comments to generate word-level attention vectors by using the user hidden factor vectors obtained from the scores;

taking the user hidden factor vector matrix p_u and the word sequence h_t as input, attention processing is performed, and the calculation formula is:

wherein the attention score represents the degree of attention of user u to word t; W₁, W₂, W₃ are the weights to learn; a_k represents the attention weight of the k-th word obtained by the normalization operation; a is the word-level attention vector, which is a summary of comment S.

Further, the step 3 comprises:

adopting attention processing on the contextualized user comment feature vectors to generate the user comment preference feature vector S_u, where the attention calculation formula is:

β_n＝W₅ tanh(W₄c_n+b₁)+b₂

wherein β_n indicates the degree of attention of the n-th comment obtained using the single-layer neural network; c_n is the contextualized user comment feature vector; W₄, W₅ are weight matrices; b₁, b₂ are bias vectors; g_n represents the weight of the n-th comment obtained after the normalization operation.

Further, the step 4 comprises:

dividing the clothing image purchased by the user for the last time into m areas;

performing feature extraction on each region by using a VGG network to obtain an original clothing image feature vector;

then, the contextualized user comment feature vectors obtained from the user comments are aggregated into a single vector c_s by using average pooling;

performing attention processing on the original clothing image feature vectors to filter noise and obtain the clothing image feature vector generated by the user comment guidance, wherein the attention calculation formula is as follows:

δ_I＝tanh(W₆v_I⊙W₇c_s)

wherein δ_I denotes the degree of attention of c_s to v_I; v_I＝{v_i | v_i∈R^d, i＝1, …, m}, v_I∈R^{d×m}, is the original clothing image feature vector; c_s∈R^d; W₆, W₇ are weight matrices; ⊙ denotes the connection of vectors; p_I∈R^m is a vector over the m regions giving the attention probability of each region, and p_i is the component of p_I corresponding to the i-th region; v_L is the clothing image feature vector generated by the user comment guidance.

The invention provides a personalized clothing recommendation system fusing multi-mode data and based on attention perception, which comprises the following components:

the hidden factor vector extraction module is used for extracting a hidden factor vector matrix of the user from the user scoring data by using the LFM;

the user comment feature extraction module is used for mining feature information of the user comment by using a BiGRU and attention mechanism, and comprises the steps of guiding the user comment to generate a word-level attention vector by using a hidden factor vector matrix of the user;

the user comment preference feature vector obtaining module is used for applying BiGRU to splice hidden states in the forward direction and the backward direction on the attention vector of the word level to obtain a contextualized user comment feature vector, and then obtaining the user comment preference feature vector based on the contextualized user comment feature vector;

the user comment guidance module is used for dividing the clothing image purchased by the user for the last time into m regions, extracting the clothing image feature vector of each region, and performing attention guidance on clothing image features by using the contextualized user comment vector to obtain the clothing image feature vector generated by the user comment guidance;

the user preference feature vector obtaining module is used for splicing the obtained user comment preference feature vector and the clothing image feature vector generated by the user comment guidance to generate a user preference feature vector which is used as the output of the encoder part;

and the recommending module is used for inputting the user preference feature vector into the decoder part, calculating the probability distribution of the candidate clothing item set and selecting the clothing item with the maximum probability as the next item recommendation.

Further, the implicit factor vector extraction module is specifically configured to:

Further, the user comment feature extraction module is specifically configured to:

letting the historical comment set of the user be S = {S₁, S₂, …, S_n, …, S_N}, where every comment in S is represented as a combination of words t₁, t₂, …, t_|S|;

wherein,

Further, the user comment preference feature vector derivation module is specifically configured to:

adopting attention processing on the contextualized user comment feature vectors to generate the user comment preference feature vector S_u, where the attention calculation formula is:

β_n＝W₅ tanh(W₄c_n+b₁)+b₂

Further, the user comment guidance module is specifically configured to:

δ_I＝tanh(W₆v_I⊙W₇c_s)

Compared with the prior art, the invention has the following beneficial effects:

Compared with existing recommendation systems and methods, the invention fuses multi-modal data for recommendation: the user scores, user comments and clothing images are used together as the input of the model, and the user scoring data is used to apply attention to the user comments, so that more accurate user comment features are obtained. Features are extracted after the clothing image is divided into regions, so that different parts of the image are represented at a finer granularity and more accurate, finer-grained feature vectors are obtained. Fusing multi-source data for recommendation in this way addresses the problem of data sparsity in clothing recommendation systems.

Whereas existing recommendation systems and methods simply splice the vectors of the multi-modal data or simply fuse them with a fully connected layer, the invention performs mutually guided attention processing between the multi-modal data: the user scores guide the attention over the user comments, and the user comments then guide the attention over the clothing images. The attention mechanism filters the noise in the user comment and clothing image data, so that multidimensional comment and clothing image features more relevant to the user are obtained, the user's personalized preferences are reflected more accurately, and problems such as insufficient mining of user interest and insufficient recommendation accuracy are solved.

Drawings

FIG. 1 is a basic flowchart of a personalized clothing recommendation method based on attention perception by fusing multi-modal data according to an embodiment of the present invention;

fig. 2 is a schematic diagram of an architecture of a personalized clothing recommendation system based on attention perception and integrating multimodal data according to an embodiment of the invention.

Detailed Description

The invention is further illustrated by the following examples in conjunction with the accompanying drawings:

as shown in fig. 1, a personalized clothing recommendation method based on attention perception by fusing multi-modal data includes:

Step 1: extracting a hidden factor vector matrix of the user from the user scoring data by using an LFM (latent factor model);

specifically, a user hidden factor vector and a clothing hidden factor vector are extracted from the user scoring data by using the LFM; these vectors reflect the interests of the user. Let the user set be U = {u₁, u₂, …, u_m} and the clothing item set be I = {i₁, i₂, …, i_n}. A user-clothing item scoring matrix R is constructed with the users as the rows of the matrix and the clothing items as its columns, where r_{u,i} is the score given by user u to clothing item i, according to the formula:

wherein p_{u,k} represents the k-th dimension of the hidden factor vector of user u, q_{i,k} represents the k-th dimension of the hidden factor vector of clothing item i, and F represents the dimension of the vectors. Finally, the hidden factor vector matrix p_u of the user, which represents the preference of the user, is obtained.
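
As an illustration of Step 1, the following is a minimal sketch of a latent factor model trained by stochastic gradient descent. It assumes the usual LFM form, in which the predicted score is the inner product of the user and item factor vectors summed over the F latent dimensions, which is consistent with the symbols r_{u,i}, p_{u,k}, q_{i,k} and F used here. The function name `train_lfm`, the learning rate, the regularization weight and the toy scores are illustrative assumptions, not values taken from the invention.

```python
import numpy as np

def train_lfm(ratings, num_users, num_items, F=8, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Factorize observed scores into user factors P and item factors Q.

    ratings: list of (user_index, item_index, score) triples (assumed toy format)
    Returns P with shape (num_users, F) and Q with shape (num_items, F).
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(num_users, F))
    Q = rng.normal(scale=0.1, size=(num_items, F))
    for _ in range(epochs):
        # Visit the observed scores in a fresh random order every epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]          # prediction error for this score
            p_old = P[u].copy()            # keep the old user factors for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q

# Toy data: 3 users, 3 clothing items, 6 observed scores.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_lfm(ratings, num_users=3, num_items=3)
```

Each row of `P` plays the role of the user hidden factor vector that guides the comment attention in Step 2.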

Step 2: mining feature information of the user comments by using a BiGRU and an attention mechanism (specifically, the comment texts are processed by the BiGRU to obtain user comment features containing context information, and the attention mechanism is then used to obtain the more relevant user comment features), including using the hidden factor vector matrix of the user to guide the user comments to generate word-level attention vectors;

specifically, the BiGRU and the attention mechanism are used to deeply mine the feature information of the user comments, since the comment text of a user contains important preference information and specific detailed information related to clothing. When extracting information from comments, it is important to distinguish relevant comments from noisy comments and to determine the important parts of each comment. Let the historical comment set of the user be S = {S₁, S₂, …, S_n, …, S_N}, where every comment in S is represented as a combination of words t₁, t₂, …, t_|S|. The embedded vector representation is first produced using pre-trained BERT, and the word vector sequence of each comment is processed using a BiGRU in order to capture context information. The forward and backward hidden states of each word are spliced to obtain a context word vector, thereby obtaining the word sequence h_t. Because the word sequence of a text is long, the final feature vector obtained by a BiGRU model in practice is biased toward the last words of the text sequence. To solve this problem, an attention mechanism is adopted in which the user hidden factor vector obtained from the scores guides the comment to generate a word-level attention vector; the weights of the word vectors can thus be adjusted by making full use of the user's preference, giving a more compact and more accurate representation of the comment content. Taking the user hidden factor vector matrix p_u and the word sequence h_t as input, attention processing is performed, and the calculation formula is:

wherein the attention score is obtained by using W₁ and W₂ to linearly transform p_u and h_t, extracting nonlinear semantic information with a nonlinear activation function, and finally applying a linear transformation to obtain the degree of attention of user u to word t; W₁, W₂, W₃ are the weights to learn; a_k represents the attention weight of the k-th word obtained by the normalization operation; a is the word-level attention vector, which is a summary of comment S. The calculation is repeated for each comment S₁, …, S_N to obtain a₁, …, a_N.
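
The score-guided word-level attention of Step 2 can be sketched as follows. The sketch assumes the form suggested by the description: W₁ and W₂ linearly transform p_u and h_t, a nonlinearity is applied, W₃ maps the result to a scalar score, and a normalization turns the scores into the weights a_k. The choice of tanh and softmax, the weighted sum that forms a, and all array shapes are assumptions made for illustration; the random arrays stand in for BERT and BiGRU outputs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def word_level_attention(p_u, H, W1, W2, w3):
    """Summarize one comment under the guidance of the user factor vector.

    p_u: (F,)   user hidden factor vector from the LFM
    H:   (T, d) contextual word vectors h_t (BiGRU output, random here)
    W1:  (k, F), W2: (k, d), w3: (k,) weights to learn (random here)
    """
    # Linear transforms of p_u and h_t, a nonlinearity, then a linear map to one score per word.
    scores = np.tanh(W1 @ p_u + H @ W2.T) @ w3   # shape (T,)
    a_k = softmax(scores)                         # attention weight of each word
    a = a_k @ H                                   # weighted summary of the comment, shape (d,)
    return a_k, a

rng = np.random.default_rng(0)
F, T, d, k = 8, 5, 6, 4
p_u = rng.normal(size=F)
H = rng.normal(size=(T, d))
W1, W2, w3 = rng.normal(size=(k, F)), rng.normal(size=(k, d)), rng.normal(size=k)
a_k, a = word_level_attention(p_u, H, W1, W2, w3)
```

Running the function once per comment yields the sequence a₁, …, a_N that is passed to the next BiGRU.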

Step 3: applying a BiGRU to the word-level attention vectors and splicing the forward and backward hidden states to obtain contextualized user comment feature vectors, and then obtaining the user comment preference feature vector based on the contextualized user comment feature vectors;

specifically, in order to capture global context information in the user comments, a BiGRU is applied to the word-level attention vectors, and the hidden states in the forward and backward directions are spliced to obtain the contextualized user comment vectors c₁, …, c_n, …, c_N. Attention processing is adopted to focus on the important comment information and generate the final user comment preference feature vector S_u; the attention calculation formula is:

β_n＝W₅ tanh(W₄c_n+b₁)+b₂

wherein β_n indicates the degree of attention of the n-th comment obtained using the single-layer neural network; W₄, W₅ are weight matrices; b₁, b₂ are bias vectors; g_n represents the weight of the n-th comment obtained after the normalization operation.
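
A minimal sketch of this comment-level attention follows. The score β_n is computed as in the formula β_n＝W₅ tanh(W₄c_n+b₁)+b₂; the normalization that yields g_n is assumed to be a softmax, and S_u is assumed to be the g_n-weighted sum of the comment vectors c_n. Shapes and random weights are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def comment_level_attention(C, W4, b1, w5, b2):
    """Aggregate contextualized comment vectors into one preference vector.

    C:  (N, d) contextualized comment vectors c_1..c_N (random here)
    W4: (k, d), b1: (k,), w5: (k,), b2: scalar
    """
    beta = np.tanh(C @ W4.T + b1) @ w5 + b2   # beta_n for every comment, shape (N,)
    g = softmax(beta)                          # assumed normalization giving g_n
    S_u = g @ C                                # assumed weighted sum, shape (d,)
    return beta, g, S_u

rng = np.random.default_rng(1)
N, d, k = 4, 6, 5
C = rng.normal(size=(N, d))
W4, b1 = rng.normal(size=(k, d)), rng.normal(size=k)
w5, b2 = rng.normal(size=k), 0.1
beta, g, S_u = comment_level_attention(C, W4, b1, w5, b2)
```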

Step 4: dividing the clothing image most recently purchased by the user into m regions, extracting the clothing image feature vector of each region, and performing attention guidance on the clothing image features by using the contextualized user comment vectors to obtain the clothing image feature vector generated by the user comment guidance;

specifically, when a clothing image is processed, the user's attention is in general related only to specific regions of the input clothing image. Therefore, instead of using a global vector as the image feature, the image is divided into m regions and a feature vector is extracted for each region. The score-guided user comment features are then used to attend to the clothing image features, filtering noise and finding the regions more relevant to the user's preferences. First, the clothing image is scaled to 224 × 224 and divided into m × N regions, and feature extraction is performed for each region using a pre-trained 19-layer VGG network, giving the original clothing image feature vectors v_I＝{v_i | v_i∈R^d, i＝1, …, m}. Then the contextualized user comment vectors c₁, …, c_n, …, c_N obtained from the user comments are aggregated into a single vector using average pooling.

wherein c_s denotes the single vector obtained by aggregating c₁, …, c_n, …, c_N with average pooling.

Attention processing is performed on the original clothing image feature vectors, and noise is filtered to obtain features that highlight the key regions of the clothing image. For computational convenience, a fully connected layer converts each image feature vector into a vector whose dimension equals that of the comment vector. The attention calculation formula is:

δ_I＝tanh(W₆v_I⊙W₇c_s)

wherein δ_I denotes the degree of attention of c_s to v_I; v_I∈R^{d×m}, c_s∈R^d; W₆, W₇ are weight matrices; ⊙ denotes the connection of vectors; p_I∈R^m is a vector over the m regions giving the attention probability of each region, and p_i is the component of p_I corresponding to the i-th region; v_L is the clothing image feature vector generated by the user comment guidance.
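
The comment-guided image attention of Step 4 can be sketched as below. Average pooling of c₁, …, c_N into c_s follows the description. The remaining choices are assumptions: ⊙ is read as concatenating W₆v_i with W₇c_s for each region, since the description calls it a connection of vectors; the attention probabilities p_I are taken to be a softmax over a learned projection `w_p` (a hypothetical parameter); and v_L is taken to be the p_I-weighted sum of the region features. Region features are assumed to be already projected to the comment dimension d by the fully connected layer, and random arrays stand in for VGG outputs.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def comment_guided_image_attention(V, C, W6, W7, w_p):
    """Weight image regions by their relevance to the pooled comment vector.

    V:   (m, d) region features, one row per image region (random here)
    C:   (N, d) contextualized comment vectors
    W6, W7: (k, d) weight matrices; w_p: (2k,) hypothetical projection
    """
    m = V.shape[0]
    c_s = C.mean(axis=0)                                  # average pooling, shape (d,)
    region_part = V @ W6.T                                # W6 v_i for every region, (m, k)
    comment_part = np.tile(W7 @ c_s, (m, 1))              # W7 c_s repeated per region, (m, k)
    # Assumed reading of the connection operator: concatenate the two parts per region.
    delta = np.tanh(np.concatenate([region_part, comment_part], axis=1))  # (m, 2k)
    p_I = softmax(delta @ w_p)                            # attention probability per region
    v_L = p_I @ V                                         # attended image feature, shape (d,)
    return c_s, p_I, v_L

rng = np.random.default_rng(2)
m, N, d, k = 9, 4, 6, 5
V, C = rng.normal(size=(m, d)), rng.normal(size=(N, d))
W6, W7, w_p = rng.normal(size=(k, d)), rng.normal(size=(k, d)), rng.normal(size=2 * k)
c_s, p_I, v_L = comment_guided_image_attention(V, C, W6, W7, w_p)
```

Regions with a larger entry in `p_I` contribute more to `v_L`, which is how low-relevance regions are down-weighted as noise.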

Step 5: splicing the obtained user comment preference feature vector and the clothing image feature vector generated by the user comment guidance to generate a user preference feature vector, which serves as the output of the Encoder part;

specifically, the obtained user comment features and the clothing image features generated by the user comment guidance are spliced to generate the user preference feature vector, which is output by the Encoder part of the network.

Step 6: the user preference feature vector is input to a Decoder (Decoder) section, the probability distribution of the set of candidate apparel items is computed, and the apparel item with the highest probability is selected as the next recommendation.

The Decoder part is mainly a GRU network. During training, the user preference feature vector and the clothing item sequence are used, and each clothing item is mapped to a fixed-length feature that serves as the GRU input.

x_t＝W₈I_t,t∈{1,…,n}

h_{t+1}＝GRU(x_t),t∈{1,…,n}

wherein I = (I₁, …, I_n) is the sequence of clothing items, each item being represented as a one-hot vector H_t; W₈ is a weight matrix; x_t is the embedded feature of the clothing item at time t; h_{t+1} represents the hidden state output by the GRU model at time t+1 during training.

In prediction, given the previous output h_{t-1}, the next output h_t is generated through the GRU. At each time step, a single-layer fully connected network and a softmax function generate the probability distribution over the clothing items at time t, and the item with the highest probability is finally selected as the next recommendation.

P_t＝softmax(W_ih_s)

wherein P_t is the probability distribution over the clothing items generated at time t, h_s∈{h₁, …, h_n} is the input, and W_i is a weight parameter.
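
The decoding step can be sketched as follows with a hand-written GRU cell. The standard GRU gate equations are used. Initializing the hidden state with the user preference feature vector, the class name `GRUCell`, the function `recommend_next`, the random untrained weights and all dimensions are assumptions for illustration, so the item returned here carries no meaning beyond showing the data flow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

class GRUCell:
    """Single GRU cell with update gate z, reset gate r and candidate state."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.W = rng.normal(scale=0.1, size=(3, hidden_dim, input_dim))
        self.U = rng.normal(scale=0.1, size=(3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_tilde

def recommend_next(user_pref, item_history, W8, W_i, cell):
    """Feed the item sequence through the GRU and pick the most probable next item."""
    h = user_pref                          # assumed: encoder output initializes the hidden state
    num_items = W8.shape[1]
    for item in item_history:
        one_hot = np.zeros(num_items)
        one_hot[item] = 1.0
        x_t = W8 @ one_hot                 # x_t = W8 I_t, the item embedding
        h = cell.step(x_t, h)
    P_t = softmax(W_i @ h)                 # P_t = softmax(W_i h_s)
    return int(np.argmax(P_t)), P_t

rng = np.random.default_rng(3)
num_items, embed_dim, hidden_dim = 10, 4, 12
W8 = rng.normal(size=(embed_dim, num_items))
W_i = rng.normal(size=(num_items, hidden_dim))
cell = GRUCell(embed_dim, hidden_dim, rng)
user_pref = rng.normal(size=hidden_dim)    # stand-in for the spliced encoder output
next_item, P_t = recommend_next(user_pref, [2, 7, 5], W8, W_i, cell)
```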

The objective function of the model training is:

wherein H_t is the true label at time t, H₀ is the output of the Encoder part (i.e. the user preference feature vector), H_{1:t-1} is the preceding sequence of items, θ represents all the parameters of the model, and λ is the regularization parameter. The objective function is optimized by stochastic gradient descent (SGD): a training example is randomly selected each time, and the model parameters are updated in the direction of the negative gradient.
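
One form of the objective that is consistent with the symbols H_t, H₀, H_{1:t-1}, θ and λ is a negative log-likelihood over the item sequence with an L2 penalty. This is an assumed reading offered for orientation, not a formula confirmed by the text:

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{n} \log P\left(H_t \mid H_0, H_{1:t-1}; \theta\right) + \lambda \lVert \theta \rVert_2^2
```

Under this reading, each SGD step moves θ against the gradient of this loss evaluated on one randomly chosen training sequence.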

On the basis of the above embodiment, as shown in fig. 2, the present invention further provides an attention-aware-based personalized clothing recommendation system fusing multimodal data, including:

wherein,

β_n＝W₅ tanh(W₄c_n+b₁)+b₂

Further, the user comment guidance module is specifically configured to:

δ_I＝tanh(W₆v_I⊙W₇c_s)

In summary, first, to address the low accuracy of recommendation systems caused by the data sparsity of collaborative filtering algorithms based on matrix decomposition, the invention uses the user scoring data and the comment information jointly, guiding the model to learn more reasonable user features by adding more data and thereby improving the prediction accuracy of the model. Compared with scoring data alone, the text comments written by a user are an important source of user preference features and contain more specific and subtle information about the user's preferences, so fusing the comment data can effectively improve the accuracy of the recommendation system. Second, fine-grained clothing image features are used to represent the semantic information of clothing commodities, replacing the representation of image features by the whole clothing image; such features can have a more precise influence on the user's purchase intention. Finally, for the user comments and clothing images, if clothing recommendation directly uses the comment text features extracted by a bidirectional GRU model, or the overall vector features of the clothing image captured by a VGG model, the final recommendation result may be affected by the noise they contain. The invention therefore introduces an attention mechanism that lets the model focus more on the information related to the user's features. First, the user's scoring data guides the generation of the user's comment features: the scores explicitly express the user's preference, and the weights of the word vectors in each comment can be adjusted by making full use of this preference to obtain a more accurate comment feature representation.
Because the user comments may contain intuitive words that express user preferences and indicate that the user pays more attention to a certain local region of the clothing image, the comment features processed by the attention mechanism are then used to guide the generation of the clothing image features, giving clothing vector features more relevant to the user's preferences. In this way, user comment and clothing image vector features that are more accurate and match the user's personalized preferences are obtained from the user comments and clothing images respectively. After the user comment feature vector and the clothing image feature vector are obtained, they are spliced, and a GRU model is used for sequential recommendation, so that the next personalized recommendation is made for the user.

Compared with existing recommendation systems and methods, the invention fuses multi-modal data for recommendation: the user scores, user comments and clothing images are used together as the input of the model, and the user scoring data is used to apply attention to the user comments, so that more accurate user comment features are obtained. Features are extracted after the clothing image is divided into regions, so that different parts of the image are represented at a finer granularity and more accurate, finer-grained feature vectors are obtained. Fusing multi-source data for recommendation in this way addresses the problem of data sparsity in clothing recommendation systems.

Whereas existing recommendation systems and methods simply splice the vectors of the multi-modal data or simply fuse them with a fully connected layer, the invention performs mutually guided attention processing between the multi-modal data: the user scores guide the attention over the user comments, and the user comments then guide the attention over the clothing images. The attention mechanism filters the noise in the user comment and clothing image data, so that multidimensional comment and clothing image features more relevant to the user are obtained, the user's personalized preferences are reflected more accurately, and problems such as insufficient mining of user interest and insufficient recommendation accuracy are solved.

The above shows only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be considered as falling within the protection scope of the present invention.

Claims

1. A personalized clothing recommendation method based on attention perception and fusing multi-modal data is characterized by comprising the following steps:

2. The personalized clothing recommendation method based on attention perception fusing multi-modal data as claimed in claim 1, wherein the step 1 comprises:

3. The personalized clothing recommendation method based on attention perception fusing multi-modal data as claimed in claim 1, wherein the step 2 comprises:

wherein,

4. The personalized clothing recommendation method based on attention perception fusing multi-modal data as claimed in claim 1, wherein the step 3 comprises:

β_n＝W₅ tanh(W₄c_n+b₁)+b₂

5. The personalized clothing recommendation method based on attention perception fusing multi-modal data as claimed in claim 1, wherein the step 4 comprises:

δ_I＝tanh(W₆v_I⊙W₇c_s)

6. An attention-aware-based personalized garment recommendation system fusing multimodal data, comprising: