WO2022231522A1 - Explainable recommendation system and method - Google Patents
- Publication number
- WO2022231522A1 (PCT/SG2022/050256)
- Authority
- WO
- WIPO (PCT)
Classifications
- G06N5/045—Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N5/02—Knowledge representation; Symbolic representation
- G06Q30/0282—Rating or review of business operators or products
- G06Q30/0631—Item recommendations
- G06N3/048—Activation functions
Definitions
- the present disclosure generally relates to an explainable recommendation system and method. More particularly, the present disclosure describes various embodiments of a computer system and a computerized method for recommending an item to a user.
- Recommendation systems have been widely used to help users make decisions by suggesting items or products that they may be interested in. Although some existing recommendation methods achieve satisfactory performance, it remains difficult to explain their recommendations. Current methods for explainable recommendation can be roughly classified into two groups: template-based and natural language generation-based.
- the template-based methods generate explanations by filling a predefined sentence template with different generated words for different users. For example, in the template “You might be interested in [aspect], on which this product performs well”, the [aspect] can be replaced by a generated aspect to produce an explanation for item recommendation.
- explanations from the template-based methods may be uninformative and unpersuasive.
- designing high-quality templates is time-consuming and usually requires domain knowledge.
- the natural language generation-based methods can generate more natural and flexible sentences. However, some of these methods can only generate short recommendations based on given attributes such as user identity, item identity and rating value. It is difficult for them to generate reliable and precise explanations due to lack of other guiding information or generative signals.
- Embodiments of the present disclosure relate to an explainable recommendation system and method, more specifically a computer system and a computerized method for recommending an item to a user.
- the method includes: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
- Figure 1 illustrates an explainable recommendation system according to embodiments of the present disclosure.
- Figures 2A and 2B are illustrations of dependency tree structures of two reviews and a semantic graph in the explainable recommendation system.
- Figures 3A and 3B illustrate graph pooling in the explainable recommendation system.
- Figures 4A to 4D illustrate performance evaluations of the explainable recommendation system.
- depiction of a given element or consideration or use of a particular element number in a particular figure or a reference thereto in corresponding descriptive material can encompass the same, an equivalent, or an analogous element or element number identified in another figure or descriptive material associated therewith.
- references to “an embodiment / example”, “another embodiment / example”, “some embodiments / examples”, “some other embodiments / examples”, and so on, indicate that the embodiment(s) / example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment / example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in an embodiment / example” or “in another embodiment / example” does not necessarily refer to the same embodiment / example.
- the terms “a” and “an” are defined as one or more than one.
- the use of “/” in a figure or associated text is understood to mean “and/or” unless otherwise indicated.
- the term “set” is defined as a non-empty finite organization of elements that mathematically exhibits a cardinality of at least one (e.g. a set as defined herein can correspond to a unit, singlet, or single-element set, or a multiple-element set), in accordance with known mathematical definitions.
- the recitation of a particular numerical value or value range herein is understood to include or be a recitation of an approximate numerical value or value range.
- the terms “first”, “second”, etc. are used merely as labels or identifiers and are not intended to impose numerical requirements on their associated terms.
- the explainable recommendation system is illustrated as a computer system 100 for recommending an item i to a user u and generating an explainable recommendation for recommending the item i to the user u.
- the item i may be a product such as a book or may be a service such as streaming content.
- the computer system 100 includes a database 110 storing historical reviews by the user u and historical reviews of the item i.
- the database 110 also stores historical reviews by other users and historical reviews of other items, as well as identifiers of the users and items.
- the historical user reviews are reviews associated with the user u, such as reviews of various items written by the user u in the past.
- the historical item reviews are reviews associated with the item i, such as reviews of the item i written by various users in the past.
- the historical user reviews are denoted by D_u, where n_{d_u} denotes the number of historical reviews associated with the user u.
- the historical item reviews are denoted by D_i, where n_{d_i} denotes the number of historical reviews associated with the item i.
- the computer system 100 includes a graph generation module 120 configured for generating a historical user semantic graph from the historical user reviews D_u and for generating a historical item semantic graph from the historical item reviews D_i.
- the semantic graphs can provide a better understanding of the user and item details based on the historical reviews and reduce the impacts caused by noisy reviews.
- the historical user semantic graph is denoted by G_u = (X_u, E_u), where X_u denotes the set of nodes (e.g. words) and E_u denotes the set of edges, E_u = {(x_h, r, x_t) | x_h, x_t ∈ X_u, r ∈ R}.
- the historical item semantic graph is denoted by G_i = (X_i, E_i), where X_i denotes the set of nodes and E_i denotes the set of edges, E_i = {(x_h, r, x_t) | x_h, x_t ∈ X_i, r ∈ R}.
- in each edge (x_h, r, x_t), x_h and x_t denote the head and tail nodes, r denotes the relation connecting the two nodes, and R denotes the set of all possible relations.
- the graph generation module 120 may perform some processing steps in generating the historical user semantic graph G_u and the historical item semantic graph G_i.
- text pre-processing techniques, such as tokenization and spelling correction, may be applied to each review.
- Dependency parsing is then used to automatically generate a constituent-based representation, such as a syntax dependency tree, for each review sentence based on syntax.
- the relations in the dependency tree may provide important clues to mine aspects, details, and opinions.
- Figure 2A illustrates the dependency tree structures of two exemplary reviews. In the first review, there is a sub-tree: story -> (nmod:with) -> twists -> (amod) -> interesting.
- Pruning may also be performed to remove words with little semantic information.
- the relation “det” between “story” and its determiner “a” has little semantic information and can be removed, keeping only the head and tail nodes.
- relations such as “nmod:poss” and “punct” can be removed.
- the dependency trees of the review sentences are connected by replacing the same words in different dependency trees with the same node, thereby aggregating different reviews and generating the historical user semantic graph G_u and the historical item semantic graph G_i.
- details about “love story”, such as “characters” and “twists”, can be directly connected to “love story” in the semantic graph.
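The graph-building steps above (parse each review into dependency triples, prune low-information relations, then merge shared words across reviews into single nodes) can be sketched as follows. This is a minimal illustration, assuming a parser has already produced (head, relation, tail) triples per review; the pruning list and example triples are invented for illustration.

```python
# Low-information relations to prune, per the description above (assumed list).
PRUNED_RELATIONS = {"det", "punct", "nmod:poss"}

def build_semantic_graph(reviews_triples):
    """Merge per-review dependency triples into one semantic graph, keyed by
    word, so identical words across reviews collapse into a single node."""
    nodes, edges = set(), set()
    for triples in reviews_triples:
        for head, rel, tail in triples:
            if rel in PRUNED_RELATIONS:
                continue  # drop edges that carry little semantic information
            nodes.update((head, tail))
            edges.add((head, rel, tail))
    return nodes, edges

# Toy triples loosely following the Figure 2A example.
review1 = [("story", "det", "a"), ("story", "nmod:with", "twists"),
           ("twists", "amod", "interesting")]
review2 = [("story", "amod", "love"), ("story", "nmod:with", "characters")]
nodes, edges = build_semantic_graph([review1, review2])
```

Because nodes are keyed by word, “characters” and “twists” from different reviews both attach to the shared “story” node, mirroring the aggregation described above.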
- the computer system 100 includes an aspect extraction module 122 configured for extracting a set of historical user aspects of the user u from the historical user reviews D_u and for extracting a set of historical item aspects of the item i from the historical item reviews D_i.
- Aspects usually represent features of items, such as the price of a product (e.g. “price”), the category of a product, or a property/characteristic of a product.
- an aspect can be a genre of a book, such as “romance” or “mystery”.
- Historical user aspects are extracted from the historical user reviews D_u. The user aspects are sorted according to their occurrence frequency in descending order, and the top n ranked user aspects are chosen, hence generating the set of historical user aspects denoted by Q_u = {q_1^u, q_2^u, ..., q_n^u}.
- Historical item aspects are similarly extracted from the historical item reviews D_i. The item aspects are sorted according to their occurrence frequency in descending order, and the top n ranked item aspects are chosen, hence generating the set of historical item aspects denoted by Q_i = {q_1^i, q_2^i, ..., q_n^i}.
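The frequency-based selection described above can be sketched in a few lines. The aspect lists below are invented for illustration; real input would be the aspects mined from each historical review.

```python
from collections import Counter

def top_n_aspects(review_aspects, n):
    """review_aspects: list of per-review aspect lists. Returns the n most
    frequent aspects, in descending order of occurrence count."""
    counts = Counter(a for aspects in review_aspects for a in aspects)
    return [aspect for aspect, _ in counts.most_common(n)]

reviews = [["story", "characters"], ["story", "price"], ["story", "characters"]]
print(top_n_aspects(reviews, 2))  # → ['story', 'characters']
```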
- the computer system 100 includes a set of embedding layers 130 configured for embedding or encoding one or more of: a user identifier of the user u, an item identifier of the item i, the set of historical user aspects Q_u, the set of historical item aspects Q_i, the historical user semantic graph G_u, and the historical item semantic graph G_i.
- the embedding layers 130 encode nodes in the graphs using one-hot encoding.
- the computer system 100 includes a graph representation learning module 140 configured for constructing a hierarchy of semantic graphs from the historical semantic graphs using graph pooling. As the historical semantic graphs are generated from historical reviews, the graph representation learning module 140 may be referred to as a review-based graph representation learning (RGRL) module.
- the graph representation learning module 140 includes an aspect-guided graph pooling (AGP) operator 150 for performing the graph pooling to extract aspect-specific knowledge from the historical semantic graphs based on the historical aspects.
- X and E denote the set of nodes and edges in the graph G, respectively.
- X is the node feature matrix and A is the adjacency matrix.
- f(q ) is the representation of the aspect q obtained from the embedding layers 130.
- a Bidirectional Long Short-Term Memory (BiLSTM) model is used to encode the word embedding of the aspect q and obtain the backward hidden state as the representation f(q).
- a Graph Attention Network (GAT) may be used to encode the graph G for input into the AGP operator 150.
- the feature of x_h is updated by aggregating the input features of neighbourhood nodes and adding its input feature x_h via a self-loop, e.g. x_h′ = σ(Σ_{(x_h,r,x_t)∈E} α(x_h, r, x_t)·W·x_t + x_h), where W is the weight matrix and α(x_h, r, x_t) denotes the attention score between the two nodes x_h and x_t.
- the attention score can be implemented by an attentional mechanism of the form α(x_h, r, x_t) = softmax(LeakyReLU(a^T [W_1·x_h ; W_2·r ; W_1·x_t])), where LeakyReLU is the Leaky Rectified Linear Unit (ReLU) activation function, r is the embedding of the relation r, and W_1 and W_2 are the weight matrices.
- a matrix X′ is used to denote the updated features of all nodes.
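The attention-weighted aggregation described above can be illustrated with a toy scalar version. This is only a sketch: real node features, relation embeddings, and weights are vectors and matrices, and the specific values here are invented.

```python
import math

def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v

def gat_update(node_feats, edges, rel_emb, w):
    """One attention-weighted aggregation step per head node, with a self-loop.
    node_feats: {node: scalar feature}; edges: (head, relation, tail) triples;
    rel_emb: {relation: scalar embedding}; w: scalar stand-in for the weight
    matrix W."""
    new_feats = {}
    for h in node_feats:
        nbrs = [(r, t) for (hh, r, t) in edges if hh == h]
        if not nbrs:
            new_feats[h] = node_feats[h]  # no neighbours: keep own feature
            continue
        # unnormalized attention over head, relation, and tail features
        scores = [leaky_relu(node_feats[h] + rel_emb[r] + node_feats[t])
                  for r, t in nbrs]
        z = sum(math.exp(s) for s in scores)  # softmax normalizer
        alphas = [math.exp(s) / z for s in scores]
        agg = sum(a * w * node_feats[t] for a, (r, t) in zip(alphas, nbrs))
        new_feats[h] = agg + node_feats[h]  # add self-loop feature
    return new_feats
```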
- An aspect-aware importance score s = |X′·f(q)| is defined to describe the relevance of each node in the graph G to the given aspect q, where |·| denotes the absolute value function.
- the nodes in the graph can be ranked according to the importance scores in descending order.
- the set of top-K ranked nodes is denoted by X̃ and their indices by idx(G̃). K is empirically set as K = ⌈p·|X|⌉, where p is the pooling ratio.
- the new features of the nodes in X̃ and the adjacency matrix of the corresponding graph are defined as X̃ = σ(X(idx, :)·W + b) and Ã = A(idx, idx), where W and b are the weight matrix and bias vector, respectively, X(idx, :) is the row-wise indexed feature matrix, and A(idx, idx) gets the row-wise and column-wise indexed adjacency matrix from A. Further, X̃ and Ã are the new feature matrix and the corresponding adjacency matrix, respectively, after pooling. Ẽ is used to denote the set of edges that describes the connecting relationships between the nodes in X̃.
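The pooling step above (score nodes against the aspect, keep the top-K = ⌈p·|X|⌉, retain only edges between surviving nodes) can be sketched as follows. The dot-product scoring and the toy features are assumptions for illustration; the real operator scores against a learned aspect representation.

```python
import math

def agp_pool(node_feats, edges, aspect_vec, p):
    """node_feats: {node: feature vector}; edges: set of (head, rel, tail);
    aspect_vec: aspect representation; p: pooling ratio in (0, 1]."""
    def score(v):
        dot = sum(a * b for a, b in zip(v, aspect_vec))
        return abs(dot)  # aspect-aware importance score

    k = math.ceil(p * len(node_feats))  # K = ceil(p * |X|)
    ranked = sorted(node_feats, key=lambda n: score(node_feats[n]),
                    reverse=True)
    kept = set(ranked[:k])
    # keep only edges whose endpoints both survive the pooling
    pooled_edges = {(h, r, t) for (h, r, t) in edges
                    if h in kept and t in kept}
    return {n: node_feats[n] for n in kept}, pooled_edges

feats = {"story": [1.0, 0.0], "price": [0.0, 1.0], "twists": [0.8, 0.1]}
kept, pooled = agp_pool(feats, {("story", "nmod:with", "twists")},
                        aspect_vec=[1.0, 0.0], p=0.5)
```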
- the output graph of the AGP operator 150 is denoted by G̃. The graph representation learning module 140 is configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of user semantic graphs from the historical user semantic graph G_u and the historical user aspects Q_u.
- L layers of graph pooling are conducted on G_u and guided by Q_u.
- the input graph of the k-th AGP operator 150 at the l-th layer is pooled under the guidance of the k-th aspect: the representation f(q_k^u) of the k-th aspect is used to guide the graph pooling in the k-th AGP operator 150.
- a BiLSTM model in the embedding layers 130 is used to encode the k-th aspect, and the backward hidden state is obtained as the representation f(q_k^u).
- the output graph of the k-th AGP operator 150 at the l-th layer is used as the input graph of the k-th AGP operator 150 at the (l + 1)-th layer, for k = 1, 2, 3, ..., n.
- a pooling operation such as maximum pooling is performed on the pooled graph to obtain the aspect-aware graph representation at the l-th layer.
- by repeating this L times, corresponding to the L layers, multiple representations of G_u that are relevant to the user aspect can be obtained, one per layer.
- the graph representations are concatenated to fuse them from fine-grained to coarse, thereby forming the graph representation S_u of the historical user semantic graph G_u.
- the graph representation learning module 140 is also configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of item semantic graphs from the historical item semantic graph G_i and the historical item aspects Q_i. It will be appreciated that various parts of the description above for constructing the hierarchy of user semantic graphs from the historical user semantic graph G_u and the historical user aspects Q_u apply equally to constructing the hierarchy of item semantic graphs, and are omitted for purposes of brevity. Accordingly, the hierarchy of item semantic graphs is denoted by S_i.
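The L-layer hierarchy above (pool L times, read out a representation per layer, then concatenate fine-to-coarse) can be sketched generically. Here `pool_fn` and `readout` are stand-ins for the AGP operator and the max-pooling readout; the toy "graph" is just a list of scalar features.

```python
def build_hierarchy(graph, pool_fn, readout, num_layers):
    """Apply pooling num_layers times; collect one representation per layer;
    concatenate the representations from fine-grained to coarse."""
    reps, g = [], graph
    for _ in range(num_layers):
        g = pool_fn(g)           # one AGP layer: a coarser aspect-specific graph
        reps.append(readout(g))  # e.g. element-wise max over node features
    return [x for r in reps for x in r]  # fuse fine -> coarse

# Toy usage: pooling halves the node list, readout takes the max feature.
halve = lambda g: g[: max(1, len(g) // 2)]
max_readout = lambda g: [max(g)]
print(build_hierarchy([3.0, 1.0, 2.0, 4.0], halve, max_readout, 2))  # → [3.0, 3.0]
```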
- the computer system 100 includes an aspect matching module 160 configured for generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects.
- the set of matched aspects may include a user preference vector v_u that matches every user aspect of the user u to one or more relevant item aspects, thus aiming to find user aspects that the user u is interested in or prefers in the item i.
- the set of matched aspects may include an item preference vector v_i that matches every item aspect of the item i to one or more relevant user aspects, thus aiming to find item aspects that are highly relevant to the user u.
- the user preference vector v_u and the item preference vector v_i are defined in terms of a mean pooling operation and trainable parameters, together with an aspect-level importance weight matrix M_s, which is in turn defined through a learnable weight matrix.
- each element M_s[x, y] describes the importance of the y-th item aspect to the x-th user aspect.
- the aspect-level information of the user u and the item i is fused with the aspect-level importance weight matrix M_s to obtain the user preference vector v_u and the item preference vector v_i described above.
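The matching step above can be illustrated with a simplified M_s and fusion. Plain dot-product similarity stands in for the learned bilinear weighting, and the normalization scheme is an assumption; the real module uses trainable parameters.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def match_aspects(user_aspects, item_aspects):
    """user_aspects, item_aspects: lists of aspect embedding vectors.
    Returns (M_s, v_u): entry M_s[x][y] scores the y-th item aspect against
    the x-th user aspect; v_u fuses item aspects per user aspect."""
    m = [[dot(u, v) for v in item_aspects] for u in user_aspects]
    dim = len(item_aspects[0])
    v_u = []
    for row in m:
        z = sum(row) or 1.0  # normalize (guard against all-zero rows)
        fused = [sum(w / z * v[d] for w, v in zip(row, item_aspects))
                 for d in range(dim)]
        v_u.append(fused)
    return m, v_u
```

The item preference vector v_i would be obtained symmetrically, fusing user aspects per item aspect.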
- the computer system 100 includes an aspect processing module configured for generating, from the matched aspects, aspect input data associated with the user’s interest in the item.
- the aspect processing module is configured for calculating a rating score from the matched aspects and the aspect input data may include the rating score.
- the rating score represents the user u’s interest in or preference for the item i.
- the user identifier of the user u is embedded by the embedding layers 130 to form a user identifier representation e u.
- the user identifier representation e u is input into a Multi-Layer Perceptron (MLP) model 170 and the output is concatenated with the user preference vector v u to obtain the final representation x u of the user u.
- the final representation x_i of the item i can be obtained in a similar manner.
- the final representations x_u and x_i are defined in terms of learnable parameters, e.g. x_u = [MLP(e_u); v_u] and x_i = [MLP(e_i); v_i].
- the final representations x_u and x_i are concatenated to form x.
- the aspect processing module may include a factorization machine model 180 configured for calculating the rating score r_ui.
- the rating score r_ui represents the user u’s preference on the item i and is defined as r_ui = b_0 + b_u + b_i + w^T·x + Σ_i Σ_{j>i} ⟨z_i, z_j⟩·x_i·x_j.
- b_0, b_u, and b_i are the global bias, user bias, and item bias, respectively, and w is the coefficient vector.
- z_i and z_j are the i-th and j-th dimensions of the trainable parameters of the factorization machine model 180, respectively, and ⟨·,·⟩ denotes the dot product of two vectors.
- x_i and x_j are the i-th and j-th dimensions of x, respectively.
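The factorization-machine scoring above (biases, a linear term, and pairwise interactions weighted by dot products of latent vectors) can be sketched directly. All parameter values below are invented for illustration.

```python
def fm_score(x, b0, bu, bi, w, z):
    """Factorization-machine style score: global/user/item biases plus a
    linear term plus pairwise interactions <z_i, z_j> * x_i * x_j."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(z[i][f] * z[j][f] for f in range(len(z[i]))) * x[i] * x[j]
        for i in range(len(x)) for j in range(i + 1, len(x))
    )
    return b0 + bu + bi + linear + pairwise

# Toy call with a 2-dimensional x and 1-dimensional latent vectors.
print(fm_score([1.0, 2.0], b0=0.1, bu=0.2, bi=0.3, w=[0.1, 0.1],
               z=[[0.5], [0.5]]))
```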
- the computer system 100 includes a first machine learning model 200 configured for determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of the first machine learning model 200.
- the first machine learning model 200 may be referred to as the aspect generation module.
- the first machine learning model 200 may include a first LSTM (Long Short-Term Memory) model, wherein the predicted aspects are determined at each time step of the first LSTM model. Determining the predicted aspects includes determining a current predicted aspect at a current time step. Determining the current predicted aspect includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step, and determining the current predicted aspect from the current hidden state.
- the previous hidden state may be an initial hidden state at an initial time step of the first LSTM model.
- the initial hidden state is calculated from the aspect input data, which includes the rating score r_ui.
- the rating score r_ui is mapped into a sentiment representation v_r to guide the aspect and explanation generation, e.g. v_r = W_r·r_ui + b_r, where W_r and b_r are the trainable weight matrix and the bias vector, respectively.
- the aspect input data, including the sentiment representation v_r, the user preference vector v_u, and the item preference vector v_i, are input into an MLP model to calculate the initial hidden state, e.g. h_0 = MLP([v_r; v_u; v_i]).
- the user attention vector is calculated from the previous hidden state and the node features X_u derived from the hierarchy of user semantic graphs.
- the item attention vector is similarly calculated from the previous hidden state and the node features X_i derived from the hierarchy of item semantic graphs.
- an attention vector at the previous time step is calculated by concatenating the user attention vector and the item attention vector.
- the attention vector is then input into the LSTM model, together with the previous hidden state and the previous predicted aspect, for calculating the current hidden state at the current time step j.
- the first machine learning model 200 includes an MLP model, wherein the current predicted aspect is determined from the current hidden state h_j using the MLP model.
- the current hidden state h_j is input into the MLP model to obtain the probability distribution of the current predicted aspect w_j at the current time step j, e.g. p(w_j) = softmax(W_a·h_j + b_a), where W_a is a trainable weight parameter, d_a is the size of the vocabulary of predicted aspects, and b_a is the bias vector.
- the first hidden state h_1 at the start of sequence <SOS> is generated from the initial hidden state h_0, and the first predicted aspect at the first hidden state h_1 is story.
- the second hidden state h_2 is generated from the first hidden state h_1 and the first predicted aspect (story).
- the second predicted aspect at the second hidden state h_2 is characters.
- the third hidden state h_3 at the end of sequence <EOS> is generated from the second hidden state h_2 and the second predicted aspect (characters).
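The aspect-decoding walk-through above can be sketched as a greedy loop: each step maps the previous hidden state and previous prediction to a new state and a probability distribution, and decoding stops at <EOS>. Here `step_fn` is a stand-in for the LSTM plus MLP; the scripted transition table simply reproduces the story → characters example.

```python
def decode_aspects(h0, step_fn, max_steps=10):
    """Greedy decoding: pick the most probable aspect at each step until
    <EOS> is predicted or max_steps is reached."""
    hidden, prev, aspects = h0, "<SOS>", []
    for _ in range(max_steps):
        hidden, probs = step_fn(hidden, prev)  # next state + distribution
        prev = max(probs, key=probs.get)       # greedy choice
        if prev == "<EOS>":
            break
        aspects.append(prev)
    return aspects

# Toy step function scripted to follow the walk-through above.
script = {"<SOS>": "story", "story": "characters", "characters": "<EOS>"}
step = lambda h, prev: (h + 1, {script[prev]: 1.0})
print(decode_aspects(0, step))  # → ['story', 'characters']
```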
- the computer system 100 includes a second machine learning model 220 configured for determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of the second machine learning model 220, the predicted words for explaining the respective predicted aspect. For example, the predicted words form a sentence that explains the predicted aspect.
- the second machine learning model 220 may be referred to as the explanation generation module.
- the second machine learning model 220 may include a second LSTM model, wherein the predicted words for each predicted aspect are determined at each time step of the second LSTM model. For each predicted aspect, determining the predicted words includes determining a current predicted word at a current time step. Determining the current predicted word includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step, and determining the current predicted word from the current hidden state.
- the previous hidden state may be an initial hidden state at an initial time step of the second LSTM model.
- the j-th hidden state from the first machine learning model 200 is used as the initial hidden state of the second LSTM model.
- the user attention vector and item attention vector are calculated from the previous hidden state as well as the hierarchy of user semantic graphs and the hierarchy of item semantic graphs, respectively. More specifically, the hidden state at each time step t − 1, where t = 1, 2, 3, ..., is incorporated with X_u and X_i to calculate the user attention vector and item attention vector, respectively.
- an attention vector at the previous time step is calculated by concatenating the user attention vector and the item attention vector, wherein the attention vector is used for calculating the current hidden state h_t. More specifically, the current hidden state at the current time step t is calculated from the attention vector, the previous hidden state, and the embedding of the previous predicted word.
- the second machine learning model 220 includes an MLP model, wherein the current predicted word is determined from the current hidden state using the MLP model.
- the current hidden state is input into the MLP model to obtain the probability distribution of the current predicted word at the current time step t, computed analogously with a trainable weight parameter, a vocabulary of predicted words, and a bias vector.
- the predicted words for each predicted aspect are determined as follows.
- the second predicted aspect (characters) at the second hidden state h_2 from the first machine learning model 200 is used in this example.
- the second hidden state h_2 is set as the initial hidden state h_{2,0} in the second machine learning model 220.
- the first hidden state h_{2,1} at the start of sequence <SOS> is generated from the initial hidden state h_{2,0}, and the first predicted word at the first hidden state h_{2,1} is the.
- the second hidden state h_{2,2} is generated from the first hidden state h_{2,1} and the first predicted word (the).
- the second predicted word at the second hidden state h_{2,2} is are.
- the third hidden state h_{2,3} is generated from the second hidden state h_{2,2} and the second predicted word (are).
- the third predicted word at the third hidden state h_{2,3} is wonderful.
- the fourth hidden state h_{2,4} at the end of sequence <EOS> is generated from the third hidden state h_{2,3} and the third predicted word (wonderful).
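The word-generation walk-through above follows the same greedy pattern, seeded with the hidden state of a predicted aspect. `step_fn` again stands in for the second LSTM plus MLP, and the scripted transitions reproduce the predicted-word sequence from the example.

```python
def decode_words(aspect_hidden, step_fn, max_steps=20):
    """Decode one explanation: emit the most probable word per step,
    starting from the aspect's hidden state, until <EOS>."""
    hidden, prev, words = aspect_hidden, "<SOS>", []
    for _ in range(max_steps):
        hidden, probs = step_fn(hidden, prev)
        prev = max(probs, key=probs.get)
        if prev == "<EOS>":
            break
        words.append(prev)
    return " ".join(words)

# Toy step function scripted to emit the / are / wonderful as above.
script = {"<SOS>": "the", "the": "are", "are": "wonderful",
          "wonderful": "<EOS>"}
step = lambda h, prev: (h, {script[prev]: 1.0})
print(decode_words("h2", step))  # → the are wonderful
```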
- the computer system 100 includes a recommendation module configured for generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
- the explainable recommendation for the predicted aspect characters is the sentence “the characters are wonderful” formed by the predicted words.
- Embodiments of the present disclosure also describe a computerized method for recommending an item i to a user u.
- the computerized method may be implemented on the computer system 100 which includes the various components described above as well as one or more processors configured for performing various steps of the computerized method in response to non-transitory instructions operative or executed by the processors.
- the non-transitory instructions are stored on a memory of the computer system and may be referred to as computer-readable storage media and/or non-transitory computer-readable media.
- Non-transitory computer-readable media include all computer-readable media, with the sole exception being a transitory propagating signal per se.
- the computerized method includes steps of: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
- the computer system 100 / computerized method described in various embodiments herein provides an explainable recommendation system / method that employs review-based semantic graphs to include more user and item details for generating informative explanations covering multiple aspects. Moreover, the recommendation system extracts aspect-relevant knowledge from the semantic graphs to learn the user’s preferences on different item aspects.
- the recommendation system which may be referred to as the Hierarchical Aspect-guided Review generation (HARE) recommendation system, can thus generate high quality and informative explanations to explain the recommendation of items to the user.
- the explainable recommendation system not only recommends new items but also generates intuitive explanations which can help users to make better and quicker decisions. For example, users may be more persuaded by the explanation to purchase recommended items.
- the explainable recommendation system was tested using three real-world datasets to evaluate the explanation generation performance and the preference prediction performance.
- the first and second datasets were derived from Amazon® Review Data containing product reviews from Amazon®.
- the first dataset was derived from the Kindle Store subset (“Kindle”), and the second dataset was derived from the Electronics subset (“Electronics”).
- the third dataset was derived from the Yelp® Challenge 2019 dataset (“Yelp”).
- a record includes a user identifier, an item identifier, an overall rating, and a textual review.
- Aspects were extracted from the reviews and records with only one aspect were excluded.
- Aspect-relevant sentences were extracted from the reviews as explanations.
- Figure 4A shows the statistics of the Kindle, Electronics, and Yelp datasets. Each dataset was randomly split by the ratio 8:1:1 into training, validation, and test data for the recommendation system.
- Att2Seq incorporates the Seq2Seq model [Sutskever2014Sequence] and attention mechanism to learn the user’s preference from the user attributes and generate review explanations.
- ExpNet utilizes an encoder- decoder framework to expand a short phrase to a long review by combining the user and item information with other auxiliary information.
- Ref2Seq follows the structure of Seq2Seq and learns the representation from the user and item reviews to generate explanations.
- NETE-PMI adopts MLP to predict the rating and then generates a template-controlled sentence with a single predicted aspect.
- ACF uses MLP to encode different attributes and applies a coarse-to-fine decoding model to generate long reviews.
- BLEU bilingual evaluation understudy
- ROUGE Recall-Oriented Understudy for Gisting Evaluation
- METEOR Metric for Evaluation of Translation with Explicit ORdering
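- As an illustration of how such n-gram overlap metrics score a generated explanation against a reference, the sketch below computes BLEU's clipped (modified) n-gram precision in plain Python. This is a generic illustration, not the evaluation code used in the disclosure; the example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, the core quantity behind BLEU."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clip by reference counts
    total = sum(cand.values())
    return overlap / total if total else 0.0

cand = "the characters are wonderful".split()
ref = "the characters are truly wonderful".split()
print(modified_precision(cand, ref, 1))  # 1.0: every candidate unigram appears in the reference
```

Full BLEU additionally combines precisions over several n-gram orders with a brevity penalty; the clipped precision above is the part that rewards word overlap with the gold explanation.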
- Figures 4B and 4C show the performance of the single-aspect and multi-aspect explanation generation tasks achieved by the different methods. Larger BLEU, ROUGE, METEOR, and FMR values indicate better results for the explanation generation performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best explanation generation performance on most metrics compared to the other methods.
- the HARE recommendation system was compared against other prediction methods - PMF, SVD++, CARL, RMG, and NETE-PMI - to evaluate the ability in predicting users’ preferences based on the rating scores.
- PMF is the probabilistic matrix factorization method developed for rating prediction.
- SVD++ exploits both the user’s preferences on items and the influences between items for recommendation.
- CARL uses CNNs to learn relevant aspects from the review data.
- RMG uses a multi-view learning framework to incorporate the review contents and the users’ rating behaviours for the recommendation.
- NETE-PMI feeds user and item identifiers to an MLP to predict rating scores.
- MAE (Mean Absolute Error) is defined as MAE = (1/|T|) Σ_(u,i)∈T |r̂_ui − r_ui|, where T denotes the set of test data, r̂_ui denotes the predicted rating score, and r_ui denotes the rating score in the test data.
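- The MAE computation over a test set can be sketched as follows; the example rating values are hypothetical.

```python
def mae(predictions, targets):
    """Mean Absolute Error: average |predicted rating - true rating| over the test set T."""
    assert len(predictions) == len(targets)
    return sum(abs(p, ) if False else abs(p - t) for p, t in zip(predictions, targets)) / len(predictions)

print(mae([4.5, 3.0, 2.0], [5.0, 3.0, 1.0]))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```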
- Figure 4D shows the performance of the preference prediction tasks achieved by the different methods. Lower MAE values indicate better results for the preference prediction performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best preference prediction performance for the Kindle and Yelp datasets, and the second-best prediction performance for the Electronics dataset. These results indicate that the user preference predicted by HARE can support the high-quality explanation generation.
Abstract
The present disclosure generally relates to an explainable recommendation system and method. The method includes: generating, from historical reviews by the user and of the item, historical user and item semantic graphs and historical user and item aspects; constructing, using graph pooling, hierarchies of user and item semantic graphs from the historical user and item semantic graphs and the historical user and item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user's interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, predicted aspects using a first machine learning model; determining, from the hierarchies of user and item semantic graphs and the predicted aspects, predicted words using a second machine learning model; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
Description
EXPLAINABLE RECOMMENDATION SYSTEM AND METHOD
Cross Reference to Related Application(s)
- The present disclosure claims the benefit of Singapore Patent Application No. 10202104385T filed on 28 April 2021, which is incorporated in its entirety by reference herein.
Technical Field
The present disclosure generally relates to an explainable recommendation system and method. More particularly, the present disclosure describes various embodiments of a computer system and a computerized method for recommending an item to a user.
Background
- Recommendation systems have been widely used to help users make decisions by suggesting to them items or products that they may be interested in. Although some existing recommendation methods usually achieve satisfactory performance, it is still difficult to explain their recommendations. Current methods for explainable recommendation can be roughly classified into two groups: template-based and natural language generation-based.
- The template-based methods generate explanations by filling the generated words in a predefined sentence template with different words for different users. For example, in the template “You might be interested in [aspect], on which this product performs well”, the [aspect] can be replaced by a generated aspect to produce an explanation for item recommendation. However, explanations from the template-based methods may be uninformative and unpersuasive. Moreover, designing high-quality templates is time-consuming and usually requires domain knowledge.
- The natural language generation-based methods can generate more natural and flexible sentences. However, some of these methods can only generate short recommendations based on given attributes such as user identity, item identity, and rating value. It is difficult for them to generate reliable and precise explanations due to the lack of other guiding information or generative signals.
Therefore, in order to address or alleviate at least one of the aforementioned problems and/or disadvantages, there is a need to provide an improved explainable recommendation system and method.
Summary
Embodiments of the present disclosure relate to an explainable recommendation system and method, more specifically a computer system and a computerized method for recommending an item to a user. The method includes: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
An explainable recommendation system and method according to the present disclosure are thus disclosed herein. Various features and advantages of the present disclosure will become more apparent from the following detailed description of the
embodiments of the present disclosure, by way of non-limiting examples only, along with the accompanying drawings.
Brief Description of the Drawings
Figure 1 illustrates an explainable recommendation system according to embodiments of the present disclosure.
Figures 2A and 2B are illustrations of dependency tree structures of two reviews and a semantic graph in the explainable recommendation system.
Figures 3A and 3B illustrate graph pooling in the explainable recommendation system.
Figures 4A to 4D illustrate performance evaluations of the explainable recommendation system.
Detailed Description
For purposes of brevity and clarity, descriptions of embodiments of the present disclosure are directed to an explainable recommendation system and method, in accordance with the drawings. While parts of the present disclosure will be described in conjunction with the embodiments provided herein, it will be understood that they are not intended to limit the present disclosure to these embodiments. On the contrary, the present disclosure is intended to cover alternatives, modifications and equivalents to the embodiments described herein, which are included within the scope of the present disclosure as defined by the appended claims. Furthermore, in the following detailed description, specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be recognized by an individual having ordinary skill in the art, i.e. a skilled person, that the present disclosure may be practiced without specific details, and/or with multiple details arising from combinations of features of particular embodiments. In a number of instances, well-known systems, methods, procedures, and components have not been described
in detail so as to not unnecessarily obscure features of the embodiments of the present disclosure.
In embodiments of the present disclosure, depiction of a given element or consideration or use of a particular element number in a particular figure or a reference thereto in corresponding descriptive material can encompass the same, an equivalent, or an analogous element or element number identified in another figure or descriptive material associated therewith.
References to “an embodiment / example”, “another embodiment / example”, “some embodiments / examples”, “some other embodiments / examples”, and so on, indicate that the embodiment(s) / example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment / example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in an embodiment / example” or “in another embodiment / example” does not necessarily refer to the same embodiment / example.
The terms “comprising”, “including”, “having”, and the like do not exclude the presence of other features / elements / steps than those listed in an embodiment. Recitation of certain features / elements / steps in mutually different embodiments does not indicate that a combination of these features / elements / steps cannot be used in an embodiment.
- As used herein, the terms “a” and “an” are defined as one or more than one. The use of “/” in a figure or associated text is understood to mean “and/or” unless otherwise indicated. The term “set” is defined as a non-empty finite organization of elements that mathematically exhibits a cardinality of at least one (e.g. a set as defined herein can correspond to a unit, singlet, or single-element set, or a multiple-element set), in accordance with known mathematical definitions. The recitation of a particular numerical value or value range herein is understood to include or be a recitation of an approximate numerical value or value range. The terms “first”, “second”, etc. are used
merely as labels or identifiers and are not intended to impose numerical requirements on their associated terms.
- Representative or exemplary embodiments of the present disclosure describe an explainable recommendation system and method. With reference to Figure 1, the explainable recommendation system is illustrated as a computer system 100 for recommending an item i to a user u and generating an explainable recommendation for recommending the item i to the user u. The item i may be a product such as a book or may be a service such as streaming content. The computer system 100 includes a database 110 storing historical reviews by the user u and historical reviews of the item i. The database 110 also stores historical reviews by other users and historical reviews of other items, as well as identifiers of the users and items. The historical user reviews are reviews associated with the user u, such as reviews of various items written by the user u in the past. The historical item reviews are reviews associated with the item i, such as reviews of the item i written by various users in the past. The historical user reviews are denoted by D_u = {d_1^u, d_2^u, ..., d_ndu^u}, where n_du denotes the number of historical reviews associated with the user u. The historical item reviews are denoted by D_i = {d_1^i, d_2^i, ..., d_ndi^i}, where n_di denotes the number of historical reviews associated with the item i.
- The computer system 100 includes a graph generation module 120 configured for generating a historical user semantic graph from the historical user reviews D_u and for generating a historical item semantic graph from the historical item reviews D_i. The semantic graphs can provide a better understanding of the user and item details based on the historical reviews and reduce the impacts caused by noisy reviews. The historical user semantic graph is denoted by G_u = {X_u, E_u}, where X_u denotes the set of nodes (e.g. words), E_u denotes the set of edges, and E_u = {(x_h, r, x_t) | x_h, x_t ∈ X_u, r ∈ R}. Similarly, the historical item semantic graph is denoted by G_i = {X_i, E_i}, where X_i denotes the set of nodes, E_i denotes the set of edges, and E_i = {(x_h, r, x_t) | x_h, x_t ∈ X_i, r ∈ R}. Here, r denotes the relation connecting the two nodes, and R denotes the set of all possible relations.
- In some embodiments, the graph generation module 120 may perform some processing steps in generating the historical user semantic graph G_u and the historical item semantic graph G_i. For example, text pre-processing techniques, such as tokenization and spelling correction, may be applied on each review. Dependency parsing is then used to automatically generate a constituent-based representation, such as a syntax dependency tree, for each review sentence based on syntax. The relations in the dependency tree may provide important clues to mine aspects, details, and opinions. Figure 2A illustrates the dependency tree structures of two exemplary reviews. In the first review, there is a sub-tree: story -> (nmod:with) -> twists -> (amod) -> interesting. In the second review, there is a sub-tree: story -> (nmod:with) -> characters -> (amod) -> memorable. nmod:with and amod are some examples of relations used in dependency trees. The sub-trees are built up with the structure of aspect -> details -> opinion.
- Pruning may also be performed to remove words with little semantic information. For example, the relation “det” between “story” and its determiner “a” has little semantic information and can be removed, keeping only the head and tail nodes. Similarly, relations such as “nmod:poss” and “punct” can be removed. The dependency trees of the review sentences are connected by replacing the same words in different dependency trees with the same node, thereby aggregating different reviews and generating the historical user semantic graph G_u and the historical item semantic graph G_i. In an exemplary semantic graph as shown in Figure 2B, details about “love story”, such as “characters” and “twists”, can be directly connected to “love story” in the semantic graph.
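- The graph construction steps above (prune low-information relations, then merge identical words across sentence-level dependency trees) can be sketched as follows. This is an illustrative sketch only: it assumes the per-sentence dependency triples have already been produced by a parser (not shown), and the pruning list and relation names are examples.

```python
# Hypothetical sketch of semantic-graph construction from pre-parsed
# dependency triples (head_word, relation, tail_word) per sentence.
PRUNED_RELATIONS = {"det", "punct", "nmod:poss"}  # illustrative pruning list

def build_semantic_graph(sentences):
    """Merge sentence dependency trees into one graph, sharing identical words."""
    nodes, edges = set(), set()
    for triples in sentences:
        for head, rel, tail in triples:
            if rel in PRUNED_RELATIONS:
                continue  # drop edges carrying little semantic information
            # identical words across sentences collapse into one shared node
            nodes.update((head, tail))
            edges.add((head, rel, tail))
    return nodes, edges

review1 = [("story", "nmod:with", "twists"), ("twists", "amod", "interesting"),
           ("story", "det", "a")]
review2 = [("story", "nmod:with", "characters"), ("characters", "amod", "memorable")]
nodes, edges = build_semantic_graph([review1, review2])
print(sorted(nodes))  # "story" appears once, connecting both reviews
```

Because both reviews mention "story", their sub-trees attach to the same node, which is exactly how details such as "characters" and "twists" become directly reachable from a shared aspect word.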
- The computer system 100 includes an aspect extraction module 122 configured for extracting a set of historical user aspects of the user u from the historical user reviews D_u and for extracting a set of historical item aspects of the item i from the historical item reviews D_i. Aspects usually represent features of items, such as the price of a product (“price”), the category of a product, or a property / characteristic of a product. For example, an aspect can be a genre of a book, such as “romance” or “mystery”.
- Historical user aspects are extracted from the historical user reviews D_u and denoted by L_u, and historical item aspects are extracted from the historical item reviews D_i and denoted by L_i. The user aspects in L_u are sorted according to their occurrence frequency in descending order, and the top n ranked user aspects are chosen, hence generating the set of historical user aspects denoted by Q_u = {q_u^1, q_u^2, q_u^3, ..., q_u^n}. Similarly, the item aspects in L_i are sorted according to their occurrence frequency in descending order, and the top n ranked item aspects are chosen, hence generating the set of historical item aspects denoted by Q_i = {q_i^1, q_i^2, q_i^3, ..., q_i^n}.
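- The frequency-based top-n selection described above can be sketched in a few lines; the aspect mentions below are hypothetical examples.

```python
from collections import Counter

def top_n_aspects(extracted_aspects, n):
    """Rank extracted aspect mentions by occurrence frequency and keep the top n."""
    counts = Counter(extracted_aspects)
    return [aspect for aspect, _ in counts.most_common(n)]

mentions = ["story", "characters", "story", "price", "story", "characters"]
print(top_n_aspects(mentions, 2))  # ['story', 'characters']
```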
- In some embodiments, the computer system 100 includes a set of embedding layers 130 configured for embedding or encoding one or more of a user identifier of the user u, an item identifier of the item i, the set of historical user aspects Q_u, the set of historical item aspects Q_i, the historical user semantic graph G_u, and the historical item semantic graph G_i. For example, the embedding layers 130 encode nodes in the graphs using one-hot encoding.
The computer system 100 includes a graph representation learning module 140 configured for constructing a hierarchy of semantic graphs from the historical semantic graphs using graph pooling. As the historical semantic graphs are generated from historical reviews, the graph representation learning module 140 may be referred to as a review-based graph representation learning (RGRL) module. The graph representation learning module 140 includes an aspect-guided graph pooling (AGP) operator 150 for performing the graph pooling to extract aspect-specific knowledge from the historical semantic graphs based on the historical aspects.
- As shown in Figure 3A, the inputs of the AGP operator 150 include an input graph G = {X, E, X, A}, and an aspect q with its representation f(q). X and E denote the set of nodes and edges in the graph G, respectively. X is the node feature matrix and A is the adjacency matrix. f(q) is the representation of the aspect q obtained from the embedding layers 130. For example, a Bidirectional Long Short-Term Memory (BiLSTM) model is used to encode the word embedding of the aspect q and obtain the backward hidden state as the representation f(q). A Graph Attention Network (GAT) may be used to encode the graph G for input into the AGP operator 150. For each node x_h, the first-hop neighbours in the graph G are denoted by N(x_h). The feature of x_h is updated by aggregating the input features of neighbourhood nodes and adding its input feature x_h by self-loop as follows:

x̃_h = W x_h + Σ_{x_t ∈ N(x_h)} α(x_h, r, x_t) W x_t

- W is the weight matrix and α(x_h, r, x_t) denotes the attention score between two nodes x_h and x_t. The attention score can be defined as a softmax over unnormalized scores e(x_h, r, x_t) of the neighbourhood, and e(x_h, r, x_t) can be implemented by the following attentional mechanism:

e(x_h, r, x_t) = LeakyReLU(a^T [W x_h ; W_r r ; W x_t])

- LeakyReLU is the Leaky Rectified Linear Unit (ReLU) activation function, r is the embedding of the relation r, and a, W, and W_r are the weight matrices. A matrix X̃ is used to denote the updated features of all nodes. An aspect-aware importance score s_h is defined to describe the relevance of each node in the graph G to the given aspect q, for example:

s_h = |x̃_h · f(q)|

- where |·| denotes the absolute value function. The nodes in the graph can be ranked according to the importance scores s_h in descending order. The set of top-K ranked nodes is denoted by X̂ and their indices by idx. K is empirically set as K = ⌈p|X|⌉, where p is the pooling ratio, |·| denotes the cardinality of a set, and ⌈·⌉ denotes the ceiling function. The new features of the nodes in X̂ and the adjacency matrix Â of the corresponding graph are defined as follows:

X̂ = X̃[idx, :] W_p + b_p,  Â = A[idx, idx]

- W_p and b_p are the weight matrix and bias vector, respectively. X̃[idx, :] is the row-wise indexed feature matrix, and A[idx, idx] aims to get the row-wise and column-wise indexed adjacency matrix from A. Further, X̂ and Â are the new feature matrix and the corresponding adjacency matrix, respectively, after pooling. Ê is used to denote the set of edges that describes the connecting relationships between the nodes in X̂. The output graph of the AGP operator 150 is denoted by Ĝ = {X̂, Ê, X̂, Â}.
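- The top-K selection at the heart of the AGP operator can be sketched in numpy as follows. This is an illustrative sketch, not the disclosed implementation: the importance score is taken as the absolute dot product with the aspect representation (an assumed form), and the learned projection W_p, b_p is omitted.

```python
import numpy as np

def aspect_guided_pool(X, A, f_q, p=0.5):
    """Keep the ceil(p * |X|) nodes most relevant to the aspect vector f_q.

    X: (N, d) node feature matrix, A: (N, N) adjacency matrix, f_q: (d,) aspect
    representation. Returns pooled features, pooled adjacency, and kept indices.
    """
    scores = np.abs(X @ f_q)                 # aspect-aware importance per node (assumed form)
    K = int(np.ceil(p * X.shape[0]))         # K = ceil(p * |X|)
    idx = np.argsort(-scores)[:K]            # indices of the top-K ranked nodes
    return X[idx], A[np.ix_(idx, idx)], idx  # row- and column-wise indexed pooling

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
A = (rng.random((6, 6)) > 0.5).astype(float)
Xp, Ap, idx = aspect_guided_pool(X, A, f_q=rng.normal(size=4), p=0.5)
print(Xp.shape, Ap.shape)  # (3, 4) (3, 3)
```

`np.ix_` produces the row-wise and column-wise indexed sub-adjacency in one step, mirroring A[idx, idx] in the text.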
- The graph representation learning module 140 is configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of user semantic graphs from the historical user semantic graph G_u and the historical user aspects Q_u. As shown in Figure 3B, L layers of graph pooling are conducted on G_u and guided by Q_u. At the l-th layer, there are n AGP operators 150. The input graph of the k-th AGP operator 150 at the l-th layer is G_u^(k,l). The representation f(q_u^k) of the k-th aspect is used to guide the graph pooling in the k-th AGP operator 150. For example, a BiLSTM model in the embedding layers 130 is used to encode the k-th aspect q_u^k and the backward hidden state is obtained as the representation f(q_u^k).
- The output graph of the k-th AGP operator 150 at the l-th layer is Ĝ_u^(k,l), which will be used as the input graph of the k-th AGP operator 150 at the (l+1)-th layer, i.e. G_u^(k,l+1) = Ĝ_u^(k,l). Notably, at the first layer (l = 1), the input graph G_u^(k,1) = G_u for k = 1, 2, 3, ..., n.
- A pooling operation such as maximum pooling is performed on Ĝ_u^(k,l) to obtain the aspect-aware graph representation g_u^(k,l) at the l-th layer. After performing the pooling operation L times corresponding to the L layers, multiple representations of G_u that are relevant to the user aspect q_u^k can be obtained, i.e. g_u^(k,1), g_u^(k,2), ..., g_u^(k,L). The graph representations are concatenated to fuse them from fine-grained to coarse, thereby forming the graph representation g_u^k of the historical user semantic graph G_u as follows:

g_u^k = [g_u^(k,1) ; g_u^(k,2) ; ... ; g_u^(k,L)]

- The hierarchy of user semantic graphs S_u is constructed by concatenating the graph representations g_u^k with the representation f(q_u^k) of the respective k-th user aspect in Q_u, where k = 1, 2, 3, ..., n. S_u is defined as follows:

S_u = {[g_u^k ; f(q_u^k)] | k = 1, 2, 3, ..., n}
- The graph representation learning module 140 is also configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of item semantic graphs from the historical item semantic graph G_i and the historical item aspects Q_i. It will be appreciated that various parts of the description above for constructing the hierarchy of user semantic graphs from the historical user semantic graph G_u and the historical user aspects Q_u apply equally to constructing the hierarchy of item semantic graphs and are omitted for purposes of brevity. Accordingly, the hierarchy of item semantic graphs S_i is defined as follows:

S_i = {[g_i^k ; f(q_i^k)] | k = 1, 2, 3, ..., n}
- The computer system 100 includes an aspect matching module 160 configured for generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects. The set of matched aspects may include a user preference vector v_u that matches every user aspect of the user u to one or more relevant item aspects, thus aiming to find user aspects that the user u is interested in or prefers in the item i. The set of matched aspects may include an item preference vector v_i that matches every item aspect of the item i to one or more relevant user aspects, thus aiming to find item aspects that are highly relevant to the user u. The user preference vector v_u and the item preference vector v_i are computed using a mean pooling operation over the aspect representations, weighted by an aspect-level importance weight matrix M_s, with trainable projection parameters. M_s is computed from the hierarchies S_u and S_i using a learnable weight matrix. Each element M_s[x, y] describes the importance of the y-th item aspect to the x-th user aspect. The aspect-level information of the user u and the item i is fused with the aspect-level importance weight matrix M_s to obtain the user preference vector v_u and the item preference vector v_i described above.
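- The aspect-matching step can be sketched in numpy as below. This is a simplified illustration under stated assumptions: the learnable weight matrix inside M_s is omitted (raw dot products are used instead), and a row-wise softmax plus mean pooling stands in for the trainable projections.

```python
import numpy as np

def match_aspects(S_u, S_i):
    """Toy aspect matching between user and item aspect representations.

    S_u: (n, d) user-aspect representations, S_i: (n, d) item-aspect ones.
    M_s[x, y] scores the importance of item aspect y to user aspect x.
    """
    logits = S_u @ S_i.T                                              # (n, n) raw match scores
    M_s = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # row-wise softmax
    v_u = (M_s @ S_i).mean(axis=0)    # user preference vector: item aspects relevant to the user
    v_i = (M_s.T @ S_u).mean(axis=0)  # item preference vector: user aspects relevant to the item
    return M_s, v_u, v_i

rng = np.random.default_rng(1)
M_s, v_u, v_i = match_aspects(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(M_s.shape, v_u.shape, v_i.shape)  # (4, 4) (8,) (8,)
```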
- The computer system 100 includes an aspect processing module configured for generating, from the matched aspects, aspect input data associated with the user’s interest in the item. For example, the aspect processing module is configured for calculating a rating score from the matched aspects and the aspect input data may include the rating score. The rating score represents the user u’s interest in or preference on the item i. The user identifier of the user u is embedded by the embedding layers 130 to form a user identifier representation e_u. The user identifier representation e_u is input into a Multi-Layer Perceptron (MLP) model 170 and the output is concatenated with the user preference vector v_u to obtain the final representation x_u of the user u. The final representation x_i of the item i can be obtained in a similar manner:

x_u = [MLP(e_u) ; v_u],  x_i = [MLP(e_i) ; v_i]

- where the MLP parameters are learnable. The final representations x_u and x_i are concatenated to form x. The aspect processing module may include a factorization machine model 180 configured for calculating the rating score r̂_ui. The rating score r̂_ui represents the user u’s preference on the item i and is defined as follows:

r̂_ui = b_0 + b_u + b_i + w^T x + Σ_i Σ_{j>i} ⟨z_i, z_j⟩ x_i x_j

- b_0, b_u, and b_i are the global bias, user bias, and item bias, respectively. w is the coefficient vector. z_i and z_j are the i-th and j-th dimensions of the trainable parameters of the factorization machine model 180, respectively. ⟨·,·⟩ denotes the dot product of two vectors. x_i and x_j are the i-th and j-th dimensions of x, respectively.
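- The factorization-machine score can be sketched in numpy as follows; this is a generic FM sketch, with illustrative random parameters rather than the trained parameters of model 180. It uses the standard O(d·k) identity for the pairwise term instead of the explicit double sum.

```python
import numpy as np

def fm_rating(x, w, Z, b0=0.0, b_u=0.0, b_i=0.0):
    """Factorization-machine score: biases + linear term + pairwise interactions.

    x: (d,) concatenated user/item representation, w: (d,) coefficient vector,
    Z: (d, k) factor matrix whose rows z_i parameterize <z_i, z_j> * x_i * x_j.
    """
    linear = w @ x
    # O(d*k) identity: sum_{i<j} <z_i, z_j> x_i x_j
    #   = 0.5 * (||sum_i x_i z_i||^2 - sum_i ||z_i||^2 x_i^2)
    s = Z.T @ x  # (k,)
    pairwise = 0.5 * (s @ s - ((Z ** 2).T @ (x ** 2)).sum())
    return b0 + b_u + b_i + linear + pairwise

rng = np.random.default_rng(2)
x, w, Z = rng.normal(size=6), rng.normal(size=6), rng.normal(size=(6, 3))
print(float(fm_rating(x, w, Z)))
```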
- The computer system 100 includes a first machine learning model 200 configured for determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of the first machine learning model 200. The first machine learning model 200 may be referred to as the aspect generation module.
- As shown in Figure 3B, for each user aspect q_u^k, the node feature matrix of the pooled graph can be obtained from the graph pooling of the user semantic graphs. Similarly, for each item aspect q_i^k, the node feature matrix of the pooled graph can be obtained from the graph pooling of the item semantic graphs. All the aspect-relevant node feature matrices are stacked to form the matrices X_u and X_i, respectively.
The first machine learning model 200 may include a first LSTM (Long Short-Term Memory) model, wherein the predicted aspects are determined at each time step of the first LSTM model. Determining the predicted aspects includes determining a current predicted aspect at a current time step. Determining the current predicted aspect includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step, and determining the current predicted aspect from the current hidden state.
- The previous hidden state may be an initial hidden state h_0^a at an initial time step of the first LSTM model. The initial hidden state h_0^a is calculated from the aspect input data, which includes the rating score r̂_ui. The rating score r̂_ui is mapped into a sentiment representation v_r to guide the aspect and explanation generation, defined as follows:

v_r = W_v r̂_ui + b_v

- W_v and b_v are the trainable weight matrix and the bias vector, respectively. The aspect input data including the sentiment representation v_r, user preference vector v_u, and item preference vector v_i are input into an MLP model to calculate the initial hidden state h_0^a, defined as follows:

h_0^a = MLP([v_r ; v_u ; v_i])

- The hidden state h_(j-1)^a at each time step j − 1, where j = 1, 2, 3, ..., is incorporated with the user node features X_u to calculate the user attention vector c_(j-1)^u. The item attention vector c_(j-1)^i can similarly be obtained from X_i. The user attention vector c_(j-1)^u is calculated from the previous hidden state h_(j-1)^a and the node features X_u derived from the hierarchy of user semantic graphs. The item attention vector c_(j-1)^i is calculated from the previous hidden state h_(j-1)^a and the node features X_i derived from the hierarchy of item semantic graphs. An attention vector c_(j-1) at the previous time step is calculated by concatenating the user attention vector c_(j-1)^u and item attention vector c_(j-1)^i. The attention vector c_(j-1) is then input into the LSTM model for calculating the current hidden state h_j^a at the current time step j:

h_j^a = LSTM(h_(j-1)^a, [e(w_(j-1)^a) ; c_(j-1)])

- where e(w_(j-1)^a) is the embedding of the previous predicted aspect. The first machine learning model 200 includes an MLP model, wherein the current predicted aspect is determined from the current hidden state h_j^a using the MLP model. More specifically, the current hidden state h_j^a is input into the MLP model to obtain the probability distribution of the current predicted aspect w_j^a at the current time step j, as follows:

p(w_j^a) = softmax(W_a h_j^a + b_a)

- W_a is a trainable weight parameter, d_a is the size of the vocabulary of predicted aspects, and b_a is the bias vector.
- In one example of aspect prediction by the first machine learning model 200 as shown in Figure 1, the first hidden state h_1^a at the start of sequence <SOS> is generated from the initial hidden state h_0^a, and the first predicted aspect at the first hidden state h_1^a is story. The second hidden state h_2^a is generated from the first hidden state h_1^a and the first predicted aspect (story). The second predicted aspect at the second hidden state h_2^a is characters. The third hidden state h_3^a at the end of sequence <EOS> is generated from the second hidden state h_2^a and the second predicted aspect (characters).
The computer system 100 includes a second machine learning model 220 configured for determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of the second machine learning model 220, the predicted words for explaining the respective predicted aspect. For example, the predicted words form a sentence that explains the predicted aspect. The second machine learning model 220 may be referred to as the explanation generation module.
The second machine learning model 220 may include a second LSTM model, wherein the predicted words for each predicted aspect are determined at each time step of the second LSTM model. For each predicted aspect, determining the predicted words includes determining a current predicted word at a current time step. Determining the current predicted word includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step, and determining the current predicted word from the current hidden state.
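The decoding loop described above can be sketched as follows. Here `step` and `predict_word` are hypothetical stand-ins for the second LSTM cell (with its attention inputs) and the MLP head; the sketch shows only the control flow of word-by-word generation until the end-of-sequence token.

```python
def generate_explanation(h0, step, predict_word, max_len=20):
    """Greedy decoding sketch for the second model: starting from the
    initial hidden state (taken from the first model), repeatedly
    compute the next hidden state from the previous state and the
    previous predicted word until <EOS> or max_len is reached."""
    words, w_prev, h = [], "<SOS>", h0
    for _ in range(max_len):
        h = step(h, w_prev)    # stand-in for the LSTM cell update
        w = predict_word(h)    # stand-in for the MLP word head
        if w == "<EOS>":
            break
        words.append(w)
        w_prev = w
    return words
```

With toy stand-ins, the loop produces a fixed word sequence, which illustrates the state-then-word alternation described in the text.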
The previous hidden state may be an initial hidden state at an initial time step of the second LSTM model. For the j-th explanation p_j ∈ P, the j-th hidden state h_j from the first machine learning model 200 is used as the initial hidden state.

Similar to the above for the first machine learning model 200, the user attention vector and item attention vector are calculated from the previous hidden state as well as the hierarchy of user semantic graphs and the hierarchy of item semantic graphs, respectively. More specifically, the hidden state at each time step t-1, where t = 1, 2, 3, ..., is incorporated with X_u and X_i to calculate the user attention vector and item attention vector, respectively.

An attention vector c_(t-1) at the previous time step is calculated by concatenating the user attention vector and item attention vector, wherein the attention vector is used for calculating the current hidden state h_t. More specifically, the current hidden state at the current time step t can be calculated as:

h_t = LSTM(h_(t-1), [c_(t-1); e(w_(t-1))])

where e(w_(t-1)) is the embedding of the previous predicted word w_(t-1).
The second machine learning model 220 includes an MLP model, wherein the current predicted word is determined from the current hidden state using the MLP model. More specifically, the current hidden state h_t is input into the MLP model to obtain the probability distribution of the current predicted word w_t at the current time step t:

P(w_t) = softmax(W_w h_t + b_w)

where W_w is a trainable weight parameter, d_w is the size of the vocabulary of predicted words, and b_w is the bias vector.
In one example of word or sentence prediction by the second machine learning model 220 as shown in Figure 1, the predicted words for each predicted aspect are determined as follows. The second predicted aspect (characters) at the second hidden state h_2 from the first machine learning model 200 is used in this example. The second hidden state h_2 is set as the initial hidden state in the second machine learning model 220. The first hidden state h_2,1 at the <SOS> is generated from the initial hidden state h_2, and the first predicted word at the first hidden state h_2,1 is the. The second hidden state h_2,2 is generated from the first hidden state h_2,1 and the first predicted word (the). The second predicted word at the second hidden state h_2,2 is are. The third hidden state h_2,3 is generated from the second hidden state h_2,2 and the second predicted word (are). The third predicted word at the third hidden state h_2,3 is wonderful. The fourth hidden state h_2,4 at the end of sequence <EOS> is generated from the third hidden state h_2,3 and the third predicted word (wonderful).
The computer system 100 includes a recommendation module configured for generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user. In the example as shown in Figure 1, the explainable recommendation for the predicted aspect characters is the sentence "the characters are wonderful" formed by the predicted words.
Embodiments of the present disclosure also describe a computerized method for recommending an item i to a user u. The computerized method may be implemented on the computer system 100 which includes the various components described above as well as one or more processors configured for performing various steps of the computerized method in response to non-transitory instructions operative or executed by the processors. The non-transitory instructions are stored on a memory of the computer system and may be referred to as computer-readable storage media and/or non-transitory computer-readable media. Non-transitory computer-readable media
include all computer-readable media, with the sole exception being a transitory propagating signal per se.
The computerized method includes steps of: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
The computer system 100 / computerized method described in various embodiments herein provides an explainable recommendation system / method that employs review-based semantic graphs to include more user and item details for generating informative explanations covering multiple aspects. Moreover, the recommendation system extracts aspect-relevant knowledge from the semantic graphs to learn the user’s preferences on different item aspects. The recommendation system, which may be referred to as the Hierarchical Aspect-guided Review generation (HARE) recommendation system, can thus generate high quality and informative explanations to explain the recommendation of items to the user. The explainable recommendation system not only recommends new items but also generates intuitive explanations which can help users to make better and quicker decisions. For example, users may be more persuaded by the explanation to purchase recommended items.
The explainable recommendation system was tested using three real-world datasets to evaluate the explanation generation performance and the preference prediction performance. The first and second datasets were derived from Amazon® Review Data containing product reviews from Amazon®. The first dataset was derived from the Kindle Store subset ("Kindle"), and the second dataset was derived from the Electronics subset ("Electronics"). The third dataset was derived from the Yelp® Challenge 2019 dataset ("Yelp"). In each dataset, a record includes a user identifier, an item identifier, an overall rating, and a textual review. Aspects were extracted from the reviews, and records with only one aspect were excluded. Aspect-relevant sentences were extracted from the reviews as explanations. Figure 4A shows the statistics of the Kindle, Electronics, and Yelp datasets. Each dataset was randomly split by the ratio 8:1:1 into training, validation, and test data for the recommendation system.
The HARE recommendation system was compared against other explanation generation methods - Att2Seq, ExpNet, Ref2Seq, NETE-PMI, and ACF - to evaluate the explanation generation performance. Att2Seq incorporates the Seq2Seq model [Sutskever2014Sequence] and an attention mechanism to learn the user's preference from the user attributes and generate review explanations. ExpNet utilizes an encoder-decoder framework to expand a short phrase to a long review by combining the user and item information with other auxiliary information. Ref2Seq follows the structure of Seq2Seq and learns the representation from the user and item reviews to generate explanations. NETE-PMI adopts an MLP to predict the rating and then generates a template-controlled sentence with a single predicted aspect. ACF uses an MLP to encode different attributes and applies a coarse-to-fine decoding model to generate long reviews.
The following evaluation metrics were used to evaluate the explanation generation performance of the HARE recommendation system and the other methods - BLEU (bilingual evaluation understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit Ordering). These metrics evaluate the text similarity between the generated and gold explanations. BLEU (specifically BLEU-1 and BLEU-4) evaluates the n-gram overlap
between gold and generated explanations. ROUGE (specifically ROUGE-1 and ROUGE-L) evaluates the recall, precision, and accuracy of the n-gram overlap. METEOR calculates the harmonic mean of unigram precision and recall over the whole corpus. Additionally, a Feature Matching Ratio (FMR) is used to measure whether the generated explanation includes the predicted aspects.
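A simplified sketch of two of these metrics, assuming single-reference, whitespace-tokenized text. Real BLEU additionally uses higher-order n-grams and a brevity penalty, and this FMR variant checks only exact token matches; both simplifications are assumptions of the sketch.

```python
from collections import Counter

def bleu1(candidate, reference):
    """Unigram (BLEU-1) modified precision between a generated and a
    single reference explanation; brevity penalty omitted."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)  # clipped unigram counts
    return sum(overlap.values()) / max(len(cand), 1)

def feature_matching_ratio(explanations, aspects):
    """FMR: fraction of generated explanations that mention their
    predicted aspect as an exact token."""
    hits = sum(a in e.split() for e, a in zip(explanations, aspects))
    return hits / max(len(explanations), 1)
```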
Figures 4B and 4C show the performance of the single-aspect and multi-aspect explanation generation tasks achieved by the different methods. Larger BLEU, ROUGE, METEOR, and FMR values indicate better results for the explanation generation performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best explanation generation performance on most metrics compared to the other methods.
The HARE recommendation system was compared against other prediction methods - PMF, SVD++, CARL, RMG, and NETE-PMI - to evaluate its ability to predict users' preferences based on the rating scores. PMF is the probabilistic matrix factorization method developed for rating prediction. SVD++ exploits both the user's preferences on items and the influences between items for recommendation. CARL uses CNNs to learn relevant aspects from the review data. RMG uses a multi-view learning framework to incorporate the review contents and the users' rating behaviours for the recommendation. NETE-PMI feeds user and item identifiers to an MLP to predict rating scores.
The Mean Absolute Error (MAE) is used as the evaluation metric to evaluate the preference prediction performance of the HARE recommendation system and the other methods. The MAE is defined as:

MAE = (1 / |T|) Σ_((u,i)∈T) |r̂_ui − r_ui|

T denotes the set of test data, r̂_ui denotes the predicted rating score, r_ui denotes the rating score in the test data, and |·| denotes the cardinality of a set.
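The MAE computation described above is straightforward to sketch:

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/|T|) * sum over the test set T of |r_hat_ui - r_ui|,
    where each pair is (predicted rating, ground-truth rating)."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```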
Figure 4D shows the performance of the preference prediction tasks achieved by the different methods. Lower MAE values indicate better results for the preference prediction performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best preference prediction performance for the Kindle and Yelp datasets, and the second-best prediction performance for the Electronics dataset. These results indicate that the user preference predicted by HARE can support high-quality explanation generation.

In the foregoing detailed description, embodiments of the present disclosure in relation to an explainable recommendation system and method are described with reference to the provided figures. The description of the various embodiments herein is not intended to call out or be limited only to specific or particular representations of the present disclosure, but merely to illustrate non-limiting examples of the present disclosure. The present disclosure serves to address at least one of the mentioned problems and issues associated with the prior art. Although only some embodiments of the present disclosure are disclosed herein, it will be apparent to a person having ordinary skill in the art in view of this disclosure that a variety of changes and/or modifications can be made to the disclosed embodiments without departing from the scope of the present disclosure. Therefore, the scope of the disclosure as well as the scope of the following claims is not limited to embodiments described herein.
Claims
1. A computerized method for recommending an item to a user, the method comprising: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
2. The method according to claim 1 , wherein the first machine learning model comprises a first LSTM model, and wherein the predicted aspects are determined at each time step of the first LSTM model.
3. The method according to claim 2, wherein determining the predicted aspects comprises determining a current predicted aspect at a current time step, comprising:
calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step; and determining the current predicted aspect from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the first LSTM model, the initial hidden state calculated from the aspect input data.
4. The method according to claim 3, wherein determining the current predicted aspect comprises calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.
5. The method according to claim 3 or 4, wherein the first machine learning model comprises an MLP model, and wherein the current predicted aspect is determined from the current hidden state using the MLP model.
6. The method according to any one of claims 2 to 5, wherein the second machine learning model comprises a second LSTM model, and wherein the predicted words for each predicted aspect are determined at each time step of the second LSTM model.
7. The method according to claim 6, wherein for each predicted aspect, determining the predicted words comprises determining a current predicted word at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step; and determining the current predicted word from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the second LSTM model, the initial hidden state derived from a current hidden state of the first LSTM model.
8. The method according to claim 7, wherein determining the current predicted word comprises calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.
9. The method according to claim 7 or 8, wherein the second machine learning model comprises an MLP model, and wherein the current predicted word is determined from the current hidden state using the MLP model.
10. The method according to any one of claims 1 to 9, further comprising calculating, from the matched aspects, a rating score representing the user’s interest in the item, wherein the aspect input data comprises the rating score, and optionally wherein the rating score is calculated using a factorization machine model.
11. The method according to any one of claims 1 to 10, wherein: a representation of the user and a representation of the item are embedded using an MLP model; and/or the historical user and item aspects are embedded using a bidirectional LSTM model.
12. A non-transitory computer-readable storage medium storing computer- readable instructions that, when executed by at least one processor of a computer system, cause the computer system to perform the computerized method according to any one of claims 1 to 11.
13. A computer system for recommending an item to a user, the system comprising: a database storing historical reviews by the user and historical reviews of the item;
a graph generation module configured for: generating a historical user semantic graph from the historical user reviews; generating a historical item semantic graph from the historical item reviews; an aspect extraction module configured for: extracting a set of historical user aspects of the user from the historical user reviews; and extracting a set of historical item aspects of the item from the historical item reviews; a graph representation learning module configured for: constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; and constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; an aspect matching module configured for: generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; an aspect processing module configured for: generating, from the matched aspects, aspect input data associated with the user’s interest in the item; a first machine learning model configured for determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of the first machine learning model; a second machine learning model configured for determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of the second machine learning model, the predicted words for explaining the respective predicted aspect; and a recommendation module configured for generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.
14. The system according to claim 13, wherein the first machine learning model comprises a first LSTM model configured for determining the predicted aspects at each time step of the first LSTM model.
15. The system according to claim 14, wherein the first LSTM model is configured for determining a current predicted aspect at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step; and determining the current predicted aspect from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the first LSTM model, the initial hidden state calculated from the aspect input data.
16. The system according to claim 15, wherein the first LSTM model is configured for calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.
17. The system according to any one of claims 14 to 16, wherein the second machine learning model comprises a second LSTM model, and wherein the second machine learning model is configured for determining, for each predicted aspect, the predicted words at each time step of the second LSTM model.
18. The system according to claim 17, wherein the second LSTM model is configured for determining, for each predicted aspect, a current predicted word at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step; and determining the current predicted word from the current hidden state,
optionally wherein the previous hidden state is an initial hidden state at an initial time step of the second LSTM model, the initial hidden state derived from a current hidden state of the first LSTM model.
19. The system according to claim 18, wherein the second LSTM model is configured for calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.
20. The system according to any one of claims 13 to 19, wherein the aspect processing module is configured for calculating, from the matched aspects, a rating score representing the user’s interest in the item, and optionally wherein the aspect processing module comprises a factorization machine model for calculating the rating score.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10202104385T | 2021-04-28 | ||
SG10202104385T | 2021-04-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022231522A1 true WO2022231522A1 (en) | 2022-11-03 |
Family
ID=83848886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2022/050256 WO2022231522A1 (en) | 2021-04-28 | 2022-04-28 | Explainable recommendation system and method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022231522A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190392330A1 (en) * | 2018-06-21 | 2019-12-26 | Samsung Electronics Co., Ltd. | System and method for generating aspect-enhanced explainable description-based recommendations |
CN111966888A (en) * | 2019-05-20 | 2020-11-20 | 南京大学 | External data fused interpretable recommendation method and system based on aspect categories |
CN112417306A (en) * | 2020-12-10 | 2021-02-26 | 北京工业大学 | Method for optimizing performance of recommendation algorithm based on knowledge graph |
US20210065278A1 (en) * | 2019-08-27 | 2021-03-04 | Nec Laboratories America, Inc. | Asymmetrically hierarchical networks with attentive interactions for interpretable review-based recommendation |
Non-Patent Citations (2)
Title |
---|
BAI PENG; XIA YANG; XIA YONGSHENG: "Fusing Knowledge and Aspect Sentiment for Explainable Recommendation", IEEE ACCESS, IEEE, USA, vol. 8, 25 July 2020 (2020-07-25), USA , pages 137150 - 137160, XP011802819, DOI: 10.1109/ACCESS.2020.3012347 * |
SUSEN YANG; YONG LIU; YINAN ZHANG; CHUNYAN MIAO; ZAIQING NIE; JUYONG ZHANG: "Learning Hierarchical Review Graph Representations for Recommendation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 5 August 2020 (2020-08-05), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081733092 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116385070A (en) * | 2023-01-18 | 2023-07-04 | 中国科学技术大学 | Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce |
CN116385070B (en) * | 2023-01-18 | 2023-10-03 | 中国科学技术大学 | Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22796284 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22796284 Country of ref document: EP Kind code of ref document: A1 |