WO2022231522A1

WO2022231522A1 - Explainable recommendation system and method

Info

Publication number: WO2022231522A1
Application number: PCT/SG2022/050256
Authority: WO
Inventors: Yidan Hu; Chunyan MIAO; Yong Liu
Original assignee: Nanyang Technological University
Priority date: 2021-04-28
Filing date: 2022-04-28
Publication date: 2022-11-03

Abstract

The present disclosure generally relates to an explainable recommendation system and method. The method includes: generating, from historical reviews by the user and of the item, historical user and item semantic graphs and historical user and item; constructing, using graph pooling, hierarchies of user and item semantic graphs from the historical user and item semantic graphs and the historical user and item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user's interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, predicted aspects using a first machine learning model; determining, from the hierarchies of user and item semantic graphs and the predicted aspects, predicted words using a second machine learning model; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.

Description

EXPLAINABLE RECOMMENDATION SYSTEM AND METHOD

Cross Reference to Related Application(s)

The present disclosure claims the benefit of Singapore Patent Application No. 10202104385T filed on 28 April 2021 , which is incorporated in its entirety by reference herein.

Technical Field

The present disclosure generally relates to an explainable recommendation system and method. More particularly, the present disclosure describes various embodiments of a computer system and a computerized method for recommending an item to a user.

Background

Recommendation systems have been widely used to help users make decisions by suggesting to them items or products that they may be interested. Although some existing recommendation methods usually achieve satisfactory performances, it is still difficult to explain their recommendations. Current methods for explainable recommendation can be roughly classified into two groups: template-based and natural language generation-based.

The template-based methods generate explanations by filling the generated words in a predefined sentence template with different words for different users. For example, in the template “You might be interested in [aspect], on which this product performs welt’, the [ aspect ] can be replaced by a generated aspect to produce an explanation for item recommendation. However, explanations from the template-based methods may be uninformative and unpersuasive. Moreover, designing high-quality templates is time-consuming and usually requires domain knowledge.

The natural language generation-based methods can generate more natural and flexible sentences. However, some of these methods can only generate short recommendations based on given attributes such as user identity, item identity and rating value. It is difficult for them to generate reliable and precise explanations due to lack of other guiding information or generative signals.

Therefore, in order to address or alleviate at least one of the aforementioned problems and/or disadvantages, there is a need to provide an improved explainable recommendation system and method.

Summary

Embodiments of the present disclosure relate to an explainable recommendation system and method, more specifically a computer system and a computerized method for recommending an item to a user. The method includes: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.

An explainable recommendation system and method according to the present disclosure are thus disclosed herein. Various features and advantages of the present disclosure will become more apparent from the following detailed description of the embodiments of the present disclosure, by way of non-limiting examples only, along with the accompanying drawings.

Brief Description of the Drawings

Figure 1 illustrates an explainable recommendation system according to embodiments of the present disclosure.

Figures 2A and 2B are illustrations of dependency tree structures of two reviews and a semantic graph in the explainable recommendation system.

Figures 3A and 3B illustrate graph pooling in the explainable recommendation system.

Figures 4A to 4D illustrate performance evaluations of the explainable recommendation system.

Detailed Description

For purposes of brevity and clarity, descriptions of embodiments of the present disclosure are directed to an explainable recommendation system and method, in accordance with the drawings. While parts of the present disclosure will be described in conjunction with the embodiments provided herein, it will be understood that they are not intended to limit the present disclosure to these embodiments. On the contrary, the present disclosure is intended to cover alternatives, modifications and equivalents to the embodiments described herein, which are included within the scope of the present disclosure as defined by the appended claims. Furthermore, in the following detailed description, specific details are set forth in order to provide a thorough understanding of the present disclosure. However, it will be recognized by an individual having ordinary skill in the art, i.e. a skilled person, that the present disclosure may be practiced without specific details, and/or with multiple details arising from combinations of features of particular embodiments. In a number of instances, well-known systems, methods, procedures, and components have not been described in detail so as to not unnecessarily obscure features of the embodiments of the present disclosure.

In embodiments of the present disclosure, depiction of a given element or consideration or use of a particular element number in a particular figure or a reference thereto in corresponding descriptive material can encompass the same, an equivalent, or an analogous element or element number identified in another figure or descriptive material associated therewith.

References to “an embodiment / example”, “another embodiment / example”, “some embodiments / examples”, “some other embodiments / examples”, and so on, indicate that the embodiment(s) / example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment / example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in an embodiment / example” or “in another embodiment / example” does not necessarily refer to the same embodiment / example.

The terms “comprising”, “including”, “having”, and the like do not exclude the presence of other features / elements / steps than those listed in an embodiment. Recitation of certain features / elements / steps in mutually different embodiments does not indicate that a combination of these features / elements / steps cannot be used in an embodiment.

As used herein, the terms “a” and “an” are defined as one or more than one. The use of 7” in a figure or associated text is understood to mean “and/or” unless otherwise indicated. The term “set” is defined as a non-empty finite organization of elements that mathematically exhibits a cardinality of at least one (e.g. a set as defined herein can correspond to a unit, singlet, or single-element set, or a multiple-element set), in accordance with known mathematical definitions. The recitation of a particular numerical value or value range herein is understood to include or be a recitation of an approximate numerical value or value range. The terms “first”, “second”, etc. are used merely as labels or identifiers and are not intended to impose numerical requirements on their associated terms.

Representative or exemplary embodiments of the present disclosure describe an explainable recommendation system and method. With reference to Figure 1 , the explainable recommendation system is illustrated as a computer system 100 for recommending an item i to a user u and generating an explainable recommendation for recommending the item i to the user u. The item i may be a product such as a book or may be a service such as streaming content. The computer system 100 includes a database 110 storing historical reviews by the user u and historical reviews of the item i. The database 110 also stores historical reviews by other users and historical reviews of other items, as well as identifiers of the users and items. The historical user reviews are reviews associated with the user u, such as reviews of various items written by the user u in the past. The historical item reviews are reviews associated with the item i, such as reviews of the item i written by various users in the past. The historical user reviews are denoted by where

n_du denotes the number of historical reviews associated with the user u. The historical item reviews are denoted where n_d. denotes the

number of historical reviews associated with the item i.

The computer system 100 includes a graph generation module 120 configured for generating a historical user semantic graph from the historical user reviews V_u and for generating a historical item semantic graph from the historical item reviews Ί) The semantic graphs can provide a better understanding of the user and item details based on the historical reviews and reduce the impacts caused by noise reviews. The historical user semantic graph is denoted by , where X_u denotes the

set of nodes (e.g. words), £_u denotes the set of edges, and

Similarly, the historical item semantic graph is denoted by

where X_j denotes the set of nodes and £ denotes the set of edges, and Si = {(x_h,r, x_t)\x_h, x_t E Xi, r E tR} . r denotes the relation connecting the two nodes, and (R denotes the set of all possible relations.

In some embodiments, the graph generation module 120 may perform some processing steps in generating the historical user semantic graph Q_u and the historical item semantic graph Qi . For example, text pre-processing techniques, such as tokenization and spelling corrections, may be applied on each review. Dependency parsing is then used to automatically generate a constituent-based representation, such as a syntax dependency tree, for each review sentence based on syntax. The relations in the dependency tree may provide important clues to mine aspects, details, and opinions. Figure 2A illustrates the dependency tree structures of two exemplary reviews. In the first review, there is a sub-tree: story -> ( nmod:width ) -> twists -> ( amod ) -> interesting. In the second review, there is a sub-tree: story -> ( nmod.width ) -> characters -> (amod) -> memorable nmod.width and amod are some examples of relations used in dependency trees. The sub-trees are built up with the structure of aspect -> details -> opinion.

Pruning may also be performed to remove words with little semantic information. For example, the relation “def between “story’ and its determiner “a” has little semantic information and can be removed, keeping only the head and tail nodes. Similarly, relations such as “ nmod.poss” and “puncf can be removed. The dependency trees of the review sentences are connected by replacing the same words in different dependency trees with the same node, thereby aggregating different reviews and generating the historical user semantic graph Q_u and the historical item semantic graph In an exemplary semantic graph as shown in Figure 2B, details about “love story”, such as “characters" and “twists", can be directly connected to “ love story” in the semantic graph.

The computer system 100 includes an aspect extraction module 122 configured for extracting a set of historical user aspects of the user u from the historical user reviews T>_u and for extracting a set of historical item aspects of the item i from the historical item reviews D_j. Aspects usually represent features of items, such as price of a product {“price"), category of a product, property / characteristic of a product. For example, an aspect can be a genre of a book, such as “romance" or “mystery".

Historical user aspects are extracted from the historical user reviews T)_u and denoted by L_u, and historical item aspects are extracted from the historical item reviews D_j and denoted

The user aspects L_u are sorted according to their occurrence frequency in descending order, and the top n ranked user aspects are chosen, hence generating the set of historical user aspects as denoted by Q_u = {q^, q^, Qiu

} Similarly, the item aspects

are sorted according to their occurrence frequency in descending order, and the top n ranked item aspects are chosen, , hence generating the set of historical item aspects as denoted by Qi — q\, qf, qf, ... , qf }.

In some embodiments, the computer system 100 includes a set of embedding layers 130 configured for embedding or encoding one or more of a user identifier of the user u, an item identifier of the item i, the set of historical user aspects Q_u, the set of historical item aspects Qi , the historical user semantic graph Q_u , and the historical item semantic graph

For example, the embedding layers 130 encodes nodes in the graphs using one-hot encoding.

The computer system 100 includes a graph representation learning module 140 configured for constructing a hierarchy of semantic graphs from the historical semantic graphs using graph pooling. As the historical semantic graphs are generated from historical reviews, the graph representation learning module 140 may be referred to as a review-based graph representation learning (RGRL) module. The graph representation learning module 140 includes an aspect-guided graph pooling (AGP) operator 150 for performing the graph pooling to extract aspect-specific knowledge from the historical semantic graphs based on the historical aspects.

As shown in Figure 3A, the inputs of the AGP operator 150 include an input graph Q = {X, 8, X, A}, and an aspect q with its representation f(q). X and 8 denote the set of nodes and edges in the graph Q, respectively. X is the node feature matrix and A is the adjacency matrix. f(q ) is the representation of the aspect q obtained from the embedding layers 130. For example, a Bidirectional Long Short-Term Memory (BiLSTM) model is used to encode the word embedding of the aspect q and obtain the backward hidden state as the representation /(q). A Graph Attention Network (GAT) may be used to encode the graph Q for input into the AGP operator 150. For each node the first-hop neighbours in the graph Q are denoted by The

feature of x_h is updated by aggregating the input features of neighbourhood nodes and adding its input feature x_h by self-loop as follows.

j_s th_e weight matrix and a(x_h,r, x_t) denotes the attention score

between two nodes x_h and x_t. The attention score can be defined as follows.

can be implemented by the following attentional mechanism.

LeakyRELU is the Leaky Rectified Linear Unit (ReLU) activation function r is the embedding of the relation r, and are the weight matrices. A

matrix is used to denote the updated features of all nodes. An aspect-aware importance score

is defined as follows to describe the relevance of each node in the graph Q to the given aspect q.

In , I ¹ I denotes the absolute value function. The nodes in the graph can be

ranked according to the importance scores in descending order. The set of top

K ranked nodes is denoted by and their indices by idx(g). K is empirically set as

l, where p is the pooling ratio, | | denotes the cardinality of a set, and . ]

denotes the ceiling function. The new features of nodes in

and adjacency matrix

of the corresponding graph are defined as follows.

and are the weight matrix and bias vector, respectively.

is the row-wise indexed feature matrix. aims to get

the row-wise and column-wise indexed adjacency matrix from A. Further, and

are the new feature matrix and the corresponding adjacency matrix, respectively, after pooling. £ is used to denote the set of edges that describes the connecting relationships between the nodes in

. The output graph of the AGP operator 150 is denoted by

The graph representation learning module 140 is configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of user semantic graphs from the historical user semantic graph Q_u and the historical user aspects Q_u. As shown in Figure 3B, L layers of graph pooling are conducted on Q_u and guided by Q_u. At the /-th layer, there are n AGP operators 150. The input graph of the k- th AGP operator 150 at the /-th layer is The representation

°f the fc-th aspect is used to guide the graph pooling in the fc-th AGP

operator 150. For example, a BiLSTM model in the embedding layers 130 is used to encode the k- th aspect and the backward hidden state is obtained as the

representation

The output graph of the k -th AGP operator 150 at the Z -th layer is

which will be used as the input graph of the Zc-th AGP

operator 150 at the (Z + l)-th layer. The input graph of the k- th AGP operator 150 at the (Z + l)-th layer is where . Notably,

at the first layer (Z = 1), the input graph for k = 1,2,3, ... , n.

A pooling operation such as maximum pooling is performed on to obtain the

aspect-aware graph representation at the Z-th layer. After performing the pooling

operation L times corresponding to the L layers, multiple representations of g that are relevant to the user aspect can be obtained, i.e. The

graph representations are concatenated to fuse them from fine-grained to coarse, thereby forming the graph representation of the historical user semantic graph g

_u as follows.

The hierarchy of user semantic graphs is constructed by concatenating the graph representations with the representation of the respective k- th user aspect in

Q_u, where k = 1,2,3, ... , n. is defined as follows.

The graph representation learning module 140 is also configured for constructing, using graph pooling by stacking the AGP operators 150, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects Q

It will be appreciated that various parts of the description above for constructing the hierarchy of user semantic graphs from the historical user semantic graph Q_u and the historical user aspects Q_u apply equally to constructing the hierarchy of item semantic graphs and are omitted for purpose of brevity. Accordingly, the hierarchy of item semantic graphs S_j is defined as follows.

The computer system 100 includes an aspect matching module 160 configured for generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects. The set of matched aspects may include a user preference vector v_u that matches every user aspect of the user u to one or more relevant item aspects, thus aiming to find user aspects that the user u is interested or prefers in the item i. The set of matched aspects may include an item preference vector V_j that matches every item aspect of the item i to one or more relevant user aspects, thus aiming to find item aspects that are highly relevant to the user u. The user preference vector v_u and the item preference vector n_έ are defined as follows.

denotes the mean pooling operation and are trainable

parameters M_s is the aspect level importance weight matrix and is defined as follows.

is a learnable weight matrix and _. In

Each element M_s[x,y] describes the importance of the y-th item aspect to the x-th user aspect. The aspect level information of the user w and item i are fused with the aspect level importance weight matrix M_s to obtain the user preference vector v_u and the item preference vector V_j described above.

The computer system 100 includes an aspect processing module configured for generating, from the matched aspects, aspect input data associated with the user’s interest in the item. For example, the aspect processing module is configured for calculating a rating score from the matched aspects and the aspect input data may include the rating score. The rating score represents the user u’s interest in or preference on the item i . The user identifier of the user u is embedded by the embedding layers 130 to form a user identifier representation e_u. The user identifier representation e_u is input into a Multi-Layer Perceptron (MLP) model 170 and the output is concatenated with the user preference vector v_u to obtain the final representation x_u of the user u. The final representation x_£ of the item i can be obtained in a similar manner. The final representations x_u and X_j are defined as follows.

are learnable parameters. The final representations x_u and X

_j are concatenated to form x. The aspect processing module may include a factorization machine model 180 configured for calculating the rating score f_ui. The rating score r_ui represents the user u’s preference on the item i and is defined as follows.

b₀ , b_u, and b^ are the global bias, user bias, and item bias, respectively

j_{S C0e}ffj_Cjent vector. Zj and Zj are the i-th and j -th dimensions of

trainable parameters of the factorization machine model 180, respectively. denotes the dot product of two vectors. X and j are the i-th and j-th dimensions of x, respectively.

The computer system 100 includes a first machine learning model 200 configured for determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of the first machine learning model 200. The first machine learning model 200 may be referred as the aspect generation module.

As shown in Figure 3B, for each user aspect the node feature matrix of the

graph can be obtained from the graph pooling of the user semantic graphs.

Similarly, for each item aspect , the node feature matrix of the graph can

be obtained from the graph pooling of the item semantic graphs. All the aspect relevant node features matrices are stacked to form the following matrices X_u and X_t.

The first machine learning model 200 may include a first LSTM (Long Short-Term Memory) model, wherein the predicted aspects are determined at each time step of the first LSTM model. Determining the predicted aspects includes determining a current predicted aspect at a current time step. Determining the current predicted aspect includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step, and determining the current predicted aspect from the current hidden state.

The previous hidden state may be an initial hidden state at an initial time step of

the first LSTM model. The initial hidden state is calculated from the aspect input

data which includes the rating score . The rating score r_ui is mapped into a

sentiment representation v_r to guide the aspect and explanation generation, defined as follows.

and are the trainable weight matrix and the bias vector,

respectively. The aspect input data including the sentiment representation v_r, user preference vector v_u, and item preference vector V_j are input into an MLP model to calculate the initial hidden state , defined as follows.

The hidden state at each time step j — 1, where j = 1,2,3, is incorporated

with the user node feature x to calculate the user attention vector , as

follows.

The item attention vector can similarly be obtained from The current hidden

state at the current time step j can be calculated as follows.

'^s the embedding of the previous predicted aspect .

The user attention vector '^s calculated from the previous hidden state and

the node features derived from the hierarchy of user semantic graphs. The item attention vector is calculated from the previous hidden state and the node

features Xj derived from the hierarchy of item semantic graphs. An attention vector at the previous time step is calculated by concatenating the user attention vector

and item attention vector . The attention vector is then input into LSTM model

as shown in the equations above for calculating the current hidden state

The first machine learning model 200 includes an MLP model wherein the current predicted aspect is determined from the current hidden state h“ using the MLP model.

More specifically, the current hidden state h“ is input into the MLP model to obtain the probability distribution of the current predicted aspect wj¹ at the current time step j, as follows.

is a trainable weight parameter, d_a is the size of the vocabulary of

predicted aspects, and b_a is the bias vector.

In one example of aspect prediction by the first machine learning model 200 as shown in Figure 1 , the first hidden state

at the start of sequence <SOS> is generated from the initial hidden state and the first predicted aspect at the first hidden state

is

story. The second hidden state is generated from the first hidden state and the

first predicted aspect {story). The second predicted aspect at the second hidden state is characters. The third hidden state

at the end of sequence <EOS> is generated from the second hidden state and the second predicted aspect

{characters).

The computer system 100 includes a second machine learning model 220 configured for determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of the second machine learning model 220, the predicted words for explaining the respective predicted aspect. For example, the predicted words form a sentence that explains the predicted aspect. The second machine learning model 220 may be referred as the explanation generation module.

The second machine learning model 220 may include a second LSTM model, wherein the predicted words for each predicted aspect are determined at each time step of the second LSTM model. For each predicted aspect, determining the predicted words includes determining a current predicted word at a current time step. Determining the current predicted word includes calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step, and determining the current predicted word from the current hidden state.

the second LSTM model. For the y-th explanation pj E P, the y^'-th hidden state

from the first machine learning model 200 is used as the initial hidden state

Similar to the above for the first machine learning model 200, the user attention vector a^nc* '^{tem attent}i°ⁿ vector c are calculated from the previous hidden

state as well as the hierarchy of user semantic graphs and hierarchy of item

semantic graphs, respectively. More specifically, the hidden state at each time

step t — 1, where t — 1,2,3, ..., is incorporated with X_u and X_t to calculate the user attention vector and item attention vector respectively.

An attention vector at the previous time step is calculated by concatenating the user attention vector and item attention vector wherein the attention vector

is used for calculating the current hidden state h _{t .} More specifically, the current hidden state at the current time step t can be calculated as follows.

is the embedding of the previous predicted word

The second machine learning model 220 includes an MLP model wherein the current predicted word is determined from the current hidden state using the MLP model.

More specifically, the current hidden state is input into the MLP model to obtain

the probability distribution of the current predicted word at the current time step

t, as follows.

is a trainable weight parameter, is the size of the vocabulary of

predicted words, and is the bias vector.

In one example of words or sentence prediction by the second machine learning model 220 as shown in Figure 1 , the predicted words for each predicted aspect are determined as follows. The second predicted aspect ( characters ) at the second hidden state h₂ from the first machine learning model 220 is used in this example. The second hidden state h₂ is set as the initial hidden state in the second machine learning model 220. The first hidden state h_{2 1} at the <SOS> is generated from the initial hidden state h₂ and the first predicted word at the first hidden state h_{2 4} is the. The second hidden state

₂ is generated from the first hidden state

₁ and the first predicted word (the). The second predicted word at the second hidden state h_{2 2} is are. The third hidden state h_{2 3} is generated from the second hidden state h_{2 2} and the second predicted word (are). The third predicted word at the third hidden state h_{2 3} is wonderful. The fourth hidden state h_{2 4} at the end of sequence <EOS> is generated from the third hidden state h_{2 3} and the third predicted word (wonderful).

The computer system 100 includes a recommendation module configured for generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user. In the example as shown in Figure 1 , the explainable recommendation for the predicted aspect characters is the sentence “the characters are wonderful' formed by the predicted words.

Embodiments of the present disclosure also describe a computerized method for recommending an item i to a user it. The computerized method may be implemented on the computer system 100 which includes the various components described above as well as one or more processors configured for performing various steps of the computerized method in response to non-transitory instructions operative or executed by the processors. The non-transitory instructions are stored on a memory of the computer system and may be referred to as computer-readable storage media and/or non-transitory computer-readable media. Non-transitory computer-readable media include all computer-readable media, with the sole exception being a transitory propagating signal per se.

The computerized method includes steps of: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.

The computer system 100 / computerized method described in various embodiments herein provides an explainable recommendation system / method that employs review-based semantic graphs to include more user and item details for generating informative explanations covering multiple aspects. Moreover, the recommendation system extracts aspect-relevant knowledge from the semantic graphs to learn the user’s preferences on different item aspects. The recommendation system, which may be referred to as the Hierarchical Aspect-guided Review generation (HARE) recommendation system, can thus generate high quality and informative explanations to explain the recommendation of items to the user. The explainable recommendation system not only recommends new items but also generates intuitive explanations which can help users to make better and quicker decisions. For example, users may be more persuaded by the explanation to purchase recommended items. The explainable recommendation system was tested using three real-world datasets to evaluate the explanation generation performance and the preference prediction performance. The first and second datasets were derived from Amazon® Review Data containing product reviews from Amazon®. The first subset was derived from the Kindle Store subset (“Kindle”), and the second dataset was derived from the Electronics subset (“Electronics”). The third dataset was derived from the Yelp® Challenge 2019 dataset (“Yelp”). In each dataset, a record includes a user identifier, an item identifier, an overall rating, and a textual review. Aspects were extracted from the reviews and records with only one aspect were excluded. Aspect-relevant sentences were extracted from the reviews as explanations. Figure 4A shows the statistics of the Kindle, Electronics, and Yelp datasets. Each dataset was randomly split the data by the ratio 8:1 :1 as training, validation, and test data for the recommendation system.

The HARE recommendation system was compared against other explanation generation methods - Att2Seq, ExpNet, Ref2Seq, NETE-PMI, and ACF - to evaluate the explanation generation performance. Att2Seq incorporates the Seq2Seq model [Sutskever2014Sequence] and attention mechanism to learn the user’s preference from the user attributes and generate review explanations. ExpNet utilizes an encoder- decoder framework to expand a short phrase to a long review by combining the user and item information with other auxiliary information. Ref2Seq follows the structure of Seq2Seq and learns the representation from the user and item reviews to generate explanations. NETE-PMI adopts MLP to predict the rating and then generates a template-controlled sentence with a single predicted aspect. ACF uses MLP to encode different attributes and applies a coarse-to-fine decoding model to generate long reviews.

The following evaluation metrics were used to evaluate the explanation generation performance of the HARE recommendation system and the other methods - BLEU (bilingual evaluation understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit Ordering). These metrics evaluate the text similarity between the generated and gold explanations. BLEU (specifically BLEU-1 and BLEU-4) evaluates the n-gram overlap between gold and generated explanations. ROUGE (specifically ROUGE-1 and ROUGE-L) evaluates the recall, precision, and accuracy of the n-gram overlap. METEOR calculates the harmonic mean of each word precision and recall based on the whole corpus. Additionally, a Feature Matching Ratio (FMR) is used to measure whether the generated explanation can include the predicted aspects.

Figures 4B and 4C show the performance of the single-aspect and multi-aspect explanation generation tasks achieved by the different methods. Larger BLEU, ROUGE, METEOR, and FMR values indicate better results for the explanation generation performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best explanation generation performance on most metrics compared to the other methods.

The HARE recommendation system was compared against other prediction methods - PMF, SVD++, CARL, RMG, and NETE-PMI - to evaluate the ability in predicting users’ preferences based on the rating scores. PMF is the probabilistic matrix factorization method developed for rating prediction. SVD++ exploits both the user’s preferences on items and the influences between items for recommendation. CARL uses CNNs to learn relevant aspects from the review data. RMG uses a multi-view learning framework to incorporate the review contents and the users’ rating behaviours for the recommendation. NETE-PMI feeds user and item identifiers to an MLP to predict rating scores.

The Mean Absolute Error (MAE) is used as the evaluation metric to evaluate the preference prediction performance of the HARE recommendation system and the other methods. The definition of MAE is as follows.

T denotes the set of test data, r_ui denotes the predicted rating score, r_ui denotes the rating score in the test data, and | | denotes the cardinality of a set. Figure 4D shows the performance of the preference prediction tasks achieved by the different methods. Lower MAE values indicate better results for the preference prediction performance. The best results are in bold and the second-best results are underlined. It was observed that the HARE recommendation system achieved the best preference prediction performance for the Kindle and Yelp datasets, and the second- best prediction performance for the Electronics dataset. These results indicate that the user preference predicted by HARE can support the high-quality explanation generation. In the foregoing detailed description, embodiments of the present disclosure in relation to an explainable recommendation system and method are described with reference to the provided figures. The description of the various embodiments herein is not intended to call out or be limited only to specific or particular representations of the present disclosure, but merely to illustrate non-limiting examples of the present disclosure. The present disclosure serves to address at least one of the mentioned problems and issues associated with the prior art. Although only some embodiments of the present disclosure are disclosed herein, it will be apparent to a person having ordinary skill in the art in view of this disclosure that a variety of changes and/or modifications can be made to the disclosed embodiments without departing from the scope of the present disclosure. Therefore, the scope of the disclosure as well as the scope of the following claims is not limited to embodiments described herein.

Claims

1. A computerized method for recommending an item to a user, the method comprising: generating, from historical reviews by the user, a historical user semantic graph and a set of historical user aspects of the user; generating, from historical reviews of the item, a historical item semantic graph and a set of historical item aspects of the item; constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; generating, from the matched aspects, aspect input data associated with the user’s interest in the item; determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of a first machine learning model; determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of a second machine learning model, the predicted words for explaining the respective predicted aspect; and generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.

2. The method according to claim 1 , wherein the first machine learning model comprises a first LSTM model, and wherein the predicted aspects are determined at each time step of the first LSTM model.

3. The method according to claim 2, wherein determining the predicted aspects comprises determining a current predicted aspect at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step; and determining the current predicted aspect from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the first LSTM model, the initial hidden state calculated from the aspect input data.

4. The method according to claim 3, wherein determining the current predicted aspect comprises calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.

5. The method according to claim 3 or 4, wherein the first machine learning model comprises a MLP model, and wherein the current predicted aspect is determined from the current hidden state using the MLP model.

6. The method according to any one of claims 2 to 5, wherein the second machine learning model comprises a second LSTM model, and wherein the predicted words for each predicted aspect are determined at each time step of the second LSTM model.

7. The method according to claim 6, wherein for each predicted aspect, determining the predicted words comprises determining a current predicted word at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step; and determining the current predicted word from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the second LSTM model, the initial hidden state derived from a current hidden state of the first LSTM model.

8. The method according to claim 7, wherein determining the current predicted word comprises calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.

9. The method according to claim 7 or 8, wherein the second machine learning model comprises a MLP model, and wherein the current predicted word is determined from the current hidden state using the MLP model.

10. The method according to any one of claims 1 to 9, further comprising calculating, from the matched aspects, a rating score representing the user’s interest in the item, wherein the aspect input data comprises the rating score, and optionally wherein the rating score is calculated using a factorization machine model.

11. The method according to any one of claims 1 to 10, wherein: a representation of the user and a representation of the item are embedded using an MLP model; and/or the historical user and item aspects are embedded using a bidirectional LSTM model.

12. A non-transitory computer-readable storage medium storing computer- readable instructions that, when executed by at least one processor of a computer system, cause the computer system to perform the computerized method according to any one of claims 1 to 11.

13. A computer system for recommending an item to a user, the system comprising: a database storing historical reviews by the user and historical reviews of the item; a graph generation module configured for: generating a historical user semantic graph from the historical user reviews; generating a historical item semantic graph from the historical item reviews; an aspect extraction module configured for: extracting a set of historical user aspects of the user from the historical user reviews; and extracting a set of historical item aspects of the item from the historical item reviews; a graph representation learning module configured for: constructing, using graph pooling, a hierarchy of user semantic graphs from the historical user semantic graph and the historical user aspects; and constructing, using graph pooling, a hierarchy of item semantic graphs from the historical item semantic graph and the historical item aspects; an aspect matching module configured for: generating, from the hierarchies of user and item semantic graphs, a set of matched aspects between the historical user and item aspects; an aspect processing module configured for: generating, from the matched aspects, aspect input data associated with the user’s interest in the item; a first machine learning model configured for determining, from the hierarchies of user and item semantic graphs and the aspect input data, a predicted aspect at each layer of the first machine learning model; a second machine learning model configured for determining, from the hierarchies of user and item semantic graphs and each predicted aspect, a predicted word at each layer of the second machine learning model, the predicted words for explaining the respective predicted aspect; and a recommendation module configured for generating, from the predicted aspects and the predicted words, an explainable recommendation for recommending the item to the user.

14. The system according to claim 13, wherein the first machine learning model comprises a first LSTM model configured for determining the predicted aspects at each time step of the first LSTM model.

15. The system according to claim 14, wherein the first LSTM model is configured for determining a current predicted aspect at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted aspect at a previous time step; and determining the current predicted aspect from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the first LSTM model, the initial hidden state calculated from the aspect input data.

16. The system according to claim 15, wherein the first LSTM model is configured for calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.

17. The system according to any one of claims 14 to 16, wherein the second machine learning model comprises a second LSTM model, and wherein the second machine learning model is configured for determining, for each predicted aspect, the predicted words at each time step of the second LSTM model.

18. The system according to claim 17, wherein the second LSTM model is configured for determining, for each predicted aspect, a current predicted word at a current time step, comprising: calculating a current hidden state at the current time step from a previous hidden state and a previous predicted word at a previous time step; and determining the current predicted word from the current hidden state, optionally wherein the previous hidden state is an initial hidden state at an initial time step of the second LSTM model, the initial hidden state derived from a current hidden state of the first LSTM model.

19. The system according to claim 18, wherein the second LSTM model is configured for calculating an attention vector at the previous time step for calculating the current hidden state, and optionally wherein calculating the attention vector comprises: calculating a user attention vector from the hierarchy of user semantic graphs and the previous hidden state; and calculating an item attention vector from the hierarchy of item semantic graphs and the previous hidden state.

20. The system according to any one of claims 13 to 19, wherein the aspect processing module is configured for calculating, from the matched aspects, a rating score representing the user’s interest in the item, and optionally wherein the aspect processing module comprises a factorization machine model for calculating the rating score.