CN111125538A

CN111125538A - Searching method for enhancing personalized retrieval effect by using entity information

Info

Publication number: CN111125538A
Application number: CN201911413378.3A
Authority: CN
Inventors: Dou Zhicheng (窦志成)
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-08
Anticipated expiration: 2039-12-31
Also published as: CN111125538B

Abstract

The invention provides a search method that uses entity information to enhance personalized retrieval. The method comprises the following steps. Step 1, personalized entity linking: the user's history is used to improve entity linking of the query, and an entity-enhanced model is used to model the user's intent. Step 2, user preference portrait construction: based on the predicted intent, a fine-grained, entity-enhanced user preference portrait is constructed from historical entity information through a memory neural network. Step 3: personalized relevance is obtained from the user intent model and the fine-grained user preference portrait model, and the results are ranked accordingly.

Description

Searching method for enhancing personalized retrieval effect by using entity information

Technical Field

The invention relates to search methods, and in particular to a search method that uses entity information to enhance personalized retrieval.

Background

Personalized search has attracted wide attention. It aims to use a user's historical behavior to help infer the user's current query intent and preferences, so that different users receive differently ordered search results and the user experience improves. Because queries are often ambiguous and generally short, a query issued by a user often cannot fully express the user's real intent, and different users may have different preferences even under the same intent. Personalizing search results is therefore necessary.

In the prior art, many works extract topics or sub-topics of documents from the user history and compute the relevance of current candidate documents from features such as the number of user clicks. Deep learning was subsequently introduced into personalized search. For example, hierarchical recurrent neural networks have been used to dynamically learn representations of the user portrait from the user history and thereby predict the relevance between the current document and the user preference portrait, and adversarial neural networks have been used to further improve the effectiveness of deep models in personalized search.

Existing personalized search methods mainly learn the relevance between documents and the current query, as well as the user portrait, from the user's historical search records. They may, however, ignore relations between real-world things that are not reflected in the search records, which limits the learning of relevance matching. Many search models improve matching accuracy by introducing a knowledge base and exploiting the relationships and semantic information between entities, but methods for introducing entity knowledge into personalized search are lacking.

Besides exploiting relations between entities to learn relevance better, introducing entities also serves several desirable properties of personalized search. First, the user's intent, especially for ambiguous queries, can be expressed better with explicit entities. Meanwhile, the user's historical search information in the personalized search task also helps entity linking, which in turn helps infer and express the user's intent. Second, compared with the full text of a web page, the entities contained in the pages a user clicked reflect the user's specific preferences better, because the full text is largely redundant. Using entity information, a better user preference portrait can be constructed, so that the personalized relevance of documents can be computed more accurately.

Disclosure of Invention

The invention provides a search method that uses entity information to enhance personalized retrieval. First, personalized entity linking is performed on the query: the user's history is used to improve the linking of query entities, and the linked entities are used to enhance the modeling and representation of the user's intent. Next, based on the predicted intent, the user portrait is constructed more accurately: a refined, entity-enhanced user preference portrait is built from historical entity information through a memory neural network. Finally, the predicted user intent and the user portrait are used to calculate the personalized relevance of documents, and the documents are ranked accordingly, improving the user experience. After ranking, the entity linking results of previous queries are adjusted using the user's click feedback and the current query, which further refines the model's understanding of the user's historical search intents and preferences for personalizing the results of subsequent queries.

Drawings

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 is a structural diagram of personalized entity linking according to the present invention;

FIG. 3 is a structural diagram of user preference portrait construction.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Knowledge bases have been studied extensively in recent years. Because they store large amounts of relational and semantic information about real-world entities, they are often introduced into search models as external knowledge to improve the accuracy of matching between queries and documents. For example, for the query "director of Farewell My Concubine (Bawang Bieji)", a document about "Chen Kaige" does not appear highly relevant from the viewpoint of textual semantic similarity alone; by using the relation between the two entities, grounded in real-world background knowledge, the document can be recognized as highly consistent with the query intent. Research on introducing external knowledge into personalized search, however, is relatively lacking.

In addition, the invention uses entities to better predict the user's search intent in personalized search and to construct the user preference portrait. For example, for the query "cherry reviews", semantic ambiguity makes it difficult to determine whether the user is searching for cherry blossoms or for Cherry keyboards. With entities introduced and personalized entity linking performed, the user's search intent can be predicted to be cherry blossoms from a related historical query about cherry blossoms in Asia. By explicitly expressing the user's intent as the cherry blossom entity, web documents describing cherry blossoms can be ranked higher to meet the user's needs. Based on the predicted user intent, entity information in the search history can be used to construct a more refined preference portrait of the user. For example, based on the linked cherry blossom entity, historical queries containing this entity can be retrieved; from entities such as "Japan" and "Hokkaido" contained in the documents the user clicked under those queries, the user's more specific preference can be identified as "cherry blossom scenery in Japan". Japanese cherry blossom tourist attractions can then be ranked near the top of the search results, further improving the user experience.

As shown in FIG. 1, the method first performs personalized entity linking on the query, using the history to improve query entity linking and an entity-enhanced model to model and represent the user's intent. It then constructs the user portrait more accurately based on the predicted intent, building a refined, entity-enhanced user preference portrait from historical entity information through a memory neural network. Finally, it uses the predicted user intent and the user portrait to calculate the personalized relevance of documents and ranks them accordingly. After ranking, the entity linking results of previous queries are adjusted using the user's click feedback and the current query, further refining the model's understanding of the user's historical search intents and preferences for personalizing subsequent query results.

Personalized entity linking is shown in FIG. 2. The user history consists of a series of search sessions:

$H = \{S_1, S_2, \ldots, S_m\}$

where the m-th session S_m is the current session. Each session consists of a series of queries and their corresponding candidate document sets:

$S_h = \{(q_1, D_1), (q_2, D_2), \ldots, (q_{x_h}, D_{x_h})\}$

where h is the id of the session and x_h is the number of queries in the session. After the user issues the t-th query q_t in the current session, the candidate document set

$D_t = \{d_1, d_2, \ldots, d_{|D_t|}\}$

must be ranked according to the user's historical interests so that the ranking matches the user's search intent under q_t. This process repeats until the current session ends.

Dividing the user history into a short-term history and a long-term history is very effective in personalized search. The short-term history is defined as the historical search records in the current session:

$H^s = \{(q_1, D_1), \ldots, (q_{t-1}, D_{t-1})\}$

The long-term history is defined as the search records in the historical sessions:

$H^l = \{S_1, \ldots, S_{m-1}\}$

If query q contains x text segments associated with entities, the set of candidate entities for the query is defined as:

where n_i identifies the candidate entities associated with the i-th text fragment of the query. The query entity vector is then expressed as:

where p_{i,j} is the link probability of entity e_{i,j}, and the entity vector e_{i,j} is initialized from pre-trained embeddings and then further trained.
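Because the formula for the query entity vector did not survive extraction, the following is only a minimal sketch, assuming the vector is a plain sum of the candidate vectors e_{i,j} weighted by their link probabilities p_{i,j}. The function name `query_entity_vector` and the toy two-dimensional vectors are hypothetical.

```python
from typing import List

Vector = List[float]


def query_entity_vector(candidates: List[List[Vector]],
                        link_probs: List[List[float]]) -> Vector:
    """Sum candidate entity vectors weighted by their link probabilities.

    candidates[i][j] plays the role of e_{i,j} (the j-th candidate entity of
    the i-th text fragment) and link_probs[i][j] the role of p_{i,j}.
    Assumption: a plain weighted sum over all fragments and candidates.
    """
    dim = len(candidates[0][0])
    result = [0.0] * dim
    for fragment_vecs, fragment_probs in zip(candidates, link_probs):
        for vec, p in zip(fragment_vecs, fragment_probs):
            for k in range(dim):
                result[k] += p * vec[k]
    return result


# "cherry": candidate 0 = cherry blossom, candidate 1 = Cherry keyboard (toy vectors)
print(query_entity_vector([[[1.0, 0.0], [0.0, 1.0]]], [[0.8, 0.2]]))  # [0.8, 0.2]
```

With a confident link to the cherry blossom entity, the query vector leans toward that entity while still retaining a small share of the alternative reading.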

The entity vector of the document is represented as:

where c_i is the frequency with which the i-th entity occurs in the document. Likewise, the text vectors of documents and queries are defined as:

where w_i is a word vector pre-trained with GloVe.
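A small sketch of the document entity vector and the text vectors follows. It assumes the document entity vector is the frequency-weighted average of its entity vectors (weights c_i) and that a text vector is the unweighted mean of its word vectors w_i; the exact normalization is not given above. Embeddings are passed in as plain dictionaries with toy values rather than real GloVe vectors, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Vector = List[float]


def weighted_mean(vectors: List[Vector], weights: List[float]) -> Vector:
    """Weighted average of equally sized vectors (assumes at least one vector)."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) / total
            for k in range(dim)]


def document_entity_vector(doc_entities: List[str],
                           entity_emb: Dict[str, Vector]) -> Vector:
    """Average entity vectors, each weighted by its frequency c_i in the document."""
    counts = Counter(e for e in doc_entities if e in entity_emb)
    names = list(counts)
    return weighted_mean([entity_emb[n] for n in names],
                         [float(counts[n]) for n in names])


def text_vector(tokens: List[str], word_emb: Dict[str, Vector]) -> Vector:
    """Unweighted mean of the word vectors of tokens found in the vocabulary."""
    known = [word_emb[t] for t in tokens if t in word_emb]
    return weighted_mean(known, [1.0] * len(known))
```

For instance, a clicked page mentioning "Japan" twice and "Hokkaido" once yields a vector two-thirds of the way toward "Japan", which is how frequently mentioned entities come to dominate the document representation.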

As mentioned above, the invention first uses the user's history to perform personalized entity linking on the query, that is, to calculate the link probability of each candidate entity. This makes the user's intent explicit, and the result is used in the subsequent construction of the user portrait and in computing the personalized relevance of documents.

The entity link probability is computed from two parts: the link relevance between the entity and the query, and the entity link relevance determined from the user history:

where MLP denotes a fully connected layer.

The link relevance between the entity and the query includes vector similarity and statistical features:

where l_{i,j} denotes statistical features of the candidate entity, such as its popularity.
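As a rough illustration of how the two parts might be combined, the sketch below scores each candidate entity of one text fragment by its cosine similarity to the query, a popularity feature (standing in for l_{i,j}), and a history-based score, then normalizes with a softmax over that fragment's candidates. The linear weights replace the MLP, and the softmax normalization is an assumption; all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(x: Vector, y: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def link_probabilities(query_vec: Vector, entity_vecs: List[Vector],
                       popularity: List[float], history_scores: List[float],
                       w_sim: float = 1.0, w_pop: float = 1.0,
                       w_hist: float = 1.0) -> List[float]:
    """Link probabilities for the candidates of one text fragment.

    Hypothetical linear weights stand in for the MLP of the description.
    """
    scores = [w_sim * cosine(query_vec, e) + w_pop * pop + w_hist * h
              for e, pop, h in zip(entity_vecs, popularity, history_scores)]
    return softmax(scores)
```

When the query-side evidence is equal for two candidates, as for "cherry" blossom versus keyboard, the history-based score is what tips the probabilities.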

The entity link relevance based on the user history is computed in two ways: (1) the user's historical search sequence is modeled to infer the implicit intent under the current query, which provides evidence for linking the current entities; (2) related queries in the user history are retrieved, and the historical entity information in those queries provides evidence for linking the entities of the current query.

Sequence history modeling first uses an LSTM layer to model the sequence of the user's historical query behaviors, and then uses an attention mechanism based on the current query to assign higher attention to related historical behaviors, so as to infer the current query intent. For the short-term history, the text vector of each historical query is concatenated with the text vector of the corresponding clicked documents and used as the input of the LSTM layer, yielding the short-term user intent t_s:

where the clicked-document term is the average of the text vectors of the corresponding clicked documents. Similarly, for the long-term history, replacing the short-term query text vectors and clicked-document averages in the above equation with the text vectors of the long-term historical queries and the average text vectors of their clicked documents yields the long-term user intent t_l.
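The attention step that turns LSTM outputs into t_s (or t_l) can be sketched as follows. The LSTM itself is omitted: its per-step outputs are taken as given, and each input step is built by concatenating a query text vector with the mean text vector of its clicked documents. The description scores outputs with an MLP; a dot product with the current query vector is swapped in here for brevity, which assumes both have the same dimension. Function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def lstm_inputs(query_vecs: List[Vector], clicked_means: List[Vector]) -> List[Vector]:
    """Per-step LSTM input: query text vector concatenated with clicked-doc mean."""
    return [q + d for q, d in zip(query_vecs, clicked_means)]


def attention_pool(current_query: Vector, outputs: List[Vector]) -> Vector:
    """Softmax-weighted sum of LSTM outputs h_i, scored by dot product (an assumption)."""
    scores = [sum(a * b for a, b in zip(current_query, h)) for h in outputs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    alphas = [w / z for w in weights]  # plays the role of the attention weights alpha_i
    dim = len(outputs[0])
    return [sum(a * h[k] for a, h in zip(alphas, outputs)) for k in range(dim)]
```

A current query that aligns strongly with one historical step makes the pooled intent vector nearly equal to that step's output, which is the intended "higher attention to related behaviors".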

Historical entity information is modeled with an LSTM and an attention mechanism: historical queries related to the current query receive higher weights, and the entity information in those queries is then used as the related historical entity information. The text vectors of the historical queries are therefore used as the input of the LSTM layer, and the short-term related entity vector e_s over the short-term history is computed as follows:

Similarly, for the long-term history, replacing Q_s and its accompanying short-term term in the above equation with Q_l and the corresponding long-term term yields the long-term related entity vector e_l.

The entity link relevance based on the personalized history is:

where $g(x, y) = \tanh(x^\top \mathrm{MLP}(y))$.
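The matching function g(x, y) = tanh(x^T MLP(y)), used here and again in document scoring, can be written out directly. In this sketch the MLP is reduced to a single linear layer (W, b), which is an assumption. `history_link_relevance`, which simply sums g between a candidate entity and the history-derived vectors (such as t_s, t_l, e_s, e_l), is a hypothetical aggregation, since the exact formula is not reproduced above.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, b: Vector, y: Vector) -> Vector:
    """Single linear layer W y + b standing in for the MLP."""
    return [sum(w * v for w, v in zip(row, y)) + bi for row, bi in zip(W, b)]


def g(x: Vector, y: Vector, W: Matrix, b: Vector) -> float:
    """g(x, y) = tanh(x^T MLP(y))."""
    return math.tanh(sum(a * c for a, c in zip(x, linear(W, b, y))))


def history_link_relevance(entity: Vector, history_vecs: List[Vector],
                           W: Matrix, b: Vector) -> float:
    """Hypothetical aggregation: sum of g over the history-derived vectors."""
    return sum(g(entity, h, W, b) for h in history_vecs)
```

The tanh keeps each term in (-1, 1), so no single history signal can dominate the link score unboundedly.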

Based on the predicted user intent, the user's preferences under that intent can be modeled better. Further exploiting the entity information in the search history lets the model learn more fine-grained user preferences. Because memory neural networks store long-sequence information well, the invention uses a key-value memory neural network to store the user's history and model the user portrait; the construction of the user preference portrait is shown in FIG. 3.

An entity memory neural network is used to construct an entity-enhanced user portrait. Each key is the entity vector of a historical query, and the corresponding value is the average entity vector of the documents the user clicked under that query. In this way, the fine-grained preferences the user showed under historical query intents are retained. Thus, over the short-term history:

where the value term is the average entity vector of the corresponding clicked documents.

Because the predicted entity link probabilities reflect the user's intent, the entity vector of the current query is then used as the predicted user intent vector to construct the user preference portrait. Based on this entity vector, a short-term entity portrait is read from the short-term entity memory neural network through an attention mechanism as follows:

Because reading the memory neural network with only the entity vector of the current query mostly retrieves entities directly related to that query, the invention then concatenates the entity vector of the current query with the user portrait read in the first pass and uses the result as a new user intent vector for a second read. In this way, further entities related to the user's preferences can be retrieved from the memory neural network, so that the constructed user portrait covers a wider range of the user's interests. Therefore:
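The two-pass read of the key-value entity memory can be sketched as follows. Keys are historical query entity vectors and values are mean entity vectors of the clicked documents, as described above. The bilinear attention score query^T P key and the projection W applied to the concatenated vector loosely mirror the roles of P_e and W_e named in claim 3, but their exact use is an assumption, as are the function names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def read_memory(query: Vector, keys: List[Vector], values: List[Vector],
                P: Matrix) -> Vector:
    """One read: beta_i = softmax_i(query^T P key_i); output = sum_i beta_i * value_i."""
    scores = [sum(q * pk for q, pk in zip(query, matvec(P, k))) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    betas = [e / z for e in exps]
    dim = len(values[0])
    return [sum(b * v[i] for b, v in zip(betas, values)) for i in range(dim)]


def two_pass_read(intent: Vector, keys: List[Vector], values: List[Vector],
                  P: Matrix, W: Matrix) -> Vector:
    """First read with the intent vector, second read with W (intent ++ first read)."""
    first = read_memory(intent, keys, values, P)
    second_query = matvec(W, intent + first)  # W maps the concatenation back to key size
    return read_memory(second_query, keys, values, P)
```

When W mixes in the first-read vector, memory slots whose keys resemble what the first pass retrieved gain weight in the second pass, which is how the portrait can reach interests beyond the entities of the current query.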

In the same way, for the long-term history, the keys and values of the memory neural network are replaced with the entity vectors of the long-term historical queries and the average entity vectors of the corresponding clicked documents, and a two-pass read yields the long-term entity portrait.

The text memory neural network constructs the user interest portrait from the original text information. Each key is the text vector of a historical query, and the corresponding value is the average text vector of the documents clicked under that query. Thus, over the short-term history:

where the value term is the average text vector of the corresponding clicked documents.

Since the original query text may not fully reflect the user's query intent, the original text vector of the query is concatenated with the implicit user intent vector t_s modeled by the LSTM to form the user intent vector, and the user's text preference portrait is read with an attention mechanism. Because associations between words are weaker than those between entities, the memory is read only once here. This yields the short-term user text portrait based on the short-term history.

Similarly, for the long-term history, replacing the keys and values of the memory neural network with the text vectors of the long-term historical queries and the average text vectors of the corresponding clicked documents yields the long-term user text portrait.

With the predicted user intent and the constructed user representation, personalized relevance scores can be calculated for the documents and personalized ranking can be performed accordingly.

Given the user history, the relevance score of candidate document d under query q can be calculated as:

where the two vector arguments represent the predicted user intent and the user preference vector, respectively.

User intent relevance measures the relevance between the document and the user intent vector:

where $g(x, y) = \tanh(x^\top \mathrm{MLP}(y))$.

User preference relevance measures the relevance between the document and the user preference portrait:

Query relevance concerns the match between the document and the current query, including vector similarity and traditional click features. In addition, to further exploit the personalized matching between the entities linked in the query and the entities in the document, the invention introduces interaction matching features between entities, giving:

$f(d, q) = [g(d, q), \mathrm{MLP}(f_d), f_m]$

where f_d denotes traditional click features, such as the number of times the user has historically clicked the URL under the same query.
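Putting the relevance parts together, the sketch below assembles the query-relevance features f(d, q) = [g(d, q), MLP(f_d), f_m] and combines them with the user intent and user preference relevance through a single linear layer, then sorts documents by score. The one-layer MLP and the linear final combination are assumptions standing in for the scoring function whose formula is not reproduced; the names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def query_relevance_features(g_dq: float, click_feats: Vector, f_m: Vector,
                             W_click: List[Vector], b_click: Vector) -> Vector:
    """f(d, q) = [g(d, q), MLP(f_d), f_m], with a one-layer MLP over f_d."""
    click_part = [sum(w * x for w, x in zip(row, click_feats)) + b
                  for row, b in zip(W_click, b_click)]
    return [g_dq] + click_part + list(f_m)


def final_score(intent_rel: float, pref_rel: float, query_feats: Sequence[float],
                weights: Sequence[float], bias: float = 0.0) -> float:
    """Linear combination of all relevance features (assumed form)."""
    feats = [intent_rel, pref_rel] + list(query_feats)
    return sum(w * x for w, x in zip(weights, feats)) + bias


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending personalized score."""
    return sorted(scores, key=scores.get, reverse=True)
```

Keeping the three parts as separate inputs lets the learned weights trade off the user's intent, long-standing preferences, and plain query matching per feature.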

For the entity interaction matching features f_m, the invention proposes two entity-based interaction matching components, PEDRM and PCERM. To simplify notation, all candidate entities of the current query are merged into a single list:

In the following, e^q and e^d denote the entity encoding vectors in queries and documents, respectively.

PEDRM is a matching component that incorporates personalized information into EDRM. EDRM first constructs text and entity interaction matrices between the query and the document, and then extracts matching features with a Gaussian kernel pooling layer:

where the concatenation operator joins the features of the four matrices, M_{e,e} is the interaction matrix between query entities and document entities, M_{e,w} is the interaction matrix between query entities and document text, M_{w,e} is the interaction matrix between query text and document entities, and M_{w,w} is the interaction matrix between query text and document text.
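The Gaussian kernel pooling applied to each interaction matrix can be sketched in the K-NRM style that EDRM builds on: each kernel softly counts the similarities in a row that fall near its mean mu, the logs of the per-row counts are summed, and the features of the four matrices are concatenated. The kernel means and widths and the small floor inside the logarithm are illustrative choices, not values from the description; function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def kernel_pooling(M: Matrix, mus: List[float], sigmas: List[float]) -> List[float]:
    """One feature per Gaussian kernel: sum over rows of log(sum over columns of RBF)."""
    features = []
    for mu, sigma in zip(mus, sigmas):
        total = 0.0
        for row in M:
            soft_count = sum(math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row)
            total += math.log(max(soft_count, 1e-10))  # floor avoids log(0)
        features.append(total)
    return features


def edrm_features(matrices: List[Matrix], mus: List[float],
                  sigmas: List[float]) -> List[float]:
    """Concatenate kernel features of M_ee, M_ew, M_we, M_ww (in a fixed order)."""
    out: List[float] = []
    for M in matrices:
        out.extend(kernel_pooling(M, mus, sigmas))
    return out
```

A kernel centred at 1.0 with a small width effectively counts exact matches, while kernels at lower means count progressively softer matches.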

In PEDRM, the invention fuses personalized information into the interaction matrices. When computing the interaction matrices that involve query entities, the predicted entity link probabilities are used as weights on the entity interactions, reflecting relevance to the user's personalized intent. In addition, an interaction matching matrix R between entity relations and the query vector is added to extract further matching features:

where the relation between the query and a document entity is represented as:

The interaction matrix over entity relations is added because matching between entity vectors does not necessarily fully reflect how well a query and a document match. For example, for the query "Obama's life", the entities "Michelle" and "U.S.A." are both related to the entity "Obama", but only "Michelle" stands in the relation "islife" to "Obama", which meets the query requirement. The interaction feature f_m is thus computed as:
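The personalization step in PEDRM, weighting entity interactions by the predicted link probabilities, can be illustrated on the query-entity by document-entity matrix. This sketch assumes cosine similarity as the interaction and a simple row scaling by each query entity's link probability; the relation matrix R is omitted, and the function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def weighted_entity_interaction(query_entities: List[Vector], link_probs: List[float],
                                doc_entities: List[Vector]) -> List[List[float]]:
    """Row i: similarity of query entity i to each document entity, scaled by p_i."""
    return [[p * cosine(q, d) for d in doc_entities]
            for q, p in zip(query_entities, link_probs)]
```

If "cherry" links to cherry blossom with probability 0.8 and to the keyboard with 0.2, a document entity close to the keyboard contributes at most 0.2 to the matrix, so the unlikely reading has little influence on the pooled features.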

PCERM is a relatively simple interaction matching component that extracts personalized matching interaction features using only a 3-channel CNN:

$f_m = \mathrm{MLP}(\mathrm{Flat}(\mathrm{ReLU}(C)))$,

where C is the output of the convolution over the input channels, which are concatenated along the first dimension; W_CNN and b_CNN are the parameters of the convolution kernel in the CNN; a and b are the kernel sizes; and Flat denotes the flattening operation, which flattens a matrix into a vector.
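The PCERM pipeline f_m = MLP(Flat(ReLU(C))) can be traced with a small pure-Python sketch. It assumes the three input channels are interaction matrices of equal shape, uses a single a×b filter whose weights and bias play the roles of W_CNN and b_CNN (computed as cross-correlation, as CNN layers do), and reduces the MLP to one linear layer. Which matrices form the channels is not stated above and is left to the caller; function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(channels: List[Matrix], kernel: List[Matrix], bias: float) -> Matrix:
    """Valid multi-channel cross-correlation with one filter (kernel[c] is a x b)."""
    a, b = len(kernel[0]), len(kernel[0][0])
    height, width = len(channels[0]), len(channels[0][0])
    out = []
    for i in range(height - a + 1):
        row = []
        for j in range(width - b + 1):
            s = bias
            for ch, ker in zip(channels, kernel):
                for u in range(a):
                    for v in range(b):
                        s += ch[i + u][j + v] * ker[u][v]
            row.append(s)
        out.append(row)
    return out


def relu(M: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in M]


def flat(M: Matrix) -> List[float]:
    return [x for row in M for x in row]


def pcerm_features(channels: List[Matrix], kernel: List[Matrix], bias: float,
                   W: Matrix, b: List[float]) -> List[float]:
    """f_m = MLP(Flat(ReLU(C))) with a one-layer MLP."""
    x = flat(relu(conv2d(channels, kernel, bias)))
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]
```

A real implementation would use several filters and a deep-learning framework; the single filter here only makes the data flow explicit.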

After the personalized relevance of the documents has been computed and the documents ranked accordingly, the current query and the user's click feedback under it are used to adjust the entity link probabilities of the other historical queries in the current session, because user intents within the same session are relatively consistent. For example, given the current query and the user's click on a document containing the entity "software", the ambiguous historical query "Java" in the session can be judged to refer to the entity "Java language". The adjusted link results make the user portrait built when personalizing subsequent queries more accurate: when the user later asks which IDE to choose, an Eclipse web page suited to Java development can be ranked near the top.

Once entity link probabilities can be adjusted, however, the situation becomes more complex: because intents within a session are consistent, a change in the link probability of one entity may require changing the link probabilities of entities associated with other text segments. The idea of the invention is therefore to select the entity with the highest link probability as a reliable link and then use this entity to adjust the probabilities of the candidate entities associated with other text segments.

Specifically, after personalized ranking is completed, the entity probabilities in the current query are first adjusted using the entity information of the documents the user clicked:

where the clicked-document term denotes the entity information of the clicked documents, and the superscript t indicates that the query is the t-th query in the current session. Next, among the candidate entities ε_t of the current query, the entity with the highest link probability is found:

If p ≤ δ (where δ is set to 0.5), the adjustment process ends, and the user's (t+1)-th query is processed next.

If p > δ, suppose the reliably linked entity is associated with the a-th text segment of the query. The entity information of that text segment,

is used to adjust the probabilities ε_k of the other candidate entities in the session, where 1 ≤ k ≤ t−1. The link probabilities of the other entities are adjusted according to entity vector similarity and query text similarity as follows:

Next, the candidate entity with the highest link probability in the whole session is found:

If p > δ and the text segment associated with the selected entity has not been selected before, the above steps are repeated, using the entity information of that text segment to adjust the link probabilities of the other candidate entities; otherwise, the adjustment process ends and the user's (t+1)-th query is processed next. In summary, the adjustment process ends when no entity in the session has a link probability greater than the threshold δ, or when all text segments associated with such entities have already been selected. Because the model is trained with whole sessions as input, the parameters w, w₁ and w₂ used in the adjustment process are optimized by minimizing the loss over the entire session.
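The greedy adjustment loop can be sketched as follows. The control flow follows the description: pick the most probable entity, stop if p ≤ δ or its text segment was already used, otherwise propagate to the other segments of the session's earlier queries. The update rule itself is not reproduced above, so the sketch assumes one: each candidate's log-probability is raised by (w1 + w2 × query text similarity) × cosine similarity to the reliable entity, then renormalized within its segment. The classes and functions are hypothetical, and the click-based adjustment of the current query is assumed to have been applied already.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

Vector = List[float]


@dataclass
class Segment:
    entity_vecs: List[Vector]  # candidate entity vectors of one text fragment
    probs: List[float]         # their link probabilities (sum to 1)


@dataclass
class Query:
    text_vec: Vector
    segments: List[Segment]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def best_candidate(session: List[Query],
                   query_ids: Iterable[int]) -> Optional[Tuple[float, int, int, int]]:
    """(probability, query index, segment index, entity index) of the top candidate."""
    best = None
    for qi in query_ids:
        for si, seg in enumerate(session[qi].segments):
            for ei, p in enumerate(seg.probs):
                if best is None or p > best[0]:
                    best = (p, qi, si, ei)
    return best


def adjust_session(session: List[Query], delta: float = 0.5,
                   w1: float = 1.0, w2: float = 1.0) -> Set[Tuple[int, int]]:
    """Propagate reliable links to earlier queries; returns the segments used."""
    t = len(session) - 1                 # index of the current query
    selected: Set[Tuple[int, int]] = set()
    best = best_candidate(session, [t])  # first round: current query only
    while best is not None:
        p, qi, si, ei = best
        if p <= delta or (qi, si) in selected:
            break
        selected.add((qi, si))
        anchor = session[qi].segments[si].entity_vecs[ei]
        for qk in range(t):              # historical queries of this session
            text_sim = cosine(session[qi].text_vec, session[qk].text_vec)
            for sk, seg in enumerate(session[qk].segments):
                if (qk, sk) in selected:
                    continue
                logits = [math.log(max(pk, 1e-12)) + (w1 + w2 * text_sim) * cosine(anchor, e)
                          for pk, e in zip(seg.probs, seg.entity_vecs)]
                m = max(logits)
                exps = [math.exp(x - m) for x in logits]
                z = sum(exps)
                seg.probs = [x / z for x in exps]
        best = best_candidate(session, range(len(session)))
    return selected
```

In a toy version of the "Java" example, an earlier query with candidates "Java language" and "Java island" at 0.5 each, followed by a current query with a reliable software-related link at 0.9, ends with the earlier query favoring "Java language" at roughly 0.88. Each round marks one new segment as used, so the loop always terminates.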

The invention trains the model with sessions as units and uses a pairwise loss function:

where s is a query session of user u,

is the user's search history before query q, d⁺ denotes a positive example document in the candidate document set of query q, and d⁻ denotes a negative example document.
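The description specifies a pairwise loss over sessions without reproducing its form. A common choice, assumed here, is the logistic (RankNet-style) loss log(1 + exp(-(s⁺ − s⁻))) summed over all (positive, negative) document pairs of all queries in a session; the sketch computes it in a numerically stable way. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def pair_loss(pos_score: float, neg_score: float) -> float:
    """Stable log(1 + exp(neg - pos)), i.e. the softplus of the score gap."""
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def session_loss(session_pairs: List[List[Tuple[float, float]]]) -> float:
    """Sum pairwise losses over every query (a list of (s+, s-) pairs) in a session."""
    return sum(pair_loss(p, n) for pairs in session_pairs for p, n in pairs)
```

Summing over the whole session is what allows parameters that act across queries, such as those of the link adjustment step, to receive gradient from later queries in the same session.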

To better model the user's intent and the user portrait, the invention uses entity information from a knowledge base to enhance personalized search. It first performs personalized entity linking to disambiguate the query, so that the model can learn the user's intent better. Based on the predicted intent, it uses the entity information in the historical search records to construct the user portrait through a memory network, modeling the user's personalized preferences more accurately. After the personalized scores have been computed and the documents ranked, it adjusts the entity linking results of historical queries using the current query and the user's click feedback, so that the user's history is interpreted better, which helps model the user's interests further. By exploiting entity information, the invention effectively enhances personalized search and can substantially improve the user experience.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A search method for enhancing personalized retrieval by using entity information, characterized by comprising the following steps: step 1, personalized entity linking, in which the user history is used to improve entity linking of the query and an entity-enhanced model is used to model the user's intent; step 2, constructing a user preference portrait, in which, based on the predicted intent, a fine-grained, entity-enhanced user preference portrait is constructed from historical entity information through a memory neural network; and step 3, obtaining personalized relevance according to the user intent model and the fine-grained user preference portrait model, and ranking accordingly.

2. The method of claim 1, wherein the personalized entity linking of queries is performed as follows: the user history consists of a series of search sessions,

$H = \{S_1, S_2, \ldots, S_m\}$

where S_1, …, S_m are sessions, m is the number of sessions, and the m-th session is the current session; session S_h consists of a series of queries and their corresponding candidate document sets:

$S_h = \{(q_1, D_1), (q_2, D_2), \ldots, (q_{x_h}, D_{x_h})\}$

where h is the id of the session and x_h is the number of queries in the session; after the user issues the t-th query q_t in the current session, the candidate document set

$D_t = \{d_1, d_2, \ldots, d_{|D_t|}\}$

is ranked in a personalized way according to the user's historical interests so that it conforms to the user's search intent under q_t, where d_{|D_t|} is the |D_t|-th candidate document; the user history is then divided into a short-term history and a long-term history, the short-term history being the historical search records in the current session,

$H^s = \{(q_1, D_1), \ldots, (q_{t-1}, D_{t-1})\}$

where s denotes the short-term history; the long-term history is the search records in the historical sessions,

$H^l = \{S_1, \ldots, S_{m-1}\}$

where l denotes the long-term history, and m−1 indicates that the m−1 sessions before the current session form the long-term history;

if query q contains x text segments related to entities, the candidate entity set of the query is:

where n_i identifies the candidate entities associated with the i-th text fragment of the query; the query entity vector is represented as:

where p_{i,j} is the link probability of entity e_{i,j}, and e_{i,j} is an entity vector that is pre-trained and then further trained; the entity vector of the document is represented as:

where c_i is the frequency of occurrence of the entity in the document;

the text vector of the document and the text vector of the query are, respectively:

where w_i is a word vector pre-trained with GloVe;

then, sequence history modeling is performed: an LSTM layer models the user's historical query behavior sequence, and an attention mechanism based on the current query assigns higher attention to related historical behaviors; for the short-term history, the query text vector of each historical search behavior is concatenated with the text vector of the corresponding clicked documents as the input of the LSTM layer, giving the short-term user intent t_s:

where the clicked-document term is the average of the text vectors of the corresponding clicked documents, the LSTM term is the output of the LSTM layer at time step i, α_i is the attention weight of the output at each time step, the normalization term is the normalized probability function, and MLP denotes a fully connected layer; based on the long-term history, the long-term user intent t_l is computed by replacing the short-term query text vectors and clicked-document averages in the above formula with the text vectors of the long-term historical queries and the average text vectors of the corresponding clicked documents;

then, historical entity information is modeled: an LSTM and an attention mechanism give higher weight to historical queries related to the current query, and the entity information in those queries is used as the related historical entity information; the text vectors of the historical queries are used as the input of the LSTM layer, and the short-term related entity vector e_s over the short-term history is:

based on the long-term history, replacing Q_s and its accompanying short-term term in the formula with Q_l and the corresponding long-term term yields the long-term related entity vector e_l;

the entity link relevance based on the personalized history is:

where $g(x, y) = \tanh(x^\top \mathrm{MLP}(y))$.

3. The method of claim 2, wherein the user preference portrait is constructed as follows: an entity memory neural network is used to construct an entity-enhanced user portrait, and over the short-term history:

where the value term is the average entity vector of the corresponding clicked documents;

then, the entity vector of the current query is used as the predicted user intent vector to construct the user preference portrait, and a short-term entity portrait is read once from the short-term entity memory neural network through an attention mechanism:

where β_i is the attention weight of the i-th value in the memory neural network, P_e is a matrix parameter, and the remaining vector is the entity vector of the current query; the entity vector of the current query and the read user portrait are concatenated as a new user intent vector, and a second read is performed:

where W_e is a matrix parameter and β'_i is the attention weight of the i-th value in the memory neural network; for the long-term history, the keys and values of the memory neural network are replaced with the entity vectors of the long-term historical queries and the average entity vectors of the corresponding clicked documents, and a two-pass read yields the long-term entity portrait;

then, a text memory neural network is used to construct a user interest portrait based on the original text information, and over the short-term history:

where the value term is the average text vector of the corresponding clicked documents;

the original text vector of the query is concatenated with the implicit user intent vector t_s modeled by the LSTM as the user intent vector, the user text preference portrait is read once with an attention mechanism, and a short-term user text portrait is constructed based on the short-term history; based on the long-term history, the keys and values of the memory neural network are replaced with the text vectors of the long-term historical queries and the average text vectors of the corresponding clicked documents to construct the long-term user text portrait.

4. The method of claim 3, wherein the personalized relevance is obtained and ranking is performed according to the user intent model and the fine-grained user preference portrait model as follows: for a given user history, a candidate document d, and the user intent and user preference vector under query q, a user intent relevance is first calculated, being the relevance between the document and the user intent vector:

where $g(x, y) = \tanh(x^\top \mathrm{MLP}(y))$;

a user preference relevance is then calculated, being the relevance between the document and the user preference portrait:

query relevance, being the match between the document and the current query, is then calculated as $f(d, q) = [g(d, q), \mathrm{MLP}(f_d), f_m]$,

where f_d denotes traditional click features, such as the number of times the user has historically clicked the URL under the same query, and f_m is obtained through the two entity interaction matching components; finally, the relevance score of candidate document d under query q can be calculated as:

5. The method of claim 1, wherein after the personalized relevance of the documents has been computed and the documents ranked, entity link adjustment is performed, in which the current query and the user's click feedback under the current query are used to adjust the entity links of the other historical queries in the current session.