CN111125538B

CN111125538B - Searching method for enhancing personalized retrieval effect by utilizing entity information

Info

Publication number: CN111125538B
Application number: CN201911413378.3A
Authority: CN
Inventors: 窦志成
Original assignee: Renmin University of China
Current assignee: Renmin University of China
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2023-05-23
Anticipated expiration: 2039-12-31
Also published as: CN111125538A

Abstract

The invention provides a searching method for enhancing personalized retrieval effect by utilizing entity information, which comprises the following steps of 1, personalizing entity links, wherein the personalized entity links utilize historic promotion to query entity link effect and utilize entity enhancement models to carry out user intention modeling; step 2, constructing a user preference portrait, wherein the user preference portrait is constructed based on predicted intention, and a fine user preference portrait enhanced by a memory neural network construction entity by utilizing historical entity information; and step 3, obtaining personalized relevance according to the user intention model and the fine user preference portrait model and sequencing.

Description

Searching method for enhancing personalized retrieval effect by utilizing entity information

Technical Field

The present invention relates to a search method, and more particularly, to a search method for enhancing personalized search results using entity information.

Background

Personalized search has been widely focused, and aims to assist in judging the intention and preference of a user in current query by using the historical behaviors of the user, so that different search result sequences are returned to different users, and user experience is improved. Because of ambiguity and the general shortness and shortness of queries, the queries issued by users often cannot fully express their true intent, but even the same intent, different users may have different preferences, and thus personalization of search results is necessary.

In the prior art, many features such as document topics or sub-topics and the number of clicks of a user are extracted from the user history to calculate relevance for the current candidate document. Deep learning is then also introduced into the personalized search. In addition, the hierarchical recurrent neural network is utilized to dynamically learn the expression of the user portraits from the user history, thereby predicting the relevance of the current document and the user preference portraits. The use of the antagonistic neural network further enhances the effect of the depth model in personalized searches.

Existing personalized search methods are mainly based on historical search records of users to learn the relevance between documents and the user's current queries and user portraits, but may ignore the links between things that exist in the real world but are not reflected in these search records, thereby affecting the learning of relevance matches. Many search models utilize relationships and semantic information existing among entities to improve matching accuracy by introducing a knowledge base. But there is a lack of related methods for introducing knowledge of entities in the field of personalized searches.

In addition to better learning relevance with entity ties, the introduction of entities can better meet some of the demand characteristics of personalized searches. User intent can be better expressed, for example, with explicit entities, especially for ambiguous queries. Meanwhile, historical search information of the user in the personalized search task is also helpful for judging entity links, and further helps to infer and express the intention of the user. Secondly, clicking on the entity contained in the web page by the user may reflect the user's specific preference information more than text information in the entire web page, as the text information of the entire web page is more redundant. The user preference portrait can be better constructed by utilizing the entity information, so that the personalized relevance of the document can be better calculated.

Disclosure of Invention

The invention provides a searching method for enhancing personalized retrieval effect by utilizing entity information, which comprises the steps of firstly carrying out personalized entity linking on inquiry, utilizing history to promote the effect of the entity linking of the inquiry, utilizing the entity to enhance the model to model and express the user intention, then constructing a user portrait more accurately based on predicted intention, constructing a fine user preference portrait enhanced by the entity by utilizing the history entity information through a memory neural network, and finally utilizing the predicted user intention and the user portrait to calculate the personalized relevance of a document and arrange the personalized relevance of the document, thereby promoting user experience. After the sorting is finished, the invention provides that the entity link results of the previous query are adjusted by using the click feedback of the user and the current query, so that the understanding of the historical search intention and preference by a model is further optimized to act on the individuation of the subsequent query results.

Drawings

FIG. 1 is an overall flow chart of the present invention

FIG. 2 is a diagram of a personalized entity linkage structure of the present invention;

FIG. 3 is a diagram of a user portrayal construction.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

Knowledge bases have been widely studied in recent years and often introduced into search models as external knowledge to improve the accuracy of matching between queries and documents by their nature of storing a large amount of contact and semantic information between entities in the real world. For example, the document "Chen Kaige" is not highly relevant to the director of the query "bawang-Ji" from the perspective of text semantic similarity alone, but the document is highly consistent with the query intent based on real world context, and the problem of utilizing the relationship between entities can be well solved. But research on the introduction of external knowledge is still relatively lacking in the field of personalized searches.

In addition, the invention utilizes the entity to better predict the search intention of the user in the personalized search and construct the user preference portrait. For example, for a query of "cherry reviews", it is difficult to determine whether the user's search intent is a cherry keyboard or a cherry keyboard due to semantic ambiguity. But the entity is introduced and personalized entity links, and according to the related history query 'famous cherry blossom spots in Asia', the user search intention can be predicted to be cherry blossom. If the user intent is explicitly expressed as a cherry blossom entity, then the web page document describing the cherry blossom may be ranked ahead to meet the user's needs. Further based on predicted user intent, and utilizing entity information in the historical search, finer representations of preferences of the user may be constructed. For example, based on predicting the linked sakura entity, the historical query including the sakura entity can be searched, and further according to the entities such as "japan", "hokkaido" and the like included in the user click document under the query, the finer preference of the user is known as "sakura landscape", and then the sakura tourist attraction in japan can be recommended to the search result front, thereby further improving the user experience.

As shown in FIG. 1, the method comprises the steps of firstly carrying out personalized entity linking on the query, utilizing the history to promote the effect of the entity linking on the query, utilizing the entity to enhance the modeling and representation of the user intention, then constructing the user portrait more accurately based on the predicted intention, constructing the fine user preference portrait enhanced by the entity through the memory neural network by utilizing the history entity information, and finally utilizing the predicted user intention and the user portrait to calculate the personalized relevance of the document and arrange the personalized relevance, thereby promoting the user experience. After the sorting is finished, the invention provides that the entity link results of the previous query are adjusted by using the click feedback of the user and the current query, so that the understanding of the historical search intention and preference by a model is further optimized to act on the individuation of the subsequent query results.

The personalisation entity links are shown in figure 2. The user history consists of a series of search sessions:

a session consists of a series of queries and corresponding sets of document candidates: />

Where h is the id, x that identifies the session _h Is the number of queries within the session. When the user issues the t-th query q in the current session _t After that, there is a need to search interest in the candidate document set according to the user history +.>

Personalized ranking to fit user at q _t The following search intent. The process is repeated until the current session ends.

Dividing the user history into a short-term history and a long-term history is effective in personalized searches. Short term history is defined as historical search records in the current session

The long-term history is defined as search record +.>

If the query q contains x text segments related to the entity, the candidate entity set of the query is defined as:

wherein n is _i The ids of candidate entities related to the ith text fragment in the query are identified. The query entity vector is expressed as:

wherein p is _i，j For entity e _i，j E _i，j Is a pre-trained entity vector, and then trained.

The entity vector of a document is expressed as:

wherein c _i Is the frequency of occurrence of an entity in a document. Likewise, text vector representations of documents and queries are defined as:

w _i is a word vector pre-trained with glove.

As described above, the invention firstly utilizes the user history information to carry out personalized entity linking on the query, namely, calculates the linking probability of each candidate entity, so that the user intention is clearer and clearer, and the invention is used for the subsequent construction of the user portrait and the personalized relevance calculation of the document.

The calculation of the entity link probability is mainly divided into two parts: link relevance between entities and queries, entity link relevance determined based on user history:

wherein MLP stands for full connectivity layer.

Link relevance between an entity and a query includes vector similarity and statistical features:

wherein l _i，j Representing statistical features such as popularity of the candidate entity.

The entity link correlation calculation based on the user history comprises the following steps: modeling a historical search sequence of a user to infer an implicit intention under a current query so as to provide basis for a current entity link; searching related queries in the user history, and providing basis for entity links of the current query by utilizing historical entity information in the queries.

Sequence history modeling first models a sequence of user historical query actions using the LSTM layer and uses the attention mechanism based on the current query to give higher attention to relevant historical actions to infer the current query intent. Firstly, splicing a query text vector in a history search behavior and a text vector of a corresponding click document for short-term history to be used as an input of an LSTM layer, so that short-term user intention t can be obtained _s ：

/>

Wherein the method comprises the steps of

Is the average of the text vectors of the corresponding click documents. The above equation is similarly based on long term historyIs->

And->

Replaced by->

The average value of the text vector of the corresponding click document can be calculated to obtain the long-term user intention t _l ，

Historical entity information modeling utilizes LSTM and intent mechanisms to give higher weight to queries in the historical queries that are related to the current query, and then utilizes entity information in these queries as related historical entity information. Thus, the text vector of the history query is taken as the input of the LSTM layer, and the short-term related entity vector e on the short-term history _s Is calculated as follows:

wherein the method comprises the steps of

The same applies to Q in the above equation based on long term history _s And->

Replaced by Q _l And->

Can obtain long-term related entity vector e _l

The entity link correlation based on the personalized history is as follows:

wherein g (x, y) =tanh (x ^T *MLP(y))。

Based on the predicted user intent, the user's preferences under that intent can be better modeled. And meanwhile, the model can learn finer preferences of the user by further utilizing entity information in the search history. Because of the memory neural network's better ability to store long sequences of information, the present invention uses a kernel-value (key value pair) memory neural network to store user history information to model user portraits, as shown in FIG. 3, which is a user preference portraits construction.

The physical memory neural network is utilized to construct a physical enhanced user representation. The key value is the entity vector of the history query, and the value is the average value of the entity vectors of the documents clicked by the user under the corresponding history query. In this way the fine preferences that the user embodies under historical query intent can be preserved. Thus, there are short-term histories:

wherein the method comprises the steps of

Is the entity vector mean of the corresponding click document.

The entity vector of the current query is then used as the predicted user intent vector to construct the user preference portrait because the predicted entity link probability reflects the user's intent. Based on the entity vector, a short-term entity representation is read from the short-term entity memory neural network through an attention mechanism

The following are provided:

because most of the entities directly related to the current query are retrieved from the memory neural network by using only the entity vector of the current query, the invention then splices the entity vector of the current query and the read user image as a new user intention vector for secondary reading. In this way, entities that are also related to user preferences can be further retrieved from the memory neural network, so that the constructed user representation encompasses a wider range of interests of the user. Thus, there are:

in the same way, based on long-term history, the keys of the neural network are memorized

Sum->

Replaced by->

And the entity vector average value of the corresponding click document, and performing secondary reading to obtain a long-term entity portrait +.>

The text memory neural network then constructs a user interest portrait based on the original text information. Wherein the key value is the text vector of the historical query, and the value is the text vector average value of the corresponding click document. Thus, there are short-term histories:

wherein the method comprises the steps of

Is the text vector mean of the corresponding click document.

Since the original query text may not fully reflect the user's query intent, the present invention queries the original text vector

With implicit user intention vector t modeled with LSTM ^s The user text preference portraits are read using an attention mechanism with stitching as user intent vectors. Since the association between words is not as strong as the association between entities, it is read only once here. Thus constructing a short-term user text representation based on short-term history>

Similarly, the keys of the neural network may be memorized based on long-term history

Sum->

Replaced by->

And the text vector mean of the corresponding click document can construct a long-term user text portrait +.>

Using the predicted user intent and the constructed user portraits, a personalized relevance score can be calculated for the document and personalized ranking can be performed accordingly.

Given user history

The relevance score for candidate document d under query q may be calculated as:

wherein the method comprises the steps of

And->

Representing predicted user intent and user preference vectors, respectively.

The user intent correlation calculates the correlation between the document and the user intent vector:

wherein g (x, y) =tanh (xt×mlp (y))

The user preference correlation calculates the correlation between the document and the user preference portrait:

query relevance is focused on matching between documents and current queries, including vector similarity and traditional click features. Meanwhile, in order to further explore personalized matching between the entities of the query link and the documents, the invention introduces the interactive matching characteristics between the entities, so that the method comprises the following steps:

wherein f _d Representing a traditional click feature such as the number of clicks a user historically has on the url under the same query.

For entity interaction matching feature f _m The invention proposes two interactive matching components with entities, PEDRM and PCERM. To simplify the symbolic representation, all candidate entities in the current query are integrated into one list, so there are:

hereinafter, e ^q And e ^d Will be used to represent the entity encoding vectors in the query and document, respectively.

PEDRM is a matching group price that incorporates personalized information. EDRM first builds a text and entity monitoring matrix between queries and documents, and then extracts matching features using a Gaussian kernel pooling layer:

wherein the method comprises the steps of

Representing the splicing operation, M _e，e M is the interaction matrix between the query entity and the document entity _e，w M is a matrix of interactions between the querying entity and the document text _w，e To query the interaction matrix between text and document entities, M _w，w For query text and document textInteraction matrix between.

In PEDRM, the invention fuses personalized information in the interaction matrix. When calculating the interaction matrix with the entity in the query, the predicted entity probability is taken as the weight of the entity interaction to reflect the relevance to the personalized intent of the user. Meanwhile, the interaction matching matrix R of the relation between the entities and the query vector is added to further extract matching characteristics:

wherein the relationship between the query and the document entity may be characterized as

The interaction matrix of entity relationships is added because the matches between entity vectors do not necessarily reflect the degree of matching between the query and the document entirely. For example, the queries "Obama's life", "Michelle" and "U.S. A" are all related to the entity "Obama", but only the relationship "islife" exists between "Michelle" and "Obama", which meets the requirements of the query. Thus for the interaction feature f _m The calculation of (1) is as follows:

PCERM is a relatively simple interaction matching component that uses only one 3-channel CNN to extract personalized matching interaction features:

f _m ＝MLP(Flat(Relu(C))),

wherein the method comprises the steps of

Representing a stitching operation in a first dimension, W _CNN And b _CNN A, b are parameters of convolution kernel in CNN, a, b are convolution kernel size, flat represents smoothing operation, and matrix is flattened into vector.

After the personalized relevance calculation of the documents is completed and the ranking is performed according to the personalized relevance calculation, the invention provides that the click feedback of the user under the current query and the current query is utilized to adjust the entity link probability of other historical queries in the current session, because the intention of the user in the same session is relatively consistent. For example, with the current query and the user clicking on the entity "software" in the document, the ambiguous historical query "Java" in the session may be considered to refer to the entity as "Java language". The adjusted link result can enable the user portrait constructed when the follow-up inquiry is personalized to be more accurate, and when the user inquires 'which IDE to choose' later, the 'eclipse' webpage suitable for java development can be arranged in front.

But once the entity link probabilities can be adjusted, the situation becomes more complex. Because there is a variation in the link probability of one entity, the link probabilities of other text segment associations can also need to be changed due to the consistent relationship between conversational intents. The idea of the invention is therefore to pick the entity with the highest probability of linking as a reliable link and then use this entity to adjust the probability of candidate entities associated with other text segments.

Specifically, after personalized ranking is completed, entity information in the user click document is utilized to adjust entity probability in the current query:

wherein the method comprises the steps of

To click on the entity vector mean of the document, the superscript t identifies that the query is the t-th query in the current session again. Next find candidate entities under the current queryEpsilon in the body _t Entity with highest probability of linking +.>

If p < = δ (here δ is set to 0.5), the adjustment process ends, followed by processing the user's t+1th query.

If p > delta, assume that the reliably linked entity is associated with the a-th text segment in the query, and take the entity information of the text segment:

to adjust the probability epsilon of other candidate entities in the session _k 1 < = k < = t-1. According to the entity vector similarity and the query text similarity, the link probability of other entities is adjusted as follows:

next find the candidate entity with the highest link probability in the whole session

If p > delta and the text segment associated with the selected entity has not been selected before, repeating the steps to adjust the link probabilities of other candidate entities using the text segment entity information associated with the selected entity; otherwise, the adjustment process is ended, and the t+1st query of the user is processed next. In summary, when the link probability of no entity in the session is greater than the threshold δ or all text segments associated with the entity have been selected, the adjustment process ends. In the present invention, the input of the training model is in units of sessions, so that the parameters w, w used in the process are adjusted while minimizing the loss value of the whole session ₁ ，w ₂ Would be optimized by training.

The invention trains the model by taking the conversation as a unit, adopts the pass function of the pair-wise, so the invention has the following steps:

/>

where s is the query session of user u,

for the user's search history prior to query q, d ⁺ Representing candidate document set under query q>

In (d) positive example document ^- Representing a negative example document.

In order to better model the user intention and the user portrait, the invention provides the effect of enhancing the personalized search by utilizing the entity information in the knowledge base. The invention firstly carries out personalized entity link to eliminate the ambiguity of inquiry, so that the model can better learn and express the intention of the user. According to the predicted user intention, the invention utilizes the entity information in the history search record and constructs the user portrait through the memory network, thereby better modeling the personalized preference of the user. After the personalized score calculation and the sorting of the documents are completed, the invention utilizes the click feedback of the current query and the user to adjust the entity link result of the historical query so as to better analyze the history of the user, which is helpful for further simulating the interests of the user. The invention effectively enhances the personalized searching effect by utilizing the entity information, and can greatly improve the user experience.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The searching method for enhancing the personalized retrieval effect by utilizing the entity information is characterized by comprising the following steps of step 1, personalizing entity links, wherein the personalized entity links utilize historic promotion to query entity link effects and utilize entity enhancement models to model user intention; step 2, constructing a user preference portrait, wherein the user preference portrait is constructed based on predicted intention, and a fine user preference portrait enhanced by a memory neural network construction entity by utilizing historical entity information; step 3, personalized relevance is obtained and ordered according to the user intention model and the fine user preference portrait model;

the personalized entity links are specifically formed by a series of search sessions with user history

The S is ₁ ...S _m The m is the number of sessions, the mth session is the current session, the session S _h Consists of a series of queries and corresponding sets of document candidates: />

Where h is the id, x that identifies the session _h Is the number of queries in the session when the user issues the t-th query q in the current session _t Thereafter, search interest is ++A for candidate document sets based on the user history>

Personalized ranking to fit user at q _t Search intention under d _|Dt| Is the D _t Candidate documents; the user history is then divided into a short-term history, which is a history search record in the current session, and a long-term history>

The s identifies a short term history; the long-term history is in history sessionSearching for records

Where l identifies a long-term history, m-1 represents m-1 sessions preceding the current session as long-term histories,

the query q contains x text fragments related to the entity, and the candidate entity set of the query is:

wherein n is _i Is an id identifying a candidate entity associated with the ith text fragment in the query, then the query entity vector is expressed as:

wherein p is _i，j For entity e _i，j E _i，j For pre-trained entity vectors, then training, the entity vectors of the document are expressed as:

wherein c _i Is the frequency with which entities appear in a document;

the text vector of the document and the text vector of the query are respectively:

wherein w is _i To use the glove pre-trained word vector,

then, sequence history modeling is carried out, firstly, LSTM layer is utilized to model the sequence of user history inquiry behaviors, and the attention mechanism based on the current inquiry is utilized to give higher attention to related history behaviorsForce, for short-term history, the query text vector in the history search behavior and the text vector of the corresponding click document are spliced to be used as the input of the LSTM layer, so that the short-term user intention t can be obtained _s ：

Wherein the method comprises the steps of

For the average value of the text vectors of the corresponding click document, +.>

For output at LSTM layer i instant, alpha _i Attention weight output for each moment, +.>

To normalize the probability function, the MLP represents the fully connected layer, based on long term history, long term user intent t _l By the formula described above>

And->

Replaced by->

Calculating the average value of the text vectors of the corresponding click documents;

then modeling the historical entity information, giving higher weight to the query related to the current query in the historical queries by using LSTM and attribute mechanism, and then usingThe entity information in the queries is used as relevant historical entity information, the text vector of the historical queries is used as the input of an LSTM layer, and the short-term relevant entity vector e in short-term history is used as the input of the LSTM layer _s Is as follows:

wherein the method comprises the steps of

Based on long-term history, Q in the formula _s And->

Replaced by Q _l And->

Obtaining long-term related entity vector e _l ，

The entity link correlation based on the personalized history is as follows:

wherein g (x, y) =tanh (x ^T *MLP(y))。

2. The method of claim 1, wherein the user preference profile is constructed in a manner that utilizes an entity memory neural network to construct an entity-enhanced user profile having, over a short period of history:

wherein the method comprises the steps of

The entity vector average value of the corresponding click document;

then constructing a user preference portrait by taking the entity vector of the current query as the predicted user intention vector, and reading the short-term entity portrait once from the short-term entity memory neural network through an attention mechanism

The method comprises the following steps:

wherein beta is _i Attention weight for memorizing the ith value in neural network, P _e Is a set matrix variable which is used to determine the matrix,

is the entity vector of the current query; splicing the entity vector of the current query and the read user image to be used as a new user intention vector, and carrying out secondary reading:

/>

wherein W is _e For the matrix variables set, β' _i Attention weight for the ith value of the memory neural network; and will memorize the key of the neural network

Sum->

Replaced by->

Then, the text memory neural network is utilized to construct the user interest portraits based on the original text information, wherein the user interest portraits have the following short-term histories:

wherein the method comprises the steps of

The text vector average value of the corresponding click document;

will query the original text vector

With implicit user intention vector t modeled with LSTM ^s Splicing is used as a user intention vector, a user text preference portrait is read by using an attention mechanism, only one reading is performed, and a short-term user text portrait is constructed based on short-term history>

q′＝[t _s ，q]，

And based on long-term history, memorizing the keys of the neural network

Sum->

Replaced by->

And constructing a long-term user text portrait +.>

3. The method of claim 2, wherein personalized relevance is derived and ranked according to the user intent model and the refined user preference portrait model in a particular manner for a given user history

Candidate document d under query q, user intent +.>

Is>

First, a user intent correlation, which is a correlation between a document and a user intent vector, is calculated

Wherein g (x, y) =tanh (x ^T *MLP(y))，

Then calculating a user preference correlation, wherein the user preference correlation is the correlation between the document and the user preference portrait, and the user preference correlation is that

Then calculate the query relevance, which is the match between the document and the current query, which is f (d, g) = [ g (d, q), MLP (f _d )，f _m ]，

Wherein f _d Representing a traditional click feature such as the number of clicks a user historically has on url under the same query, f _m The relevance score of the candidate document d under the query g can be calculated as:

4. the method of claim 1, wherein the personalized relevance calculation of documents is completed and ranked followed by entity link adjustment that adjusts other historical queries in the current session using the current query and click feedback of the user under the current query.