CN112069399A - Personalized search system based on interactive matching - Google Patents

Personalized search system based on interactive matching

Info

Publication number: CN112069399A
Application number: CN202010861245.9A
Authority: CN (China)
Prior art keywords: matching, vector, document, user, personalized
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN112069399B (en)
Inventors: 窦志成, 邴庆禹
Current assignee: Renmin University of China (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Renmin University of China
Application filed by Renmin University of China; priority to CN202010861245.9A; application granted and published as CN112069399B
Current status: Active; anticipated expiration

Classifications

    • G06F 16/9535: Information retrieval; retrieval from the web; querying, e.g. by the use of web search engines; search customisation based on user profiles and personalisation
    • G06F 40/284: Handling natural language data; natural language analysis; lexical analysis, e.g. tokenisation or collocates
    • G06N 3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention realizes a personalized search system based on interactive matching, in the field of artificial intelligence, comprising a system input module, a personalized search module based on interactive matching, and an output module. The personalized search module operates in four steps: bottom-layer matching modeling of the user's search history, calculation of attention weights, generation of the user-interest matching vector, and personalized re-ranking. The model interacts each historical query of the user with the candidate document at the word level, uses an attention mechanism to reduce the influence of irrelevant information in the search history, and fuses the weighted matching signals with a convolutional neural network, thereby generating a final interest matching vector for each document and obtaining a more accurate interest matching score. This addresses a weakness of existing representation-based methods, in which the quality of the ranking result depends on the vector-construction model and the process of constructing the vector may omit useful information.

Description

Personalized search system based on interactive matching
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a personalized search system based on interactive matching.
Background
Personalizing search results with the user's historical information has proven effective in improving the quality of search rankings. A personalized search algorithm first models the user's interests from information such as the user's historical behavior; when computing the matching score it considers not only the relevance between the query and the document but also the degree of match between the document and the user's interests, thereby customizing the result list to the needs of different users. A user interest model can be built from many information sources, such as the user's location, retrieval patterns, browsing history, and search history. In recent years, researchers have introduced deep learning into personalized ranking models, strengthening the models' semantic understanding of text and achieving good results on personalized re-ranking of search results. Deep ranking algorithms can be divided into representation-based matching and interaction-based matching. In a representation-based ranking algorithm, semantic vector representations of the query and the document are learned separately and then matched against each other; the core of such an algorithm is learning the semantic vector representations. An interaction-based algorithm instead lets the query and the document interact first, at the finer-grained word level, capturing more complete matching signals before fusing them into a matching score; its core is how to process the matching signals and fuse them into a score.
Almost all existing personalized search algorithms first compute an interest representation vector for the user, in various ways, from the user's historical behavior, and then let it interact with the representation vector of each candidate document to obtain a personalized matching score; this is an approach based on representation matching. Methods of this type obtain the matching signal between the document and the user's interests at the granularity of the whole document: the candidate document and the user's interests are each converted into a representation vector, the two vectors are then matched, and the emphasis falls on constructing the representation layer. Under such vector-representation-based methods, the quality of the ranking result depends on the quality of the vector-construction model, and the construction process may omit useful information, such as the word-level text and interaction information between queries and documents, thereby degrading the personalized ranking result.
Disclosure of Invention
Therefore, the invention provides a personalized search system based on interactive matching, comprising an input module, a personalized search module based on interactive matching, and an output module.
The input module reads the user's query history and the candidate documents, standardizes their format, and feeds them into the personalized search module based on interactive matching.
The operation of the personalized search module based on interactive matching is divided into four steps:
Step one: bottom-layer matching modeling of the user's search history. A bottom-layer matching model is built from the user's historical search information, and each historical query is interacted with the candidate document word by word to obtain detailed bottom-layer matching signals.
Step two: calculation of the attention weights. An attention mechanism is introduced, and the matching signals are weighted according to how much the different query records in the user's search history contribute to the current query.
Step three: generation of the user-interest matching vector. A convolutional neural network extracts features from the weighted matching signals to generate the final matching vector between the document and the user's interests.
Step four: personalized re-ranking. A personalized score for each candidate document is computed from the user-interest matching vector obtained in step three, a relevance score is computed from the click feature vector, and the sum of the two serves as the final document matching score for personalized re-ranking.
The output module outputs the document matching scores and the personalized re-ranking result.
The bottom-layer matching modeling step of the user's search history is implemented as follows. Let the user's historical query list be {q_1, q_2, q_3, …, q_n} (where n ≥ 3 and n is an integer) and let the current candidate document be d. For each historical query-candidate document pair <q_i, d>, both texts are first mapped word by word into word vectors, represented with a word2vec model; after processing, q_i is expressed as a group of word vectors {qw_1, qw_2, qw_3, …, qw_x} and d as {dw_1, dw_2, dw_3, …, dw_y}. Every vector in one group is interacted pairwise with every vector in the other to obtain the word matching matrix T of <q_i, d>, each element of which is:

T_{i,j} = cos(qw_i, dw_j)

where T_{i,j} is the element in row i and column j of T, qw_i is the word vector of the i-th word of the historical query, and dw_j is the word vector of the j-th word of the candidate document (1 ≤ i ≤ x, 1 ≤ j ≤ y; i, j, x, y are integers); the matching value of the two words is computed with the cosine function. As in the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector

K(T_i) = {K_1(T_i), K_2(T_i), …, K_K(T_i)}

where each RBF kernel is given by:

K_k(T_i) = Σ_{j=1}^{y} exp( -(T_{i,j} - μ_k)^2 / (2σ_k^2) )

Here K_k(T_i) is the value produced by the k-th RBF kernel for the i-th row of T, and its range is between 0 and y. μ_k and σ_k are hyperparameters, with the μ values taken uniformly from -1 to 1. The logarithms of the feature vectors of all rows of the matching matrix are then summed as the final bottom-layer matching result of historical query q_i with the candidate document:

v_i = Σ_{r=1}^{x} log K(T_r)

The bottom-layer matching vectors computed from the user's historical search information are denoted {v_1, v_2, v_3, …, v_n}; the same procedure yields the fine-grained matching vector v of the current query with the candidate document.
The attention-weight calculation step is implemented as follows. Using the fine-grained matching vector v of the current query q and candidate document d, an attention weight is computed for the bottom-layer matching vector of each historical query record:

e_i = g(v, v_i)

α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j)

where g is a multilayer perceptron with tanh as its activation function and α_i is the weight computed by the attention layer for bottom-layer matching vector v_i. The weighted bottom-layer matching vector is:

V_i = α_i · v_i

The weighted fine-grained matching vectors corresponding to the user's historical queries are {V_1, V_2, V_3, …, V_n}.
The user-interest matching vector generation step is implemented as follows. The weighted fine-grained matching vectors {V_1, V_2, V_3, …, V_n} are concatenated column-wise into a matching feature matrix M = [V_1, V_2, V_3, …, V_n] ∈ R^{K×n}, and 100 convolution kernels are convolved with M to obtain a three-dimensional tensor A ∈ R^{100×(K-2)×(n-2)}, each element of which is:

A_{t,i,j} = ReLU( f_t ⊙ M_{i-1:i+1, j-1:j+1} + b_t )

where t is an integer from 1 to 100, b_t is the t-th element of the bias vector b ∈ R^{100}, f_t is the t-th 3×3 convolution kernel, M_{i-1:i+1, j-1:j+1} is the submatrix of M spanning rows i-1 to i+1 and columns j-1 to j+1, and ⊙ multiplies the elements at corresponding positions of the two matrices and sums all the products. The convolutional layer uses ReLU as its activation function. After the convolutional layer, the pooling layer applies max pooling over the second and third dimensions of the tensor A to obtain a 100-dimensional vector I, whose t-th element is:

I_t = max_{i,j} A_{t,i,j}

The output vector I is the final user-interest matching vector.
The convolution kernels are of size 3×3, so the search history of each user must contain at least 3 queries.
The personalized re-ranking step is implemented as follows. The matching score score(d|I) of the candidate document with the user's interests is obtained by feeding the interest matching vector I through a multilayer perceptron; the relevance score score(d|q) of the candidate document with the current query is computed by a multilayer perceptron from the click count, the original click position, and the click entropy. The interest matching score score(d|I) and the relevance score score(d|q) are added to give the final score of the candidate document, and the original document list is re-ranked by this score to produce the final personalized ranking.
In computing the relevance scores of candidate documents with respect to the current query, training uses the LambdaRank algorithm: the clicked document is taken as a relevant sample and the remaining documents as irrelevant samples, and a relevant document d_i and an irrelevant document d_j are selected to form document pairs over which the loss is computed. The loss function also incorporates how strongly swapping the order of a document pair affects the evaluation metric MAP, using this as the pair's weight; that is, pairs with a larger difference (a large change in MAP after swapping) receive a larger weight. The loss is the cross entropy between the actual and predicted probabilities multiplied by the change in the MAP evaluation metric:

L = Δ · ( -p̄_{ij} log p_{ij} - (1 - p̄_{ij}) log(1 - p_{ij}) )

where Δ is the change in the MAP metric after documents d_i and d_j exchange positions, p̄_{ij} is the actual probability that document d_i is more relevant than document d_j, and p_{ij} is the predicted probability, computed as:

p_{ij} = 1 / (1 + exp(-(score(d_i) - score(d_j))))
the technical effects to be realized by the invention are as follows:
(1) a model idea based on interactive matching is introduced, a text is not converted into a unique integral expression vector, and the historical query of a user is interacted with a candidate document at a word level to obtain a more accurate and complete matching signal.
(2) An attention mechanism is introduced, and corresponding matching signals are weighted according to the contribution degree of different historical queries to current matching, so that the influence of irrelevant information in search history is reduced.
(3) Feature extraction is carried out on the weighted matching signals by using a convolutional neural network, and a final interest matching vector of the document is generated, so that more accurate interest matching scores are obtained.
Drawings
FIG. 1 is a framework of a personalized search module based on interactive matching;
Detailed Description
The following is a preferred embodiment of the present invention, further described with reference to the accompanying drawings; the present invention is not, however, limited to this embodiment.
In order to achieve the above object, the present invention provides a personalized search system based on interactive matching.
The system comprises an input module, a personalized search module based on interactive matching, and an output module. The input module reads the user's query history and candidate documents, standardizes their format, and inputs them to the personalized search module based on interactive matching; the output module outputs the document matching scores and the personalized re-ranking result.
The personalized search module based on interactive matching processes the bottom-layer matching signals with a convolutional neural network to obtain the final interest matching result of each candidate document.
The personalized search module based on interactive matching considers the word-level matching signals between the historical queries in the user's behavior history and the candidate document. Let the user's historical query list be {q_1, q_2, q_3, …, q_n} and let d be the current candidate document. First, the user's search log is processed by an interaction-based K-NRM model to obtain, for each historical query q_i (1 ≤ i ≤ n), a fine-grained matching vector v_i with the candidate document d, as well as the fine-grained matching vector v of the current query q with d. Then, considering that the user's interests change dynamically and that individual queries can be somewhat incidental, different queries in the search history contribute differently to the current query. According to each historical query's contribution to the current query, a multilayer perceptron weights the matching vectors {v_1, v_2, v_3, …, v_n} produced by the K-NRM model, giving the weighted list {V_1, V_2, V_3, …, V_n}. A convolutional neural network then processes these vectors to derive the matching vector between the candidate document and the user's interests. Finally, the interest matching score and the relevance score of the current candidate document are computed from the interest matching vector and the click feature vector respectively, and added to give the final document matching score:

score(d) = score(d|I) + score(d|q)

where score(d|I) is the matching score of the current candidate document with the user's search interests and score(d|q) is the relevance score of the current candidate document with respect to the current query.
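As a concrete illustration, the additive score combination above can be sketched in Python. The linear scoring heads here are simplified stand-ins, assumed for illustration, for the multilayer perceptrons the text describes:

```python
import numpy as np

def final_score(I, click_features, w_interest, w_click):
    """Hedged sketch of score(d) = score(d|I) + score(d|q): two scoring
    heads (here plain linear layers, not the patent's MLPs) map the
    interest matching vector I and the click feature vector to scalar
    scores, which are simply added."""
    score_interest = float(w_interest @ I)          # stands in for score(d|I)
    score_relevance = float(w_click @ click_features)  # stands in for score(d|q)
    return score_interest + score_relevance

rng = np.random.default_rng(3)
I = rng.normal(size=100)               # user-interest matching vector (100-dim)
clicks = np.array([3.0, 1.0, 0.5])     # click count, original position, click entropy
s = final_score(I, clicks, rng.normal(size=100), rng.normal(size=3))
```

Candidate documents would then be sorted by `s` in descending order to produce the personalized ranking.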
The framework of the personalized search module based on interactive matching is shown in FIG. 1 and divides into the following four parts along the processing flow:
Step one: bottom-layer matching modeling of the user's search history. A bottom-layer matching model is built from the user's historical search information, and each historical query is interacted with the candidate document word by word to obtain detailed bottom-layer matching signals.
Step two: calculation of the attention weights. An attention mechanism is introduced, and the matching signals are weighted according to how much the different query records in the user's search history contribute to the current query.
Step three: generation of the user-interest matching vector. A convolutional neural network extracts features from the weighted matching signals to generate the final matching vector between the document and the user's interests.
Step four: personalized re-ranking. A personalized score for each candidate document is computed from the interest matching vector, a relevance score is computed from the click feature vector, and the sum of the two serves as the final document matching score for personalized re-ranking.
Bottom-layer matching modeling step of the user's search history:
The user's search history provides rich information for capturing the user's search interests. Most existing algorithms model the user's interests from historical behavior to obtain an interest vector representing the user's search preferences, which then interacts with the document vector. Here a K-NRM framework is adopted: for each user U, a bottom-layer matching model is built from U's historical search information, and each historical query in the search history is interactively matched with the candidate document at the bottom layer.
Let the user's historical query list be {q_1, q_2, q_3, …, q_n} and let d be the current candidate document. For each historical query-candidate document pair <q_i, d>, both texts are first mapped word by word into word vectors, represented with a word2vec model. After processing, q_i is expressed as a group of word vectors {qw_1, qw_2, qw_3, …, qw_x} and d as {dw_1, dw_2, dw_3, …, dw_y}. Every vector in one group is interacted pairwise with every vector in the other to obtain the matching matrix T of <q_i, d>, each element of which is:

T_{i,j} = cos(qw_i, dw_j)

where T_{i,j} is the element in row i and column j of T, qw_i is the word vector of the i-th word of the historical query, and dw_j is the word vector of the j-th word of the candidate document (1 ≤ i ≤ x, 1 ≤ j ≤ y); the matching value of the two words is computed with the cosine function.
As the above shows, row i of the matching matrix represents the matching signal between the i-th word of the historical query and the candidate document. As in the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector

K(T_i) = {K_1(T_i), K_2(T_i), …, K_K(T_i)}

where each RBF kernel is given by:

K_k(T_i) = Σ_{j=1}^{y} exp( -(T_{i,j} - μ_k)^2 / (2σ_k^2) )

Here K_k(T_i) is the value produced by the k-th RBF kernel for the i-th row of T; its range is between 0 and y, and μ_k and σ_k are hyperparameters. In the K-NRM model used here, the cosine similarity of two vectors lies between -1 and 1, so the μ values are taken uniformly from -1 to 1. The logarithms of the feature vectors of all rows of the matching matrix are then summed as the final bottom-layer matching result of historical query q_i with the candidate document:

v_i = Σ_{r=1}^{x} log K(T_r)

Each historical query q_i thus has a K-dimensional matching vector with the current candidate document: the fine-grained matching vector v_i of q_i and d. The same procedure computes the fine-grained matching vector v of the current query q and candidate document d. At this point we have obtained the bottom-layer matching vectors computed from the user's historical search information, denoted {v_1, v_2, v_3, …, v_n}.
Calculation step of the attention weights:
Because the user's search interests and search patterns change dynamically, and individual queries can be somewhat incidental, the different query records in a user's search history influence the current query to different degrees. Based on this observation, this step introduces an attention mechanism and further refines each bottom-layer matching vector according to how much each historical query contributes to the current matching.
The previous step produced the bottom-layer matching vectors {v_1, v_2, v_3, …, v_n} computed from the user's historical search information. In this step, an attention weight is computed for the bottom-layer matching vector of each historical query record, based on the fine-grained matching vector v of the current query q and candidate document d. The attention layer takes as input the vectors {v_1, v_2, v_3, …, v_n} and v, and computes:

e_i = g(v, v_i)

α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j)

where g(·) is a multilayer perceptron with tanh as its activation function and α_i is the weight computed by the attention layer for bottom-layer matching vector v_i. The weighted bottom-layer matching vector is:

V_i = α_i · v_i

According to how much information each historical query contributes to the current matching, the attention layer pays more attention to the bottom-layer matching vectors of the more contributive historical queries, yielding bottom-layer matching information weighted by contribution. At this point we have the weighted fine-grained matching vectors {V_1, V_2, V_3, …, V_n} corresponding to the user's historical queries.
Generation of the user-interest matching vector:
The weighted fine-grained matching vectors {V_1, V_2, V_3, …, V_n} are concatenated column-wise into a matching feature matrix M = [V_1, V_2, V_3, …, V_n] ∈ R^{K×n}. The traditional approach applies max pooling or average pooling directly on the matching feature matrix to obtain the user-interest matching vector. However, since a user's search history may contain a large number of records, pooling directly over the matching feature matrix may omit useful information, such as the relationships between the bottom-layer matching vectors of adjacent historical queries.
To compensate for this deficiency, this step convolves M with 100 convolution kernels f_1, f_2, …, f_100 of size 3×3, obtaining a three-dimensional tensor A ∈ R^{100×(K-2)×(n-2)}, each element of which is:

A_{t,i,j} = ReLU( f_t ⊙ M_{i-1:i+1, j-1:j+1} + b_t )

where 1 ≤ t ≤ 100, b_t is the t-th element of the bias vector b ∈ R^{100}, f_t is the t-th 3×3 convolution kernel, M_{i-1:i+1, j-1:j+1} is the submatrix of M spanning rows i-1 to i+1 and columns j-1 to j+1, and ⊙ multiplies the elements at corresponding positions of the two matrices and sums all the products. Because the convolutional layer uses 3×3 kernels, the search history of each user must contain at least 3 query records. In other words, the model does not support users with fewer than three historical queries: too few records cannot provide enough information to extract the user's search interests, and in that case personalized re-ranking would interfere with accurate scoring of the documents. The convolutional layer uses the ReLU activation function, which is cheaper to compute than activations such as sigmoid and avoids the vanishing-gradient problem.
After the convolutional layer, the pooling layer applies max pooling over the second and third dimensions of the three-dimensional tensor A to obtain a 100-dimensional vector I, whose t-th element is:

I_t = max_{i,j} A_{t,i,j}

The pooling layer performs further feature extraction on the matching feature tensor A, and the output vector I is the final user-interest matching vector.
Personalized reordering step
Since the score of a candidate document consists of two parts: a matching score of the candidate documents to the user interests and a relevance score to the current query. The matching score (d | I) of the candidate document and the user interest is obtained by training an interest matching vector I through a multi-layer perceptron; and the relevance score (d | q) of the candidate document and the current query is obtained through calculation of a multi-layer perception computer according to three click features, namely the click times, the original click position and the click entropy. And adding the interest matching score (d | I) and the correlation score (d | q) to obtain a final score of the candidate document, and reordering the original document list according to the score to obtain a final personalized sorting result.
The Lambdannk algorithm is selected for training, the clicked document is used as a related document sample, other documents are used as unrelated samples, and a related document d is selectediAnd an irrelevant document djDocument pairs are constructed to calculate losses. The loss function is obtained by multiplying the cross entropy between the actual probability and the prediction probability by the change value of the MAP evaluation index, and the calculation formula is as follows:
Figure BDA0002648218730000092
where Δ is the change value of the MAP evaluation index; P̄_ij denotes the actual probability that document d_i is more relevant than document d_j, and p_ij its predicted probability; P̄_ji denotes the actual probability that document d_j is more relevant than document d_i, and p_ji its predicted probability. The predicted probability p_ij is calculated by the following formula:
p_ij = 1 / (1 + exp(−(s_i − s_j)))

where s_i and s_j are the final scores of documents d_i and d_j.
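The pairwise loss described above can be sketched as follows. Here s_i and s_j stand for the model's final scores of d_i and d_j, and the ΔMAP value is an illustrative number; in training it would be recomputed for each document pair:

```python
import math

def pair_loss(s_i, s_j, delta_map, p_bar_ij=1.0):
    """Pairwise loss: cross entropy between the target probability P_bar_ij and the
    predicted probability p_ij, weighted by delta_map (the change in MAP when
    d_i and d_j swap positions)."""
    p_ij = 1.0 / (1.0 + math.exp(-(s_i - s_j)))  # predicted prob. that d_i ranks above d_j
    p_bar_ji = 1.0 - p_bar_ij
    ce = -p_bar_ij * math.log(p_ij) - p_bar_ji * math.log(1.0 - p_ij)
    return delta_map * ce

# d_i is clicked (relevant), d_j is not, so the target probability P_bar_ij = 1
loss = pair_loss(s_i=2.0, s_j=0.5, delta_map=0.12)
```

Note how the ΔMAP factor realises the weighting described in claim 7: the larger the MAP change caused by swapping the pair, the larger the gradient contributed by that pair.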
Finally, the resulting personalized ranking is passed to the output module for external output.

Claims (7)

1. An interactive matching-based personalized search system, characterized in that: the system comprises an input module, a personalized search module based on interactive matching, and an output module;
the input module is used for reading the user's query history and the candidate documents, standardizing the document formats, and inputting them into the personalized search module based on interactive matching;
the operation process of the personalized search module based on interactive matching is divided into four steps:
step one: a bottom-layer matching modeling step for the user search history, namely establishing a bottom-layer matching model using the user's historical search information, and interacting the user's historical queries with the candidate documents word by word to obtain detailed bottom-layer matching signals;
step two: an attention weight calculation step, namely introducing an attention mechanism and weighting the matching signals of the different query records in the user search history according to their contribution to the current query;
step three: a user interest matching vector generation step, namely extracting features from the weighted matching signals with a convolutional neural network to generate the final matching vector of the document and the user interest;
step four: a personalized reordering step, namely calculating the personalized score of each candidate document from the user interest matching vector obtained in the user interest matching vector generation step, calculating its relevance score from the click feature vector, and performing personalized reordering with the sum of the two as the final document matching score;
and the output module outputs the document matching scores and the personalized reordering result.
2. The personalized search system based on interactive matching as claimed in claim 1, wherein: the bottom-layer matching modeling step of the user search history is implemented as follows: define the user's historical query list as {q_1, q_2, q_3, …, q_n}, where n is an integer with n ≥ 3, and let the current candidate document be d; for each historical query–candidate document pair <q_i, d>, first map both into word vectors word by word, using the word2vec model as the word-vector representation; after processing, q_i is expressed as a group of word vectors {qw_1, qw_2, qw_3, …, qw_x} and d as {dw_1, dw_2, dw_3, …, dw_y}; the two groups of word vectors interact pairwise to obtain the word matching matrix T of <q_i, d>, in which each element is:
Ti,j=cos(qwi,dwj)
where T_{i,j} is the element in the i-th row and j-th column of matrix T, qw_i is the word vector of the i-th word in the historical query, and dw_j is the word vector of the j-th word in the candidate document, with 1 ≤ i ≤ x, 1 ≤ j ≤ y and i, j, x, y integers; the matching values are calculated with the cosine function; following the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector
K(T_i) = [K_1(T_i), K_2(T_i), …, K_K(T_i)]
The corresponding formula of the RBF kernel is as follows:
K_k(T_i) = Σ_{j=1}^{y} exp( −(T_{i,j} − μ_k)² / (2σ_k²) )
where K_k(T_i) is the value of the i-th row of the matching matrix T after processing by the k-th RBF kernel, ranging between 0 and y; μ_k and σ_k are hyper-parameters, with the μ_k taken uniformly from −1 to 1; the logarithms of the feature vectors corresponding to the rows of the matching matrix are then taken and summed as the final bottom-layer matching result of the historical query q_i and the candidate document:
v_i = Σ_{r=1}^{x} log K(T_r)
The bottom-layer matching vectors calculated from the user's historical search information are denoted {v_1, v_2, v_3, …, v_n}, and v denotes the fine-grained matching vector of the current query and the candidate document.
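The claim's cosine word-matching matrix and RBF kernel pooling can be sketched for one historical query as follows. For illustration this assumes K = 11 kernels with means μ_k spaced evenly on [−1, 1], a single σ = 0.1 shared by all kernels, and random vectors in place of word2vec embeddings:

```python
import numpy as np

def knrm_match(query_vecs, doc_vecs, mus, sigma=0.1):
    """Cosine word-by-word matching matrix T, RBF kernel pooling per query word,
    then a log-sum over query words, giving a K-dimensional matching vector."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    T = q @ d.T                                          # T[i, j] = cos(qw_i, dw_j)
    # K_k(T_i) = sum_j exp(-(T_ij - mu_k)^2 / (2 sigma^2))
    Kmat = np.exp(-((T[:, :, None] - mus[None, None, :]) ** 2)
                  / (2 * sigma ** 2)).sum(axis=1)        # shape (x, K)
    return np.log(np.clip(Kmat, 1e-10, None)).sum(axis=0)  # v_i, one value per kernel

rng = np.random.default_rng(2)
mus = np.linspace(-1.0, 1.0, 11)                         # kernel means, uniform on [-1, 1]
v_i = knrm_match(rng.standard_normal((4, 50)),           # 4 query words, 50-dim vectors
                 rng.standard_normal((20, 50)), mus)     # 20 document words
```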
3. The personalized search system based on interactive matching as claimed in claim 2, wherein: the attention weight calculation step is implemented as follows: using the fine-grained matching vector v of the current query q and the candidate document d, calculate an attention weight value for the bottom-layer matching vector of each historical query record:
ei=g(v,vi)
α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j)
where g is a multilayer perceptron with tanh as the activation function, and α_i is the weight value calculated by the attention layer for the bottom-layer matching vector v_i; the weighted bottom-layer matching vector is:
V_i = α_i · v_i
The weighted fine-grained matching vectors corresponding to the user's historical queries are {V_1, V_2, V_3, …, V_n}.
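A sketch of this attention weighting. The claim does not specify how g combines v and v_i, so this illustration assumes concatenation before the tanh layer, with randomly initialised weights:

```python
import numpy as np

def attention_weights(v, hist_vs, w1, b1, w2):
    """e_i = g(v, v_i) with a tanh MLP g; alpha = softmax(e); V_i = alpha_i * v_i."""
    e = np.array([np.tanh(np.concatenate([v, vi]) @ w1 + b1) @ w2 for vi in hist_vs])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # softmax over the n history queries
    return alpha, alpha[:, None] * hist_vs    # weights alpha_i and weighted vectors V_i

rng = np.random.default_rng(3)
K, n = 11, 5
v = rng.standard_normal(K)                    # matching vector of the current query
hist = rng.standard_normal((n, K))            # v_1 ... v_n for the history queries
w1, b1, w2 = rng.standard_normal((2 * K, 8)), rng.standard_normal(8), rng.standard_normal(8)
alpha, V = attention_weights(v, hist, w1, b1, w2)
```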
4. The personalized search system based on interactive matching as claimed in claim 3, wherein: the user interest matching vector generation step is implemented as follows: the weighted fine-grained matching vectors {V_1, V_2, V_3, …, V_n} are concatenated column-wise into a matching feature matrix M, where M = [V_1, V_2, V_3, …, V_n] ∈ R^(K×n); 100 convolution kernels are applied to the matching feature matrix M to obtain a three-dimensional tensor A ∈ R^(100×(K−2)×(n−2)), each element of which is:
A_{t,i,j} = ReLU( f_t ⊗ M_{i−1:i+1, j−1:j+1} + b_t )
where t is an integer from 1 to 100; b_t is the t-th element of the bias vector b ∈ R^100; f_t is the t-th 3 × 3 convolution kernel; M_{i−1:i+1, j−1:j+1} is the sub-matrix of the matching feature matrix M from row i−1 to row i+1 and from column j−1 to column j+1; and ⊗ multiplies the elements at corresponding positions of the two matrices and sums all the products; the convolution layer uses the ReLU function as its activation function; after the convolution layer, the pooling layer applies max-pooling to the second and third dimensions of the three-dimensional tensor A to obtain a 100-dimensional vector I, where I_t is the t-th element of the vector I:
I_t = max_{1 ≤ i ≤ K−2, 1 ≤ j ≤ n−2} A_{t,i,j}
the output vector I is the final user interest matching vector.
5. The personalized search system based on interactive matching of claim 4, wherein: the convolution kernels are of size 3 × 3, and the search history of each user contains at least 3 historical query records.
6. The personalized search system based on interactive matching of claim 5, wherein: the personalized reordering step is implemented as follows: the matching score score(d|I) of the candidate document and the user interest is obtained by feeding the interest matching vector I through a multi-layer perceptron; the relevance score score(d|q) of the candidate document and the current query is calculated by a multi-layer perceptron from the click count, the original click position, and the click entropy; the interest matching score score(d|I) and the relevance score score(d|q) are added to obtain the final score of the candidate document, and the original document list is reordered by this score to obtain the final personalized ranking result.
7. The personalized search system based on interactive matching of claim 6, wherein: in calculating the relevance scores of the candidate documents and the current query, training is performed with the LambdaRank algorithm: the clicked document is used as the relevant document sample and the other documents as irrelevant samples; a relevant document d_i and an irrelevant document d_j form a document pair to calculate the loss function; the degree to which swapping the order of the document pair affects the MAP evaluation index is also introduced into the loss function as a weight, i.e. the larger the change in MAP after swapping, the larger the difference between the documents and the larger the weight given to the swap; the loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change value of the MAP evaluation index:
L = Δ · ( −P̄_ij · log p_ij − P̄_ji · log p_ji )

p_ij = 1 / (1 + exp(−(s_i − s_j)))
where Δ is the change value of the MAP evaluation index after documents d_i and d_j swap positions, P̄_ij denotes the actual probability that document d_i is more relevant than document d_j, and p_ij denotes the predicted probability.
CN202010861245.9A 2020-08-25 2020-08-25 Personalized search system based on interaction matching Active CN112069399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010861245.9A CN112069399B (en) 2020-08-25 2020-08-25 Personalized search system based on interaction matching


Publications (2)

Publication Number Publication Date
CN112069399A true CN112069399A (en) 2020-12-11
CN112069399B CN112069399B (en) 2023-06-02

Family

ID=73658899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010861245.9A Active CN112069399B (en) 2020-08-25 2020-08-25 Personalized search system based on interaction matching

Country Status (1)

Country Link
CN (1) CN112069399B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291871A (en) * 2017-06-15 2017-10-24 北京百度网讯科技有限公司 Matching degree appraisal procedure, equipment and the medium of many domain informations based on artificial intelligence
CN107957993A (en) * 2017-12-13 2018-04-24 北京邮电大学 The computational methods and device of english sentence similarity
US20180349477A1 (en) * 2017-06-06 2018-12-06 Facebook, Inc. Tensor-Based Deep Relevance Model for Search on Online Social Networks
US20190114511A1 (en) * 2017-10-16 2019-04-18 Illumina, Inc. Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks
CN111125538A (en) * 2019-12-31 2020-05-08 中国人民大学 Searching method for enhancing personalized retrieval effect by using entity information
CN111177357A (en) * 2019-12-31 2020-05-19 中国人民大学 Memory neural network-based conversational information retrieval method
CN111310023A (en) * 2020-01-15 2020-06-19 中国人民大学 Personalized search method and system based on memory network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chenyan Xiong et al.: "End-to-End Neural Ad-hoc Ranking with Kernel Pooling", Research and Development in Information Retrieval *
Zhou Yujia et al.: "Dynamic Personalized Search Algorithm Based on Recurrent Neural Network and Attention Mechanism", Chinese Journal of Computers *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987155A (en) * 2021-11-25 2022-01-28 中国人民大学 Session type retrieval method integrating knowledge graph and large-scale user logs
CN113987155B (en) * 2021-11-25 2024-03-26 中国人民大学 Conversational retrieval method integrating knowledge graph and large-scale user log
CN114357231A (en) * 2022-03-09 2022-04-15 城云科技(中国)有限公司 Text-based image retrieval method and device and readable storage medium
CN114357231B (en) * 2022-03-09 2022-06-28 城云科技(中国)有限公司 Text-based image retrieval method and device and readable storage medium
CN117851444A (en) * 2024-03-07 2024-04-09 北京谷器数据科技有限公司 Advanced searching method based on semantic understanding
CN117851444B (en) * 2024-03-07 2024-06-04 北京谷器数据科技有限公司 Advanced searching method based on semantic understanding


Similar Documents

Publication Publication Date Title
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
CN110188358B (en) Training method and device for natural language processing model
CN110717098B (en) Meta-path-based context-aware user modeling method and sequence recommendation method
CN110929164A (en) Interest point recommendation method based on user dynamic preference and attention mechanism
CN111737578B (en) Recommendation method and system
CN110232122A (en) A kind of Chinese Question Classification method based on text error correction and neural network
CN112328900A (en) Deep learning recommendation method integrating scoring matrix and comment text
CN112800344B (en) Deep neural network-based movie recommendation method
CN112884551B (en) Commodity recommendation method based on neighbor users and comment information
CN112527993B (en) Cross-media hierarchical deep video question-answer reasoning framework
Zhou et al. Interpretable duplicate question detection models based on attention mechanism
CN110222838B (en) Document sorting method and device, electronic equipment and storage medium
CN112069399B (en) Personalized search system based on interaction matching
Sadr et al. Convolutional neural network equipped with attention mechanism and transfer learning for enhancing performance of sentiment analysis
CN111178986B (en) User-commodity preference prediction method and system
Dinov et al. Black box machine-learning methods: Neural networks and support vector machines
CN116976505A (en) Click rate prediction method of decoupling attention network based on information sharing
Jiang et al. An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN113342950B (en) Answer selection method and system based on semantic association
Kuo et al. An application of differential evolution algorithm-based restricted Boltzmann machine to recommendation systems
Alcin et al. OMP-ELM: orthogonal matching pursuit-based extreme learning machine for regression
CN116910375A (en) Cross-domain recommendation method and system based on user preference diversity
Ganguly et al. Evaluating CNN architectures using attention mechanisms: Convolutional Block Attention Module, Squeeze, and Excitation for image classification on CIFAR10 dataset
Krishnan et al. Optimization assisted convolutional neural network for sentiment analysis with weighted holoentropy-based features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant