CN112069399A - Personalized search system based on interactive matching - Google Patents
- Publication number
- CN112069399A (application CN202010861245.9A)
- Authority
- CN
- China
- Prior art keywords
- matching
- vector
- document
- user
- personalized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention realizes a personalized search system based on interactive matching using methods from the field of artificial intelligence. The system comprises an input module, a personalized search module based on interactive matching, and an output module. The operation of the personalized search module comprises four steps: bottom-layer matching modeling of the user search history, calculation of attention weights, generation of user-interest matching vectors, and personalized reordering. The model interacts the user's historical queries with candidate documents at the word level, uses an attention mechanism to reduce the influence of irrelevant information in the search history, and fuses the weighted matching signals with a convolutional neural network to generate the final interest matching vector of each document, thereby obtaining more accurate interest matching scores. This addresses the problem that, under existing vector-representation-based methods, the quality of the ranking result depends on the vector-construction model, and the vector-construction process may omit useful information.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to an interactive matching-based personalized search system.
Background
Personalizing user searches with the user's historical information has proven effective in improving the quality of search rankings. A personalized search algorithm first models the user's interests from information such as historical behavior; when computing a matching score it considers not only the relevance of the query to the document but also the degree of match between the document and the user's interests, thereby customizing the result list to the needs of different users. A user-interest model can be built from many information sources, such as the user's location, retrieval patterns, browsing history, and search history. In recent years, researchers have introduced deep learning into personalized ranking models, strengthening the models' semantic understanding of text and achieving good results in personalized reranking of search results. Ranking algorithms using deep learning can be divided into representation-based matching and interaction-based matching. Representation-based ranking algorithms learn semantic vector representations of the query and the document separately and then perform matching between the two vectors; the core of such algorithms is learning the semantic vector representations. Interaction-based algorithms instead let the query and the document interact at the finer-grained word level, capture more complete matching signals, and fuse those signals into a matching score; their core is how the matching signals are processed and fused.
Almost all existing personalized search algorithms first compute a user-interest representation vector and then interact it with the representation vector of each candidate document to obtain a personalized matching score; this follows the algorithmic idea of representation-based matching.
Most existing personalized ranking algorithms directly compute user-interest representation vectors in various ways from the user's historical behavior and then interact them with the representation vectors of the candidate documents to obtain personalized matching scores. Methods of this type obtain the matching signal between the document and the user's interests with the whole document as the unit: the document to be matched and the user's interests are converted into representation vectors, vector matching is then performed, and the emphasis falls on constructing the representation layer. Under such vector-representation-based methods, the quality of the ranking result depends on the quality of the vector-construction model, and the vector-construction process may omit useful information, such as word-level text information and word-level query-document interaction information, thereby degrading the personalized ranking result.
Disclosure of Invention
Therefore, the invention provides an interactive matching-based personalized search system, which comprises an input module, an interactive matching-based personalized search module and an output module;
the input module is used for reading the user query history and the alternative documents, standardizing the formats of the documents and inputting the documents into the personalized search module based on the interactive matching,
The operation process of the personalized search module based on interactive matching is divided into four steps:
Step one: bottom-layer matching modeling of the user search history. A bottom-layer matching model is built from the user's historical search information, and the user's historical queries are interacted with the candidate documents word by word to obtain detailed bottom-layer matching signals.
Step two: calculation of the attention weights. An attention mechanism is introduced, and each matching signal is weighted according to how much the corresponding query record in the user's search history contributes to the current query.
Step three: generation of the user-interest matching vector. A convolutional neural network performs feature extraction on the weighted matching signals to produce the final matching vector between the document and the user's interests.
Step four: personalized reordering. A personalized score for each candidate document is computed from the user-interest matching vector obtained in step three, a relevance score is computed from the click feature vector, and the sum of the two serves as the final document matching score for personalized reordering.
The output module outputs the document matching scores and the personalized reordering result.
The bottom-layer matching modeling step of the user search history is implemented as follows. Define the user's historical query list as {q_1, q_2, q_3, …, q_n} (where n ≥ 3 and n is an integer), and let d be the current candidate document. For each historical query-candidate document pair <q_i, d>, both texts are first mapped word by word into word vectors represented with a word2vec model: after processing, q_i is expressed as a group of word vectors {qw_1, qw_2, qw_3, …, qw_x}, and d as {dw_1, dw_2, dw_3, …, dw_y}. Every vector in one group interacts pairwise with every vector in the other to obtain the word matching matrix T of <q_i, d>, each element of which is:
T_{i,j} = cos(qw_i, dw_j)
where T_{i,j} denotes the element in row i, column j of T, qw_i denotes the word vector of the i-th word in the historical query, and dw_j denotes the word vector of the j-th word in the candidate document (1 ≤ i ≤ x, 1 ≤ j ≤ y; i, j, x, y are integers); the matching value of the two words is computed by the cosine function. In the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector. The RBF kernel formula is:
K_k(T_i) = Σ_{j=1}^{y} exp(−(T_{i,j} − μ_k)² / (2σ_k²))
where K_k(T_i) is the result of the k-th RBF kernel applied to the i-th row of the matching matrix T, with value in the range 0 to y; μ_k and σ_k are hyperparameters, and μ is taken uniformly from −1 to 1. The logarithms of the feature vectors corresponding to each row of the matching matrix are then summed to give the final bottom-layer matching result of historical query q_i with the candidate document:
v_i = Σ_{m=1}^{x} log K(T_m), where K(T_m) = [K_1(T_m), …, K_K(T_m)]
for the underlying match vector calculated based on the user's historical search information, { v }1,v2,v3,…,vnAnd expressing the element of the fine-grained matching vector v of the candidate document.
The attention-weight calculation step is implemented as follows. Using the fine-grained matching vector v of the current query q and the candidate document d, an attention weight is computed for the bottom-layer matching vector corresponding to each historical query record:
e_i = g(v, v_i)
where g is a multilayer perceptron with tanh as the activation function, and α_i is the weight assigned by the attention layer to the bottom-layer matching vector v_i, obtained by normalizing the scores with a softmax: α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j). The weighted bottom-layer matching vector is:
V_i = α_i · v_i
the weighted fine-grained matching vector corresponding to each historical query of the user is { V }1,V2,V3,…,Vn}。
The user-interest matching vector generation step is implemented as follows. The weighted fine-grained matching vectors {V_1, V_2, V_3, …, V_n} are concatenated column-wise into a matching feature matrix M = [V_1, V_2, V_3, …, V_n] ∈ R^{K×n}, and 100 convolution kernels are applied to M to obtain a three-dimensional tensor A ∈ R^{100×(K−2)×(n−2)}, each element of which is:
A_{t,i,j} = ReLU(f_t ⊙ M_{i−1:i+1, j−1:j+1} + b_t)
where t is an integer from 1 to 100, b_t is the t-th element of the bias vector b ∈ R^{100}, f_t is the t-th 3×3 convolution kernel, M_{i−1:i+1, j−1:j+1} is the submatrix of M from row i−1 to row i+1 and from column j−1 to column j+1, and ⊙ multiplies the elements at corresponding positions of the two matrices and sums all the products. The convolutional layer adopts the ReLU function as its activation. After the convolutional layer, the pooling layer applies max pooling over the second and third dimensions of the tensor A to obtain a 100-dimensional vector I, whose t-th element is:
I_t = max_{i,j} A_{t,i,j}
the output vector I is the final user interest matching vector.
The convolution kernels are of size 3×3, and the search history of each user contains at least 3 query records.
The personalized reordering step is implemented as follows. The matching score score(d|I) of the candidate document with the user's interests is obtained by passing the interest matching vector I through a multilayer perceptron; the relevance score score(d|q) of the candidate document to the current query is computed by a multilayer perceptron from the click count, the original click position, and the click entropy. The interest matching score score(d|I) and the relevance score score(d|q) are added to give the final score of the candidate document, and the original document list is reordered by this score to obtain the final personalized ranking result.
In calculating the relevance scores of the candidate documents to the current query, training is performed with the LambdaRank algorithm: clicked documents are taken as relevant samples and the remaining documents as irrelevant samples, and a relevant document d_i and an irrelevant document d_j are selected to construct document pairs for computing the loss. The loss function also incorporates the degree to which swapping the order of a document pair affects the MAP evaluation metric, using it as the corresponding weight, so that document pairs with a larger difference (a large MAP change after swapping) receive larger weights. The loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change in the MAP evaluation metric:
L = −|ΔMAP| · (P̄_{ij} log p_{ij} + (1 − P̄_{ij}) log(1 − p_{ij}))
where Δ is the change in the MAP evaluation metric after swapping the positions of document d_i and document d_j, P̄_{ij} is the actual probability that document d_i is more relevant than document d_j, and p_{ij} is the predicted probability, computed as:
p_{ij} = 1 / (1 + exp(−(s_i − s_j)))
where s_i and s_j are the final scores of d_i and d_j.
the technical effects to be realized by the invention are as follows:
(1) a model idea based on interactive matching is introduced, a text is not converted into a unique integral expression vector, and the historical query of a user is interacted with a candidate document at a word level to obtain a more accurate and complete matching signal.
(2) An attention mechanism is introduced, and corresponding matching signals are weighted according to the contribution degree of different historical queries to current matching, so that the influence of irrelevant information in search history is reduced.
(3) Feature extraction is carried out on the weighted matching signals by using a convolutional neural network, and a final interest matching vector of the document is generated, so that more accurate interest matching scores are obtained.
Drawings
FIG. 1 is a framework of a personalized search module based on interactive matching;
Detailed Description
The following is a preferred embodiment of the present invention and is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.
In order to achieve the above object, the present invention provides a personalized search system based on interactive matching.
The system comprises an input module, an individualized searching module based on interactive matching and an output module; the input module is used for reading user query history and alternative documents, standardizing formats of the documents and inputting the documents into the personalized search module based on interactive matching, and the output module outputs the document matching scores and personalized rearrangement results.
And the personalized search module based on interactive matching processes the bottom layer matching signals by using a convolutional neural network to obtain a final interest matching result of the candidate document.
The personalized search module based on interactive matching considers the matching signals between the words of the historical queries in the user's historical behavior information and the words of the candidate document. Let {q_1, q_2, q_3, …, q_n} be the user's historical query list and d the current candidate document. First, the user's search log is processed by the interaction-based K-NRM model to obtain, for each historical query q_i, the fine-grained matching vector v_i of q_i and the candidate document d (where 1 ≤ i ≤ n), together with the fine-grained matching vector v of the current query q and d. Then, considering that user interests change dynamically and that a user's query sometimes has a certain contingency, different queries in the user's search history contribute differently to the current query. According to each historical query's contribution to the current query, a multilayer perceptron weights the matching vectors {v_1, v_2, v_3, …, v_n} generated by the K-NRM model, giving the weighted matching vector list {V_1, V_2, V_3, …, V_n}. A convolutional neural network then processes these vectors to derive the matching vector between the candidate document and the user's interests. Finally, the interest matching score and the relevance score of the current candidate document are computed from the interest matching vector and the click feature vector respectively, and added to obtain the final document matching score, given by the following formula:
score(d)=score(d|I)+score(d|q)
where score(d|I) represents the matching score of the current candidate document with the user's search interests and score(d|q) represents the relevance score of the current candidate document to the current query.
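As a rough sketch of how the two scores might be combined: the perceptron weights below are random placeholders (in the system they are learned), and the click-feature values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # one hidden layer with tanh, scalar output
    return float(np.tanh(x @ W1 + b1) @ W2 + b2)

# Hypothetical placeholder weights; in the system they are learned end to end
W1i, b1i, W2i, b2i = rng.normal(size=(100, 8)), np.zeros(8), rng.normal(size=8), 0.0
W1q, b1q, W2q, b2q = rng.normal(size=(3, 8)), np.zeros(8), rng.normal(size=8), 0.0

I = rng.normal(size=100)              # interest matching vector (100-dimensional)
clicks = np.array([3.0, 1.0, 0.7])    # click count, original position, click entropy

# score(d) = score(d|I) + score(d|q)
score = mlp(I, W1i, b1i, W2i, b2i) + mlp(clicks, W1q, b1q, W2q, b2q)
print(np.isfinite(score))  # True
```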
The framework of the personalized search module based on interactive matching is shown in fig. 1, and is divided into the following four parts according to the processing flow:
the method comprises the following steps: the bottom level of the user search history is modeled as a match. And establishing a bottom layer matching model by using the historical search information of the user, and interacting the historical query of the user and the candidate document according to words to obtain a detailed matching signal of the bottom layer.
Step two: and (4) calculating the attention weight value. And (4) introducing an attention mechanism, and weighting the corresponding matching signals according to the contribution degree of different query records in the user search history to the current query.
Step three: and generating a user interest matching vector. And performing feature extraction on the weighted matching signals by using a convolutional neural network to generate a final matching vector of the document and the user interest.
Step four: and (5) personalized reordering. And calculating the personalized score of the candidate document according to the obtained interest matching vector, calculating the relevance score of the candidate document by clicking the feature vector, and performing personalized rearrangement by taking the sum of the two as the final document matching score.
Bottom matching modeling step of user search history:
the search history of the user can provide rich information for obtaining the search interest of the user. Most of the existing algorithms model the interest of the user based on the historical behavior information of the user to obtain an interest vector representing the search preference of the user, and then carry out interactive processing with the document vector. A K-NRM framework is adopted, for each user U, a bottom layer matching model is established by using historical search information of the user U, and each historical query in the user historical search is interactively matched with a candidate document at the bottom layer.
The user's historical query list is {q_1, q_2, q_3, …, q_n}, and d is the current candidate document. For each historical query-candidate document pair <q_i, d>, both texts are first mapped word by word into word vectors represented with a word2vec model. After processing, q_i is expressed as a group of word vectors {qw_1, qw_2, qw_3, …, qw_x}, and d as {dw_1, dw_2, dw_3, …, dw_y}. Every vector in one group interacts pairwise with every vector in the other to obtain the matching matrix T of <q_i, d>. Each element of the matching matrix T is given by the following formula:
T_{i,j} = cos(qw_i, dw_j)
where T_{i,j} denotes the element in row i, column j of T, qw_i denotes the word vector of the i-th word in the historical query, and dw_j denotes the word vector of the j-th word in the candidate document (1 ≤ i ≤ x, 1 ≤ j ≤ y); the matching value of the two words is computed by the cosine function.
As can be seen from the above description, the i-th row of the matching matrix represents the matching signal of the i-th word in the historical query with the candidate document. In the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector. The RBF kernel formula is:
K_k(T_i) = Σ_{j=1}^{y} exp(−(T_{i,j} − μ_k)² / (2σ_k²))
where K_k(T_i) is the result of the k-th RBF kernel applied to the i-th row of the matching matrix T, with value in the range 0 to y; μ_k and σ_k are hyperparameters. In the K-NRM model used here, the cosine similarity of two vectors lies between −1 and 1, so μ is taken uniformly from −1 to 1. The logarithms of the feature vectors corresponding to each row of the matching matrix are then summed to give the final bottom-layer matching result of historical query q_i with the candidate document:
v_i = Σ_{m=1}^{x} log K(T_m), where K(T_m) = [K_1(T_m), …, K_K(T_m)]
for each historical query qiIt has a K-dimensional matching vector with the current candidate document, and the matching vector is the historical query qiAnd fine-grained matching vector v of candidate document di. And calculating a fine-grained matching vector v of the current query q and the candidate document d by the process. To this end, we have obtained the underlying match vector calculated based on the user's historical search information, using { v }1,v2,v3,…,vnRepresents it.
The calculation step of the attention weight value:
because the search interest and the search mode of the user are dynamically changed and the user query has a certain contingency, the influence degree of different query records in the search history of the user on the current query is different. Based on the consideration, the step introduces an attention mechanism, and further optimizes each bottom layer matching vector according to the contribution degree of different historical queries to the current matching.
In the previous step we obtained the bottom-layer matching vectors {v_1, v_2, v_3, …, v_n} calculated from the user's historical search information. In this step, based on the fine-grained matching vector v of the current query q and the candidate document d, an attention weight is computed for the bottom-layer matching vector corresponding to each historical query record. The inputs to the attention layer are the bottom-layer matching vectors {v_1, v_2, v_3, …, v_n} calculated in the previous step and the vector v; the calculation formula is:
e_i = g(v, v_i)
where g(·) is a multilayer perceptron with tanh as the activation function, and α_i is the weight assigned by the attention layer to the bottom-layer matching vector v_i, obtained by normalizing the scores with a softmax: α_i = exp(e_i) / Σ_{j=1}^{n} exp(e_j). The weighted bottom-layer matching vector is given by:
V_i = α_i · v_i
and the attention layer gives more attention to the bottom layer matching vector corresponding to the history query with larger contribution according to the information quantity of the current matching contribution of different history queries in the user search history, and obtains optimized bottom layer matching information weighted according to the contribution degree. At this point, we obtain a weighted fine-grained matching vector { V) corresponding to each historical query of the user1,V2,V3,…,Vn}。
Generating a user interest matching vector:
matching the weighted fine-grained matching vector V1,V2,V3,…,VnSplicing the characters into a matched feature matrix M according to columns, wherein M is [ V ═ V }1,V2,V3,…,Vn]∈RK×n. The traditional approach is to apply maximum pooling or average pooling directly on the matching feature matrix to obtain the user interest matching vector. However, given the potentially large number of historical search records in a user's search history, applying pooling directly on the matching feature matrix may omit some useful information, such as relationship information between the underlying matching vectors corresponding to adjacent historical queries.
To compensate for this deficiency, this step uses 100 convolution kernels f_1, f_2, …, f_100 of size 3×3 to convolve the matching feature matrix M, obtaining a three-dimensional tensor A ∈ R^{100×(K−2)×(n−2)}. Each element of the tensor A is given by the formula:
A_{t,i,j} = ReLU(f_t ⊙ M_{i−1:i+1, j−1:j+1} + b_t)
where 1 ≤ t ≤ 100, b_t is the t-th element of the bias vector b ∈ R^{100}, f_t is the t-th 3×3 convolution kernel, M_{i−1:i+1, j−1:j+1} is the submatrix of M from row i−1 to row i+1 and from column j−1 to column j+1, and ⊙ multiplies the elements at corresponding positions of the two matrices and sums all the products. Because this step uses 3×3 kernels, the search history of each user must contain at least 3 query records. In other words, the model does not support users with fewer than three historical query records, since too few records cannot provide enough information to extract the user's search interests; in that case, personalized rearrangement of the documents would interfere with accurate calculation of the document scores. In addition, the convolutional layer uses the ReLU function as its activation, which is cheaper to compute than activations such as sigmoid and avoids the vanishing-gradient problem.
After the convolutional layer, we apply max pooling at the pooling layer over the second and third dimensions of the three-dimensional tensor A to obtain a 100-dimensional vector I. The t-th element of I is calculated as:
I_t = max_{i,j} A_{t,i,j}
the purpose of the pooling layer is to perform further feature extraction on the matched feature tensor A, and the output vector I is the final user interest matching vector.
Personalized reordering step:
The score of a candidate document consists of two parts: its matching score with the user's interests and its relevance score to the current query. The matching score score(d|I) of the candidate document with the user's interests is obtained by passing the interest matching vector I through a multilayer perceptron; the relevance score score(d|q) of the candidate document to the current query is computed by a multilayer perceptron from three click features, namely the click count, the original click position, and the click entropy. score(d|I) and score(d|q) are added to give the final score of the candidate document, and the original document list is reordered by this score to obtain the final personalized ranking result.
The LambdaRank algorithm is selected for training: clicked documents are used as relevant samples and the other documents as irrelevant samples, and a relevant document d_i and an irrelevant document d_j are selected to construct document pairs for computing the loss. The loss function is the cross entropy between the actual probability and the predicted probability multiplied by the change in the MAP evaluation metric, calculated as:
L = −|ΔMAP| · (P̄_{ij} log p_{ij} + (1 − P̄_{ij}) log(1 − p_{ij}))
where Δ is the change in the MAP evaluation metric, P̄_{ij} is the actual probability that document d_i is more relevant than document d_j and p_{ij} its predicted probability; P̄_{ji} is the actual probability that document d_j is more relevant than document d_i and p_{ji} its predicted probability. The predicted probability p_{ij} is calculated as:
p_{ij} = 1 / (1 + exp(−(s_i − s_j)))
where s_i and s_j are the final scores of d_i and d_j.
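A sketch of the pairwise loss for one document pair, assuming d_i was clicked (actual probability 1) and using the standard sigmoid form of p_ij; the scores and ΔMAP value are invented for illustration:

```python
import numpy as np

def pair_loss(s_i, s_j, delta_map):
    """Cross entropy between actual and predicted pair preference,
    weighted by the MAP change from swapping the pair."""
    p_ij = 1.0 / (1.0 + np.exp(-(s_i - s_j)))  # predicted P(d_i above d_j)
    p_bar = 1.0                                 # d_i clicked, d_j not: actual = 1
    ce = -(p_bar * np.log(p_ij) + (1.0 - p_bar) * np.log(1.0 - p_ij))
    return abs(delta_map) * ce

# invented scores and MAP change for one pair
print(round(pair_loss(2.0, 0.5, 0.1), 4))  # 0.0201
```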
and finally outputting the obtained personalized sorting result to an output module for external output.
Claims (7)
1. An interactive matching-based personalized search system, which is characterized in that: the system comprises an input module, an individualized searching module based on interactive matching and an output module;
the input module is used for reading the user query history and the alternative documents, standardizing the formats of the documents and inputting the documents into the personalized search module based on the interactive matching,
the operation process of the personalized search module based on interactive matching is divided into four steps:
step one: bottom-layer matching modeling of the user's search history, namely building a bottom-layer matching model from the user's historical search information and interacting each historical query with the candidate documents word by word to obtain detailed bottom-layer matching signals;
step two: calculation of attention weights, namely introducing an attention mechanism and weighting the matching signals according to how much each query record in the user's search history contributes to the current query;
step three: generation of the user-interest matching vector, namely extracting features from the weighted matching signals with a convolutional neural network to produce the final matching vector between the document and the user's interest;
step four: personalized reranking, namely computing the personalized score of each candidate document from the user-interest matching vector obtained in step three, computing the relevance score of each candidate document from click feature vectors, and taking the sum of the two as the final document matching score for personalized reranking;
and the output module outputs the document matching scores and the personalized reranking result.
2. The personalized search system based on interactive matching as claimed in claim 1, wherein the bottom-layer matching modeling step of the user's search history is implemented as follows: define the user's historical query list as {q1, q2, q3, …, qn}, where n is an integer with n ≥ 3, and let the current candidate document be d. For each historical query-candidate document pair &lt;qi, d&gt;, both sides are first mapped word by word into word vectors represented with a word2vec model: qi is processed into a group of word vectors {qw1, qw2, qw3, …, qwx} and d into {dw1, dw2, dw3, …, dwy}. The two groups of word vectors interact pairwise to obtain the word matching matrix T of &lt;qi, d&gt;, each element of which is:
Ti,j=cos(qwi,dwj)
wherein Ti,j denotes the element in row i, column j of the matrix T; qwi is the word vector of the i-th word of the historical query, and dwj is the word vector of the j-th word of the candidate document, with 1 ≤ i ≤ x, 1 ≤ j ≤ y and i, j, x, y integers; the matching values are computed with the cosine function. As in the K-NRM model, K RBF kernels are applied to each row of the matching matrix to obtain a K-dimensional feature vector; the RBF kernel formula is:

Kk(Ti) = Σj exp( -(Ti,j - μk)² / (2·σk²) )
wherein Kk(Ti) denotes the value of the i-th row of the matching matrix T processed by the k-th RBF kernel, with value range between 0 and y; μk and σk are both hyperparameters, with the μk taking values uniformly spaced from -1 to 1. The logarithm of the feature vector of each row of the matching matrix is then taken and summed as the final bottom-layer matching result of the historical query qi with the candidate document:

vi = Σr log K(Tr), where K(Tr) = [K1(Tr), K2(Tr), …, KK(Tr)] and r ranges over the rows of T.
for the underlying match vector calculated based on the user's historical search information, { v }1,v2,v3,…,vnAnd expressing the element of the fine-grained matching vector v of the candidate document.
3. The personalized search system based on interactive matching as claimed in claim 2, wherein the attention-weight calculation step is implemented as follows: an attention weight is calculated for the bottom-layer matching vector of each historical query record using the fine-grained matching vector v of the current query q with the candidate document d:
ei=g(v,vi)
wherein g is a multilayer perceptron with tanh as the activation function, and αi is the weight of the bottom-layer matching vector vi computed by the attention layer:

αi = exp(ei) / Σj exp(ej)

The weighted bottom-layer matching vector is:

Vi = αi · vi
the weighted fine-grained matching vector corresponding to each historical query of the user is { V }1,V2,V3,…,Vn}。
4. The personalized search system based on interactive matching as claimed in claim 3, wherein the user-interest matching vector generation step is implemented as follows: the weighted fine-grained matching vectors {V1, V2, V3, …, Vn} are concatenated by columns into a matching feature matrix M = [V1, V2, V3, …, Vn] ∈ R^(K×n), and 100 convolution kernels are applied to M to obtain a three-dimensional tensor A ∈ R^(100×(K-2)×(n-2)); each element of the tensor A is:

At,i,j = Relu( ft ⊙ M(i-1:i+1, j-1:j+1) + bt )
wherein t is an integer from 1 to 100, bt is the t-th element of the bias vector b ∈ R^100, ft is the t-th 3×3 convolution kernel, M(i-1:i+1, j-1:j+1) is the submatrix of M from row i-1 to row i+1 and from column j-1 to column j+1, and ⊙ multiplies the elements at corresponding positions of the two matrices and sums all the products; the convolutional layer adopts the Relu function as its activation. After the convolutional layer, a pooling layer applies max pooling over the second and third dimensions of the tensor A to obtain a 100-dimensional vector I, with It the t-th element of I:

It = max over i, j of At,i,j
the output vector I is the final user interest matching vector.
5. The personalized search system based on interactive matching as claimed in claim 4, wherein the convolution kernels are of size 3×3 and each user's search history contains at least 3 queries.
6. The personalized search system based on interactive matching as claimed in claim 5, wherein the personalized reranking step is implemented as follows: the matching score score(d|I) of a candidate document with the user's interest is obtained by feeding the interest matching vector I through a trained multilayer perceptron; the relevance score score(d|q) of a candidate document with the current query is computed by a multilayer perceptron from the click count, original click position, and click entropy; the interest matching score score(d|I) and the relevance score score(d|q) are added to obtain the final score of the candidate document, and the original document list is reranked by this score to obtain the final personalized ranking result.
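The final combination and re-sort can be sketched as follows (the document names and score values are made up for illustration; score(d|I) and score(d|q) are assumed to have been produced already by the two perceptrons):

```python
# each candidate carries its precomputed interest score and relevance score
candidates = [
    {"doc": "d1", "interest": 0.7, "relevance": 0.2},
    {"doc": "d2", "interest": 0.1, "relevance": 0.9},
    {"doc": "d3", "interest": 0.5, "relevance": 0.6},
]
for c in candidates:
    # final document matching score is the sum score(d|I) + score(d|q)
    c["score"] = c["interest"] + c["relevance"]
# rerank the original list by the combined score, highest first
reranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
order = [c["doc"] for c in reranked]
```

With these toy values d3 (0.5 + 0.6) overtakes d2 (0.1 + 0.9) and d1 (0.7 + 0.2), showing how the sum balances query relevance against personal interest.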
7. The personalized search system based on interactive matching as claimed in claim 6, wherein in computing the relevance scores of the candidate documents with the current query, training is carried out with the LambdaRank algorithm: the clicked document serves as the relevant-document sample, the other documents serve as irrelevant samples, and a relevant document di and an irrelevant document dj form a document pair for computing the loss function. The degree to which swapping the order of a document pair affects the MAP evaluation metric is also introduced into the loss as a weight: the larger the change in MAP after the swap, the larger the difference between the documents, and the larger the weight given to that swap. The loss function is thus the cross entropy between the actual probability and the predicted probability multiplied by the change value of the MAP evaluation metric:

L = -Δ · ( p̄ij · log pij + p̄ji · log pji )
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010861245.9A CN112069399B (en) | 2020-08-25 | 2020-08-25 | Personalized search system based on interaction matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112069399A true CN112069399A (en) | 2020-12-11 |
CN112069399B CN112069399B (en) | 2023-06-02 |
Family
ID=73658899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010861245.9A Active CN112069399B (en) | 2020-08-25 | 2020-08-25 | Personalized search system based on interaction matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112069399B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113987155A (en) * | 2021-11-25 | 2022-01-28 | 中国人民大学 | Session type retrieval method integrating knowledge graph and large-scale user logs |
CN114357231A (en) * | 2022-03-09 | 2022-04-15 | 城云科技(中国)有限公司 | Text-based image retrieval method and device and readable storage medium |
CN117851444A (en) * | 2024-03-07 | 2024-04-09 | 北京谷器数据科技有限公司 | Advanced searching method based on semantic understanding |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107291871A (en) * | 2017-06-15 | 2017-10-24 | 北京百度网讯科技有限公司 | Matching degree appraisal procedure, equipment and the medium of many domain informations based on artificial intelligence |
CN107957993A (en) * | 2017-12-13 | 2018-04-24 | 北京邮电大学 | The computational methods and device of english sentence similarity |
US20180349477A1 (en) * | 2017-06-06 | 2018-12-06 | Facebook, Inc. | Tensor-Based Deep Relevance Model for Search on Online Social Networks |
US20190114511A1 (en) * | 2017-10-16 | 2019-04-18 | Illumina, Inc. | Deep Learning-Based Techniques for Training Deep Convolutional Neural Networks |
CN111125538A (en) * | 2019-12-31 | 2020-05-08 | 中国人民大学 | Searching method for enhancing personalized retrieval effect by using entity information |
CN111177357A (en) * | 2019-12-31 | 2020-05-19 | 中国人民大学 | Memory neural network-based conversational information retrieval method |
CN111310023A (en) * | 2020-01-15 | 2020-06-19 | 中国人民大学 | Personalized search method and system based on memory network |
Non-Patent Citations (2)
Title |
---|
CHENYAN XIONG ET AL.: "End-to-End Neural Ad-hoc Ranking with Kernel Pooling", Research and Development in Information Retrieval |
ZHOU YUJIA ET AL.: "Dynamic Personalized Search Algorithm Based on Recurrent Neural Network and Attention Mechanism", Chinese Journal of Computers |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299396B (en) | Convolutional neural network collaborative filtering recommendation method and system fusing attention model | |
CN110188358B (en) | Training method and device for natural language processing model | |
CN110717098B (en) | Meta-path-based context-aware user modeling method and sequence recommendation method | |
CN110929164A (en) | Interest point recommendation method based on user dynamic preference and attention mechanism | |
CN110516160A (en) | User modeling method, the sequence of recommendation method of knowledge based map | |
Li et al. | Heuristic rank selection with progressively searching tensor ring network | |
CN112069399B (en) | Personalized search system based on interaction matching | |
CN112884551B (en) | Commodity recommendation method based on neighbor users and comment information | |
CN111737578B (en) | Recommendation method and system | |
CN112328900A (en) | Deep learning recommendation method integrating scoring matrix and comment text | |
CN110232122A (en) | A kind of Chinese Question Classification method based on text error correction and neural network | |
Gad et al. | A robust deep learning model for missing value imputation in big NCDC dataset | |
CN112527993B (en) | Cross-media hierarchical deep video question-answer reasoning framework | |
CN115422369B (en) | Knowledge graph completion method and device based on improved TextRank | |
CN112115371A (en) | Neural attention mechanism mobile phone application recommendation model based on factorization machine | |
Jiang et al. | An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing | |
CN117648469A (en) | Cross double-tower structure answer selection method based on contrast learning | |
CN117494815A (en) | File-oriented credible large language model training and reasoning method and device | |
CN116976505A (en) | Click rate prediction method of decoupling attention network based on information sharing | |
CN116910375A (en) | Cross-domain recommendation method and system based on user preference diversity | |
CN116228368A (en) | Advertisement click rate prediction method based on deep multi-behavior network | |
Sarang | Thinking Data Science: A Data Science Practitioner’s Guide | |
CN117194771B (en) | Dynamic knowledge graph service recommendation method for graph model characterization learning | |
Alcin et al. | OMP-ELM: orthogonal matching pursuit-based extreme learning machine for regression | |
Lima et al. | A grammar-based GP approach applied to the design of deep neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||