WO2018157625A1 - Reinforcement learning-based ranking learning method and server - Google Patents
Reinforcement learning-based ranking learning method and server
- Publication number
- WO2018157625A1 (PCT/CN2017/111319, CN2017111319W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sorting
- server
- documents
- query word
- target document
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/338—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- The present invention relates to the field of learning to rank, and in particular to a reinforcement learning-based ranking learning method and server.
- With the rapid development of the Internet, information has grown explosively, and quickly finding the data a user needs in massive amounts of information has become a focus of information retrieval research. At present, the needed data is mainly found by retrieving with a search engine and ranking the search results; as more and more factors affect that ranking, it is no longer practical to fit a ranking model manually, and machine learning becomes very suitable. The ranking learning (learning-to-rank) algorithm is a very important machine learning algorithm.
- The ranking learning algorithm is a class of ranking algorithms based on supervised learning, and has been widely applied to search, question answering and recommendation.
- The existing ranking algorithms mainly include the pointwise algorithm, the pairwise algorithm and the listwise algorithm.
- The pointwise algorithm converts the ranking problem into a regression problem: for each "query word-document" pair, the ranking model is learned so that its score fits the relevance label. The pairwise algorithm converts the ranking problem into a classification problem: for each "query word", the ranking model is learned so that it can distinguish the relative relevance of different "candidate documents" (determined by the labels). The listwise algorithm aims, for each "query word", to learn a ranking model that makes the overall ranking for that query optimal.
- The existing models based on ranking learning algorithms need to rely on relevance annotation data between query words and documents for training, but cannot use data obtained from users' evaluations of the ranking of the document sorted list corresponding to a query word, and therefore cannot improve user satisfaction with the ranking results.
- The embodiments of the present invention provide a reinforcement learning-based ranking learning method and server, which help improve the user's satisfaction with the ranking results of the document sorted list corresponding to a query word.
- an embodiment of the present invention provides a method for ranking learning based on reinforcement learning, including:
- a receiving module configured to receive a query word input by a user
- a first obtaining module configured to acquire N documents that match the query word; wherein the N is a natural number;
- a first sorting module, configured to sort the N documents by using a ranking model to obtain a document sorted list; wherein the ranking model is trained according to a reinforcement learning algorithm, a historical query word, the historical documents corresponding to the historical query word, the document sorted list corresponding to the historical query word, and the ranking effect evaluation value;
- a display module configured to present the target document sorted list to the user.
- Compared with the prior art, the ranking model is continuously trained by the reinforcement learning algorithm, which improves the ranking quality of the document sorted lists obtained with the ranking model and thereby improves the user's satisfaction with the ranking results.
- In a feasible embodiment, before the server sorts the N documents by using the ranking model to obtain the document sorted list, the method further includes:
- the server sorts the M documents to obtain a target document sorted list
- the server uses the historical query word, the M documents, the target document sorting list, and the sorting effect evaluation value as a training sample, and puts into a training sample set;
- the server trains the training sample set to obtain the ranking model by using a reinforcement learning algorithm.
- Compared with the prior art, the historical query word, the M documents, the target document sorted list, and the ranking effect evaluation value are used as one training sample of a training sample set, and the parameter θ is continuously optimized with this training sample set and the reinforcement learning algorithm, so that the value of the expectation function keeps increasing and the ranking metric is optimized precisely, which helps improve the user's satisfaction with the ranking results of the document sorted list corresponding to the query word.
- the server sorts the M documents according to a sorting model to obtain a sorted list of target documents, including:
- the server scores the relevance of the query words in each of the M documents according to the sorting model to obtain a scoring result
- the server sorts the M documents in ascending or descending order of the scoring results to obtain the target document sorted list.
- the server obtains a ranking effect evaluation value of the target document sorted list, including:
- the server evaluates the sorting effect of the sorted list of the target document according to the user behavior, and obtains an evaluation value of the sorting effect.
- the server obtains a ranking effect evaluation value of the target document sorted list, and further includes:
- the server uses the value given by the user to evaluate the sorting effect of the sorted list of the target document as the sorting effect evaluation value.
- the server obtains a ranking effect evaluation value of the target document sorted list, and further includes:
- the server evaluates the sorting effect of the target document sorting list according to the result of the user scoring the relevance of each document in the target document sorting list to the query word to obtain the sorting effect evaluation value.
- an embodiment of the present invention provides a server, including:
- a receiving module configured to receive a query word input by a user
- a first obtaining module configured to acquire N documents that match the query word; wherein the N is a natural number;
- a first sorting module, configured to sort the N documents by using a ranking model to obtain a document sorted list; wherein the ranking model is trained according to a reinforcement learning algorithm, a historical query word, the historical documents corresponding to the historical query word, the document sorted list corresponding to the historical query word, and the ranking effect evaluation value;
- a display module configured to present the sorted list of the documents to the user.
- In a feasible embodiment, before the first sorting module sorts the N documents by using the ranking model to obtain the document sorted list, the server further includes:
- a second obtaining module configured to acquire a historical query word, and obtain M documents corresponding to the historical query word
- a second sorting module configured to sort the M documents to obtain a sorted list of target documents
- a third obtaining module configured to obtain a sorting effect evaluation value of the target document sorting list
- a collecting module configured to use the historical query word, the M documents, the target document sorting list, and the sorting effect evaluation value as a training sample, and put into a training sample set;
- a training module configured to: when the number of training samples in the training sample set is greater than a preset number, train the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- the second ranking module includes:
- a scoring unit configured to score the relevance of the query words in each of the M documents according to the sorting model to obtain a scoring result
- a sorting unit configured to sort the M documents according to the ascending or descending order of the scoring results to obtain a document sorting list.
- the third obtaining module is specifically configured to:
- the sorting effect of the sorted list of the target document is evaluated according to the user behavior, and the evaluation value of the sorting effect is obtained.
- the third obtaining module is specifically configured to:
- the third obtaining module is configured to obtain a value given by the user for evaluating the sorting effect of the sorted list of the target document as the sorting effect evaluation value.
- the third obtaining module is specifically configured to: obtain the result of the user scoring the relevance of each document in the target document sorted list to the query word, and evaluate the ranking of the target document sorted list according to that result to obtain the ranking effect evaluation value.
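- Taken together, the modules above describe a collect-then-train loop: each ranking interaction is stored as a sample (q, d1, ..., dM, σ, r), and training is triggered once enough samples have accumulated. The following Python sketch only illustrates that data structure and trigger condition; the names (TrainingSample, SampleBuffer) and the buffer size are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of the sample collection described by the modules above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingSample:
    query: str            # historical query word q
    documents: List[str]  # the M candidate documents d1..dM
    ranking: List[int]    # target document sorted list sigma (document indices, best first)
    reward: float         # ranking effect evaluation value r

@dataclass
class SampleBuffer:
    preset_count: int = 100                      # the "preset number" that triggers training
    samples: List[TrainingSample] = field(default_factory=list)

    def add(self, sample: TrainingSample) -> bool:
        """Collect one sample; return True once training should be triggered."""
        self.samples.append(sample)
        return len(self.samples) >= self.preset_count
```

When add() returns True, the training module would run the reinforcement learning update sketched later, and the buffer would be emptied.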
- FIG. 1 is a schematic flowchart of a method for searching a document according to an embodiment of the present invention
- FIG. 2 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention;
- FIG. 3 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention;
- FIG. 4 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention;
- FIG. 5 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention;
- FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present invention;
- FIG. 7 is a schematic diagram of part of the structure of a server according to an embodiment of the present invention;
- FIG. 8 is a schematic structural diagram of another server according to an embodiment of the present invention.
- references to "an embodiment” herein mean that a particular feature, structure, or characteristic described in connection with the embodiments can be included in at least one embodiment of the invention.
- the appearances of the phrases in various places in the specification are not necessarily referring to the same embodiments, and are not exclusive or alternative embodiments that are mutually exclusive. Those skilled in the art will understand and implicitly understand that the embodiments described herein can be combined with other embodiments.
- FIG. 1 is a schematic flowchart of a method for searching a document according to an embodiment of the present invention.
- As shown in FIG. 1, a reinforcement learning-based ranking learning method provided by an embodiment of the present invention includes the following steps:
- S101: the server receives a query word input by a user.
- S102: the server acquires N documents that match the query word; wherein N is a natural number.
- the server receives the query words input by the user, and acquires N documents related to the query word in its background database.
- the above query word is included in the title or content of any one of the above N documents.
- S103: the server sorts the N documents by using a ranking model to obtain a document sorted list.
- the sorting model is based on a reinforcement learning algorithm, a historical query word, and a history document corresponding to the historical query word.
- the document sorting list and the sorting effect evaluation value corresponding to the historical query word are trained.
- the foregoing server uses the sorting model to sort the N documents to obtain a sorted list of documents, and specifically includes:
- the server scores each of the N documents according to the sorting model, and obtains a score result
- the server sorts the N documents according to the score result to obtain the sorted list of the documents.
- the ranking model is a differentiable function with parameters, for example the feedforward neural network (multi-layer perceptron) function MLP(x).
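- As an illustration of such a differentiable scoring function, the sketch below implements f(q, d; θ) as a small feedforward network (MLP) over a query-document feature vector; the feature dimension, hidden size and use of PyTorch are assumptions made only for this example.

```python
# Minimal sketch of a parameterized, differentiable scoring function f(q, d; theta):
# a small MLP applied to a hand-crafted feature vector x of the (query, document) pair.
import torch
import torch.nn as nn

class MLPScorer(nn.Module):
    def __init__(self, feature_dim: int = 8, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_documents, feature_dim) features of each (q, d) pair
        return self.net(x).squeeze(-1)  # one relevance score per document
```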
- the foregoing server trains the foregoing sorting model process according to the reinforcement learning algorithm, the historical query word, the historical document corresponding to the historical query word, the document sorting list corresponding to the historical query word, and the sorting effect evaluation value. See the related description of Figures 2-5.
- S104: the server presents the document sorted list to the user.
- the list of the documents is displayed for the user to consult.
- FIG. 2 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention.
- As shown in FIG. 2, a ranking learning method based on reinforcement learning provided by an embodiment of the present invention includes the following steps:
- S201: the server acquires a historical query word, and acquires M documents corresponding to the historical query word.
- the historical query word may be input by a user, or may be automatically obtained by the server.
- the server receives the historical query word q
- the M documents related to the historical query word q are obtained in the background database, and the title or content of any one of the M documents includes the historical query word q.
- the above M documents can be represented by a set (d 1 , d 2 , ..., d M ).
- S202: the server sorts the M documents to obtain a target document sorted list.
- the server according to the foregoing sorting model scores the correlation between each of the M documents and the historical query word q to obtain a scoring result.
- the correlation of each of the above M documents with the above-mentioned historical query word q includes the number of times the above-mentioned historical query word q appears in the title or content of each of the above M documents.
- the server sorts the M documents according to the ascending or descending order of the scoring results to obtain a document sorting list.
- the above sorting model is a differentiable function with parameters, which can be represented by f(q, d; ⁇ ), q is the above query word, d is a document obtained according to the query word q, and ⁇ is a parameter.
- The scoring results obtained by the server scoring each of the M documents according to the ranking model f(q, d; θ) can be denoted by the set (d1', d2', ..., dM'), and the target document sorted list obtained by sorting the M documents in ascending or descending order of the scoring results can be denoted by (y1, y2, ..., yM).
- Further, this process can be expressed as σ = (y1, y2, ..., yM) = sort(d1, d2, ..., dM), where the sort function is a descending or ascending sorting model.
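- A minimal sketch of this scoring-and-sorting step is given below; using the term frequency of the query word in each document as the relevance signal, and the helper names, are assumptions for illustration only.

```python
# Score each of the M documents against the query and sort them (descending) into sigma.
from typing import List, Tuple

def count_term_frequency(query: str, document: str) -> float:
    """Number of times the query word appears in the document's title/content."""
    return float(document.lower().count(query.lower()))

def rank_documents(query: str, documents: List[str]) -> Tuple[List[int], List[float]]:
    scores = [count_term_frequency(query, d) for d in documents]
    # sigma lists document indices from highest to lowest score (descending sort)
    sigma = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return sigma, scores
```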
- S203: the server acquires a ranking effect evaluation value of the target document sorted list.
- the foregoing server obtains the ranking effect evaluation value of the foregoing target document sorting list, including:
- the server evaluates the sorting effect of the sorted list of the target documents according to the user behavior, and obtains the sorting effect evaluation value.
- Specifically, the server acquires the position k of the user's last click in the target document sorted list (y1, y2, ..., yM), where k ∈ {1, 2, ..., M}; the server then obtains the ranking effect evaluation value r of the target document sorted list according to an evaluation function, for example r = 1/k. The larger r is, the better the ranking of the target document sorted list.
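- The following sketch illustrates this click-based evaluation with r = 1/k; treating a missing click as a large position (the worked example for FIG. 4 below uses k = 100) is the only additional assumption.

```python
from typing import Optional

def click_reward(last_click_position: Optional[int], no_click_position: int = 100) -> float:
    """Ranking effect evaluation value r = 1/k from the position k of the user's last click."""
    k = last_click_position if last_click_position is not None else no_click_position
    return 1.0 / k
```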
- the foregoing server obtains the ranking effect evaluation value of the foregoing target document sorting list, including:
- the server obtains a numerical value given by the user for evaluating the sorting effect of the sorted list of the target documents, as the sorting effect evaluation value.
- the user evaluates the satisfaction degree of the sorting result of the document sorting list (y 1 , y 2 , . . . , y M ) corresponding to the query word q, and gives a numerical value.
- the above server uses the value as the sorting effect evaluation value r.
- the foregoing server obtains the ranking effect evaluation value of the foregoing target document sorting list, including:
- the server evaluates the sorting effect of the target document sorting list according to the result of the user scoring the correlation between each document in the target document sorting list and the historical query word q to obtain the sorting effect evaluation value.
- the user scores the relevance of each document in the target document sorted list (y1, y2, ..., yM) to the historical query word q, and gives a scoring result.
- the scoring result can be denoted by a set (g1, g2, ..., gM).
- the scoring result gi is the i-th value of the set (g1, g2, ..., gM), with gi ∈ {0, 1, 2, 3}.
- the server then calculates the ranking effect evaluation value r according to a preset formula, for example DCG@K, where gσ(i) can be understood as the i-th value of the set (g1, g2, ..., gM), that is, gi.
- DCG (Discounted Cumulative Gain) is a metric for measuring the ranking produced by the server's algorithm.
- DCG@K computes the DCG over the first K results in the search result list; the value of K depends on the number of search results the user pays attention to.
- For example, if the server is a web server, users usually care about the ranking quality of the first ten documents (the first page), so K = 10; if the server is a question-answering system server, users usually care about the first document, so K = 1.
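- A sketch of DCG@K is shown below using its common definition; the patent names DCG@K but does not reproduce the formula in this text, so the exact gain and discount terms are assumptions.

```python
import math
from typing import List

def dcg_at_k(graded_relevance_in_ranked_order: List[int], k: int) -> float:
    """DCG@K = sum over the first K ranked results of (2^g - 1) / log2(rank + 1)."""
    return sum(
        (2 ** g - 1) / math.log2(i + 2)  # i is 0-based, so log2(i + 2) == log2(rank + 1)
        for i, g in enumerate(graded_relevance_in_ranked_order[:k])
    )

# For example, K = 10 for a web server (first results page), K = 1 for a question-answering server.
```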
- S204: the server takes the historical query word, the M documents, the target document sorted list, and the ranking effect evaluation value as one training sample, and puts it into a training sample set.
- The training sample can be represented by a set (q, d1, ..., dM, σ, r), where q is the query word, d1, d2, ..., dM are the M documents, σ is the target document sorted list (y1, y2, ..., yM), and r is the ranking effect evaluation value.
- The training sample set, which can be represented by a corresponding set, contains m such training samples.
- Further, when the number of training samples in the training sample set reaches the preset number m, the server performs step S205.
- the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- Training the training sample set with the reinforcement learning algorithm to obtain the ranking model can be viewed as the server solving, from the training sample set and the reinforcement learning algorithm, for the parameter θ' that maximizes an expectation function, and then substituting θ' for the parameter θ in the ranking model to obtain a new ranking model.
- Concretely, each query word-document pair (q, di) is first replaced by a feature vector xi, so that the set (d1, d2, ..., dM) becomes the feature set s = (x1, ..., xM) and the ranking model becomes f(q, di; θ) = f(xi; θ).
- θ is then solved for by maximizing the expectation function J(θ) = Σ_i p(σ(i) | s(i); θ)·r(i), where σ(i) is the target document sorted list corresponding to the historical query word q in the i-th sample of the training sample set, s(i) is the M documents (feature vectors) corresponding to the historical query word q in the i-th sample, and r(i) is the ranking effect evaluation value of the i-th sample.
- p(σ(i) | s(i); θ) is the probability of obtaining the document sorted list σ(i) given the M documents s(i), and θ is the parameter that maximizes that function.
- Among the evaluation values, r(i) < r(j) means that the ranking of the j-th target document sorted list in the training sample set is better than that of the i-th target document sorted list; r(i) = r(j) means the two rank equally well; and r(i) > r(j) means that the i-th target document sorted list ranks better than the j-th.
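- The patent does not spell out the form of p(σ | s; θ) or the exact optimization procedure. The sketch below therefore only illustrates one common choice: a Plackett-Luce listwise distribution over the model scores, optimized with a REINFORCE-style surrogate Σ_i r(i)·log p(σ(i) | s(i); θ); all names and the use of PyTorch are assumptions.

```python
import torch
from typing import List, Tuple

def plackett_luce_log_prob(scores: torch.Tensor, ranking: List[int]) -> torch.Tensor:
    """log p(sigma | s; theta) for one sample; scores: (M,), ranking: document indices, best first."""
    ordered = scores[torch.tensor(ranking)]
    log_prob = torch.zeros(())
    for t in range(len(ranking)):
        log_prob = log_prob + ordered[t] - torch.logsumexp(ordered[t:], dim=0)
    return log_prob

def training_step(model: torch.nn.Module,
                  optimizer: torch.optim.Optimizer,
                  batch: List[Tuple[torch.Tensor, List[int], float]]) -> float:
    """batch: (features, ranking, reward) triples; features has shape (M, feature_dim)."""
    optimizer.zero_grad()
    loss = torch.zeros(())
    for features, ranking, reward in batch:
        scores = model(features)  # f(x_i; theta) for each of the M documents
        loss = loss - reward * plackett_luce_log_prob(scores, ranking)
    loss.backward()               # gradient step that increases the reward-weighted log-likelihood
    optimizer.step()
    return float(loss)
```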
- the above steps S201-S205 are the processes of training the above-mentioned sorting model described in the above step S103.
- It can be seen that, in the solution of the embodiment of the present invention, the server acquires a historical query word and acquires M documents corresponding to the historical query word; the server sorts the M documents to obtain a target document sorted list; the server acquires a ranking effect evaluation value of the target document sorted list; the server takes the historical query word, the M documents, the target document sorted list and the ranking effect evaluation value as one training sample and puts it into a training sample set; and when the number of training samples in the training sample set is greater than a preset number, the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- The server obtains, according to the training sample set and the reinforcement learning algorithm, the parameter θ' that maximizes the expectation function, and substitutes θ' for the parameter θ in the ranking model; the maximum of the expectation function corresponds to the best-optimized ranking metric.
- Compared with the prior art, the historical query word, the M documents, the target document sorted list and the ranking effect evaluation value are used as one training sample of a training sample set, and the parameter θ is continuously optimized with this training sample set and the reinforcement learning algorithm, so that the value of the expectation function keeps increasing and the ranking metric is optimized precisely, which helps improve the user's satisfaction with the ranking results of the target document sorted list corresponding to the query word.
- FIG. 3 is a schematic flowchart of a ranking learning method based on reinforcement learning according to an embodiment of the present invention; the method includes the following steps:
- S301: the server acquires the historical query word, and acquires M documents related to the historical query word.
- S302: the server sorts the M documents to obtain a target document sorted list.
- S303: the server acquires a ranking effect evaluation value of the target document sorted list.
- S304: the server takes the historical query word, the M documents, the target document sorted list, and the ranking effect evaluation value as one training sample, puts it into the training sample set, and increments its counter by 1.
- The training sample can be represented by a set (q, d1, ..., dM, σ, r), where q is the query word, d1, d2, ..., dM are the M documents, σ is the target document sorted list (y1, y2, ..., yM), and r is the ranking effect evaluation value.
- The training sample set, which can be represented by a corresponding set, contains m such training samples.
- S305: if the counter of the server has counted to the preset number m, the server executes step S306; otherwise, the server returns to step S301.
- the preset threshold m is an integer greater than 1.
- S306: the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- the above server resets its counter.
- The server training the training sample set with the reinforcement learning algorithm to obtain the ranking model can be viewed, as in the description of FIG. 2, as the server solving for the parameter θ' that maximizes the expectation function J(θ) = Σ_i p(σ(i) | s(i); θ)·r(i) and substituting θ' for the parameter θ in the ranking model to obtain a new ranking model; σ(i), s(i), r(i) and p(σ(i) | s(i); θ) have the same meanings as in the description of FIG. 2, and r(i) < r(j) again means that the j-th target document sorted list in the training sample set ranks better than the i-th.
- Further, after the training in step S306, the server returns to step S301, and its counter starts counting again.
- FIG. 4 is a schematic flowchart diagram of another sorting learning method based on reinforcement learning according to an embodiment of the present invention. As shown in FIG. 4, an embodiment of the present invention provides another method for learning based on reinforcement learning, including:
- S401: the server acquires a historical query word q, and retrieves the background database with the historical query word q to obtain M documents.
- the query word may be input by a user, or may be automatically obtained by the server.
- the title or content of any one of the M documents includes the historical query word q.
- the above M documents can be represented by a set (d 1 , d 2 , ..., d M ).
- S402: the server sorts the M documents according to the ranking model to obtain a document sorted list.
- the server according to the foregoing sorting model scores the correlation between each of the M documents and the historical query word q to obtain a scoring result.
- the correlation of each of the above M documents with the above-mentioned historical query word q includes the number of times the above-mentioned historical query word q appears in the title or content of each of the above M documents.
- the server sorts the M documents according to the ascending or descending order of the scoring results to obtain a target document sorting list.
- the above sorting model is a differentiable function with parameters, which can be represented by f(q, d; ⁇ ), q is the above historical query word, d is a document obtained according to the historical query word q, and ⁇ is a parameter.
- The scoring results obtained by the server scoring each of the M documents according to the ranking model f(q, d; θ) can be denoted by the set (d1', d2', ..., dM'), and the target document sorted list obtained by sorting the M documents in ascending or descending order of the scoring results can be denoted by (y1, y2, ..., yM).
- Further, this process can be expressed as σ = (y1, y2, ..., yM) = sort(d1, d2, ..., dM), where the sort function is a descending or ascending sorting model.
- S403: the server evaluates the target document sorted list according to user behavior, and obtains a ranking effect evaluation value r.
- The evaluation system of the server evaluates the target document sorted list according to user behavior, specifically according to the user's click behavior on the document sorted list, and gives the ranking effect evaluation value r.
- the above-mentioned target document sorting list is displayed on the display interface of the above server in the form of a list.
- Specifically, after receiving the historical query word q input by the user, the server acquires M documents containing the historical query word q from the background database, denoted by (d1, d2, ..., dM). The server scores the i-th document di of the M documents according to the ranking model f(q, d; θ) to obtain the score yi = f(q, di; θ), and then sorts the M documents in descending or ascending order of the scores yi to obtain the target document sorted list.
- After the search engine obtains the target document sorted list, it displays the list in the form of a page. The server obtains the position k of the user's last click in that page, k ∈ {1, 2, ..., M}, and computes the ranking effect evaluation value through the evaluation function r = 1/k: the closer the last click is to the top, the smaller k is and the larger r is, indicating higher user satisfaction with the query result for the historical query word q.
- For example, suppose the server retrieves 10 documents containing q and displays the sorted list as a page. If the user's last click is on the 5th document, then k = 5 and r = 0.2; if it is on the 2nd document, then k = 2 and r = 0.5; if the user does not click at all, then k is set to 100 and r = 0.01. A higher evaluation value therefore indicates higher user satisfaction.
- S404: the server collects the historical query word q, the M documents, the target document sorted list, and the ranking effect evaluation value as one training sample, and puts it into a training sample set.
- The training sample can be represented by (q, d1, ..., dM, σ, r), where q is the historical query word, d1, d2, ..., dM are the M documents, σ is the target document sorted list (y1, y2, ..., yM), and r is the ranking effect evaluation value.
- Further, when the number of training samples in the training sample set reaches the preset number m, the server performs step S405. The training sample set containing the m training samples can be represented by a corresponding set; m is an integer greater than or equal to 1, and can be, for example, 1, 2, 3, 5, 8, or another value.
- S405: when the number of training samples in the training sample set is greater than the preset number m, the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- The server training the training sample set with the reinforcement learning algorithm to obtain the ranking model can be viewed, as in the description of FIG. 2, as the server solving for the parameter θ' that maximizes the expectation function J(θ) = Σ_i p(σ(i) | s(i); θ)·r(i) and substituting θ' for the parameter θ in the ranking model to obtain a new ranking model, where σ(i) is the target document sorted list corresponding to the historical query word q in the i-th sample of the training sample set, s(i) is the M documents corresponding to the historical query word q in that sample, r(i) is the ranking effect evaluation value in that sample, p(σ(i) | s(i); θ) is the probability of obtaining the document sorted list σ(i) given the M documents s(i), and r(i) < r(j) means that the j-th target document sorted list in the training sample set ranks better than the i-th.
- FIG. 5 is a schematic flowchart diagram of another sorting learning method based on reinforcement learning according to an embodiment of the present invention. As shown in FIG. 5, an embodiment of the present invention provides another method for learning based on reinforcement learning, including:
- S501: the server acquires a historical query word q, and retrieves the background database according to the historical query word q to obtain M documents.
- the query word may be input by a user, or may be automatically obtained by the server.
- the M documents may be represented by a set (d 1 , d 2 , . . . , d M ), and the title or content of any one of the M documents includes the historical query word q.
- S502: the server sorts the M documents according to the ranking model to obtain a document sorted list.
- the server according to the foregoing sorting model scores the correlation between each of the M documents and the historical query word q to obtain a scoring result.
- the correlation of each of the above M documents with the above-mentioned historical query word q includes the number of times the above-mentioned historical query word q appears in the title or content of each of the above M documents.
- the server sorts the M documents according to the ascending or descending order of the scoring results to obtain a document sorting list.
- the above sorting model is a differentiable function with parameters, which can be represented by f(q, d; ⁇ ), q is the above historical query word, d is a document obtained according to the historical query word q, and ⁇ is a parameter.
- The scoring results obtained by the server scoring each of the M documents according to the ranking model f(q, d; θ) can be denoted by the set (d1', d2', ..., dM'), and the target document sorted list obtained by sorting the M documents in ascending or descending order of the scoring results can be denoted by (y1, y2, ..., yM).
- Further, this process can be expressed as σ = (y1, y2, ..., yM) = sort(d1, d2, ..., dM), where the sort function is a descending or ascending sorting model.
- S503: the server acquires the user's overall scoring result for the target document sorted list, or the scoring result for each document in the target document sorted list, and obtains a ranking evaluation value according to the scoring result.
- the server obtains a score result of each document in the sorted list of the target documents by the user, and obtains a document sorting evaluation value r according to the score result.
- the user scores the correlation between each of the target document sorted lists (y 1 , y 2 , . . . , y M ) and the historical query word q, and gives a score result.
- The scoring result can be denoted by a set (g1, g2, ..., gM); the scoring result gi is the i-th value of the set (g1, g2, ..., gM), with gi ∈ {0, 1, 2, 3}.
- g i may take other ranges of values, such as (0, 1, 2, 3, 4, 5) or (0, 1, 2, 3, 4, 5, 6, 7).
- the server calculates the ranking effect evaluation value r according to a preset formula.
- The preset formula can be DCG@K, in which gσ(i) can be understood as the i-th value of the set (g1, g2, ..., gM), that is, gi.
- DCG (Discounted Cumulative Gain) is a metric for measuring the ranking produced by the server's algorithm.
- DCG@K computes the DCG over the first K results in the search result list; the value of K depends on the number of search results the user pays attention to.
- For example, if the server is a web server, users usually care about the ranking quality of the first ten documents (the first page), so K = 10; if the server is a question-answering system server, users usually care about the first document, so K = 1.
- the server obtains an overall scoring result of the user's sorted list of the target documents, and obtains a document sorting evaluation value according to the scoring result.
- For example, if the overall ranking quality of the sorting result is divided into five levels, then r ∈ {-2, -1, 0, 1, 2}, where 2 means the overall ranking quality is perfect, 1 means it is good, 0 means it is average, -1 means it is poor, and -2 means it is very poor.
- the ranking quality of the above sorting results is manually scored according to the above five levels.
- S504: the server takes the historical query word, the M documents, the target document sorted list, and the ranking effect evaluation value as one training sample, and puts it into a training sample set.
- The training sample can be represented by (q, d1, ..., dM, σ, r), where q is the historical query word, d1, d2, ..., dM are the M documents, σ is the target document sorted list (y1, y2, ..., yM), and r is the ranking effect evaluation value.
- Further, when the number of training samples in the training sample set reaches the preset number m, the server performs step S505. The training sample set containing the m training samples can be represented by a corresponding set; m is an integer greater than or equal to 1, and can be, for example, 1, 2, 3, 5, 8, or another value.
- S505: when the number of training samples in the training sample set is greater than the preset number m, the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model.
- The server training the training sample set with the reinforcement learning algorithm to obtain the ranking model can be viewed, as in the description of FIG. 2, as the server solving for the parameter θ' that maximizes the expectation function J(θ) = Σ_i p(σ(i) | s(i); θ)·r(i) and substituting θ' for the parameter θ in the ranking model to obtain a new ranking model, where σ(i) is the document sorted list corresponding to the historical query word q in the i-th sample of the training sample set, s(i) is the M documents corresponding to the historical query word q in that sample, r(i) is the ranking effect evaluation value in that sample, and p(σ(i) | s(i); θ) is the probability of obtaining the document sorted list σ(i) given the M documents s(i).
- the embodiment of the present invention further provides a server 600, as shown in FIG. 6, comprising:
- the receiving module 601 is configured to receive a query word input by the user
- the first obtaining module 602 is configured to acquire N documents that match the query word, where the N is a natural number;
- a first sorting module 603, configured to sort the N documents by using a ranking model to obtain a document sorted list; wherein the ranking model is trained according to a reinforcement learning algorithm, a historical query word, the historical documents corresponding to the historical query word, the document sorted list corresponding to the historical query word, and the ranking effect evaluation value;
- the server 600 further includes:
- the second obtaining module 605 is configured to obtain the historical query words, and obtain M documents related to the historical query words.
- the second sorting module 606 is configured to sort the M documents according to the sorting model to obtain a target document sorting list.
- the second sorting module 606 includes:
- a scoring unit 6061 configured to score the relevance of the query words in each of the M documents according to the sorting model to obtain a scoring result
- the second sorting unit 6062 is configured to sort the M documents according to the ascending or descending order of the scoring results to obtain a target document sorting list.
- the third obtaining module 607 is configured to obtain a sorting effect evaluation value of the target document sorting list.
- the third obtaining module 607 is specifically configured to evaluate the sorting effect of the sorted list of the target document according to the user behavior, and obtain an evaluation value of the sorting effect.
- the third obtaining module 607 is specifically configured to obtain a value that is obtained by evaluating a sorting effect of the sorting list of the target document by the user, and is used as the sorting effect evaluation value.
- the foregoing third obtaining module 607 is specifically configured to: according to a result of the user scoring the relevance of each document in the target document sorting list and the historical query word, to evaluate the sorting effect of the target document sorting list. Get the above sorting effect evaluation value.
- the collecting module 608 is configured to use the query word, the M documents, the target document sorting list, and the sorting effect evaluation value as a training sample, and put into a training sample set.
- the training module 609 is configured to train the training sample set to obtain the ranking model by using a reinforcement learning algorithm when the number of training samples in the training sample set is greater than a preset number.
- the display module 604 is configured to present the sorted list of the documents to the user.
- Each module (the receiving module 601, the first obtaining module 602, the first sorting module 603, the display module 604, the second obtaining module 605, the second sorting module 606, the third obtaining module 607, the collecting module 608, and the training module 609) is configured to perform the relevant steps of the above method.
- the server 600 is presented in the form of a module.
- a “module” herein may refer to an application-specific integrated circuit (ASIC), a processor and memory that executes one or more software or firmware programs, integrated logic circuits, and/or other devices that provide the above functionality.
- the above receiving module 601, the first obtaining module 602, the first sorting module 603, the displaying module 604, the second obtaining module 605, the second sorting module 606, the third obtaining module 607, the collecting module 608, and the training module 609 can pass The processor 801 of the terminal device shown in FIG. 8 is implemented.
- server 800 can be implemented in the structure of FIG. 8, which includes at least one processor 801, at least one memory 802, and at least one communication interface 803.
- the processor 801, the memory 802, and the communication interface 803 are connected by the communication bus and complete communication with each other.
- the processor 801 can be a general purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the above program.
- the communication interface 803 is configured to communicate with other devices or communication networks, such as Ethernet, Radio Access Network (RAN), Wireless Local Area Networks (WLAN), and the like.
- the memory 802 can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it can also be an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed, but is not limited thereto.
- the memory can exist independently and be connected to the processor via a bus.
- the memory can also be integrated with the processor.
- the memory 802 is configured to store the application program code for executing the above solution, and the execution is controlled by the processor 801.
- the processor 801 is configured to execute application code stored in the memory 802.
- the code stored in the memory 802 may perform the reinforcement learning-based ranking learning method provided above, for example: the server acquires a historical query word and acquires M documents corresponding to the historical query word; the server sorts the M documents to obtain a target document sorted list; the server acquires a ranking effect evaluation value of the target document sorted list; the server takes the historical query word, the M documents, the target document sorted list and the ranking effect evaluation value as one training sample and puts it into a training sample set; and when the number of training samples in the training sample set is greater than a preset number, the server trains the training sample set by using the reinforcement learning algorithm to obtain the ranking model.
- the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium may store a program, where the program includes some or all of the steps of the reinforcement learning based ranking learning method described in the foregoing method embodiments. .
- the disclosed apparatus may be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the unit is only a logical function division.
- In actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be electrical or otherwise.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
- each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
- the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
- the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present invention may contribute to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a memory. A number of instructions are included to cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
- the foregoing memory includes: a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and the like, which can store program codes.
Abstract
A reinforcement learning-based ranking learning method, comprising: a server acquires a historical query word and acquires M documents corresponding to the historical query word (S201); the server sorts the M documents to obtain a target document sorted list (S202); the server acquires a ranking effect evaluation value of the target document sorted list (S203); the server takes the historical query word, the M documents, the target document sorted list and the ranking effect evaluation value as one training sample and puts it into a training sample set (S204); when the number of training samples in the training sample set is greater than a preset number, the server trains the training sample set by using a reinforcement learning algorithm to obtain the ranking model (S205). Using this method helps optimize the ranking metric precisely, thereby improving the user's satisfaction with the ranking results of the document sorted list corresponding to a query word.
Description
本发明涉及排序学习领域,尤其涉及一种基于强化学习的排序学习方法及服务器。
随着互联网快速发展,信息呈现爆炸式的增长。如何从海量的信息中快速找出用户需要的数据成为信息检索研究的重点。目前,从海量的信息中找出需要的数据主要是利用搜素引擎进行检索的,并对搜索结果进行排序。
但随着服务器的发展,影响对搜索结果的排序的因素越来越多,已无法通过人工方式来拟合出排序模型,这时候用机器学习就是非常合适的。排序学习算法是目前非常重要的一种机器学习算法。
排序学习算法是一类基于监督学习的排序算法,已经被广泛应用到搜索、问答以及推荐等问题中。现有的排序算法主要包括:单文档(Pointwise)算法、文档对(Pairwise)算法和文档列表(Listwise)算法。其中,Pointwise算法是将排序问题转化为回归问题,对于每个“查询词-文档”,学习排序模型使其得分与相关性标注拟合;Pairwise算法是将排序问题转化为分类问题,对于每个“查询词”,学习排序模型使得其能够区分不同的“候选文档”间的相关性好坏(由标注决定);Listwise算法是对于每个“查询词”,希望学习排序模型使得该查询的整体排序效果最优。
现有的基于排序学习算法的模型需要依赖查询词与文档之间的相关性标注数据来进行训练,但无法使用通过用户对查询词对应的文档排序列表的排序效果进行评估而得到的数据,无法提高用户对排序效果的满意度。
发明内容
本发明实施例提供一种基于强化学习的排序学习方法和服务器,有利于提高用户对查询词对应的文档排序列表的排序结果的满意度。
第一方面,本发明实施例提供一种基于强化学习的排序学习方法,包括:
接收模块,用于接收用户输入的查询词;
第一获取模块,用于获取与所述查询词相匹配的N个文档;其中,所述N为自然数;
第一排序模块,用于利用排序模型对所述N个文档进行排序以获取文档排序列表;其中,所述排序模型是根据强化学习算法、历史查询词以及与所述历史查询词相对应的历史文档、所述历史查询词对应的文档排序列表和排序效果评估值训练得到的;
显示模块,用于向所述用户呈现所述目标文档排序列表。与现有技术相比,通过强化学习算法不断训练上述排序模型,提高通过该排序模型获得的文档排序列表的排序效果,进而提高用户对排序效果的满意度。
在一种可行的实施例中,在所述服务器利用排序模型对所述N个文档进行排序以获取文档排序列表之前,所述方法包括:
所述服务器利用排序模型对所述N个文档进行排序以获取文档排序列表之前,所述方
法还包括:
所述服务器获取历史查询词,并获取与所述历史查询词相对应的M个文档;
所述服务器对所述M个文档进行排序以获取目标文档排序列表;
所述服务器获取所述目标文档排序列表的排序效果评估值;
所述服务器将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为一个训练样本,并放入训练样本集合中;
当所述训练样本集中的训练样本的数量大于预设数量时,所述服务器利用强化学习算法对所述训练样本集合进行训练以获取所述排序模型。
与现有技术相比,将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为训练样本集合的一个训练样本,通过该训练样本集合和强化学习算法不断优化参量θ,使得期望函数的值不断增大,进而精确优化排序指标,有利于提高用户对查询词对应的文档排序列表的排序结果的满意度。
在一种可行的实施例中,所述服务器根据排序模型对所述M个文档进行排序以获取目标文档排序列表,包括:
所述服务器根据所述排序模型对所述M个文档中的每个文档中与查询词的相关性进行打分以获取打分结果;
所述服务器根据所述打分结果的升序排序或者降序排序对所述n个文档进行排序以获取所述目标文档排序列表。
在一种可行的实施例中,所述服务器获取所述目标文档排序列表的排序效果评估值,包括:
所述服务器根据用户行为对所述目标文档排序列表的排序效果进行评估,获取排序效果的评估值。
在一种可行的实施例中,所述服务器获取所述目标文档排序列表的排序效果评估值,还包括:
所述服务器将所述用户对所述目标文档排序列表的排序效果进行评估所给出的数值作为所述排序效果评估值。
在一种可行的实施例中,所述服务器获取所述目标文档排序列表的排序效果评估值,还包括:
所述服务器根据用户对所述目标文档排序列表中的每个文档与查询词的相关性进行打分的结果,对所述目标文档排序列表的排序效果进行评估以获取所述排序效果评估值。
第二方面,本发明实施例提供一种服务器,包括:
接收模块,用于接收用户输入的查询词;
第一获取模块,用于获取与所述查询词相匹配的N个文档;其中,所述N为自然数;
第一排序模块,用于利用排序模型对所述N个文档进行排序以获取文档排序列表;其中,所述排序模型是根据强化学习算法、历史查询词以及与所述历史查询词相对应的历史文档、所述历史查询词对应的文档排序列表和排序效果评估值训练得到的;
显示模块,用于向所述用户呈现所述文档排序列表。
在一种可行的实施例中,在所述第一排序模块利用排序模型对所述N个文档进行排序
以获取文档排序列表之前,所述服务器还包括:
第二获取模块,用于获取历史查询词,并获取与所述历史查询词相对应的M个文档;
第二排序模块,用于对所述M个文档进行排序以获取目标文档排序列表;
第三获取模块,用于获取所述目标文档排序列表的排序效果评估值;
收集模块,用于将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为一个训练样本,并放入训练样本集合中;
训练模块,用于当所述训练样本集中的训练样本的数量大于预设数量时,利用强化学习算法对所述训练样本集合进行训练以获取所述排序模型。
在一种可行的实施例中,所述第二排序模块包括:
打分单元,用于根据所述排序模型对所述M个文档中的每个文档中与查询词的相关性进行打分以获取打分结果;
排序单元,用于根据所述打分结果的升序排序或者降序排序对所述M个文档进行排序以获取文档排序列表。
在一种可行的实施例中,所述第三获取模块具体用于:
根据用户行为对所述目标文档排序列表的排序效果进行评估,获取排序效果的评估值。
在一种可行的实施例中,所述第三获取模块具体用于:
所述第三获取模块,用于获取用户对所述目标文档排序列表的排序效果进行评估所给出的数值,作为所述排序效果评估值。
在一种可行的实施例中,所述第三获取模块具体用于:
获取根据用户对所述目标文档排序列表中的每个文档与查询词的相关性进行打分的打分结果,并根据所述打分结果对所述目标文档排序列表的排序效果进行评估以获取所述排序效果评估值。
本发明的这些方面或其他方面在以下实施例的描述中会更加简明易懂。
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例提供的一种搜索文档方法的流程示意图;
图2为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图;
图3为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图;
图4为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图;
图5为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图;
图6为本发明实施例提供的一种服务器的结构示意图;
图7为本发明实施例提供的一种服务器的部分结构示意图;
图8为本发明实施例提供的另一种服务器的结构示意图。
本发明的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本发明的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
下面结合附图对本申请的实施例进行描述。
请参见图1,图1为本发明实施例提供的一种搜索文档方法的流程示意图。如图1所示,本发明实施例提供的一种基于强化学习的排序学习方法,包括以下步骤:
S101、服务器接收用户输入的查询词。
S102、所述服务器获取与所述查询词相匹配的N个文档;其中,所述N为自然数。
其中,上述服务器接收到上述用户输入的查询词后,在其后台数据库中获取与该查询词相关的N个文档。上述N个文档中的任意一个文档的标题或者内容中包含上述查询词。
S103、所述服务器利用排序模型对所述N个文档进行排序以获取文档排序列表;其中,所述排序模型是根据强化学习算法、历史查询词以及与所述历史查询词相对应的历史文档、所述历史查询词对应的文档排序列表和排序效果评估值训练得到的。
其中,上述服务器利用排序模型对所述N个文档进行排序以获取文档排序列表具体包括:
上述服务器根据排序模型对上述N个文档中的每个文档进行打分,并获取打分结果;
上述服务器根据打分结果对上述N个文档进行排序,以获取上述文档排序列表。
具体地,上述排序模型为带参数的可微函数。例如前向神经网络函数MLP(x)
在此需要说明的是,上述服务器根据强化学习算法、历史查询词以及与所述历史查询词相对应的历史文档、所述历史查询词对应的文档排序列表和排序效果评估值训练上述排序模型过程参见图2-图5的相关描述。
S104、所述服务器向所述用户呈现所述文档排序列表。
具体地,上述服务器获取上述文档列表后,将上述文档列表显示出来,供上述用户查阅。
请参见图2,图2为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图。如图2所示,本发明实施例提供的一种基于强化学习的排序学习方法,包括以下步骤:
S201、服务器获取历史查询词,并获取与所述历史查询词相对应的M个文档。
可选地,上述历史查询词可以是用户输入的,也可以是上述服务器自动获取的。
具体地,上述服务器接收到上述历史查询词q后,根据上述历史查询词从上述服务器
的后台数据库中获取与上述历史查询词q相关的M个文档,该M个文档中的任意一个文档的标题或者内容包含上述历史查询词q。
其中,上述M个文档可以用集合(d1,d2,......,dM)表示。
S202、所述服务器对所述M个文档进行排序以获取目标文档排序列表。
具体地,上述所述服务器根据上述排序模型对上述M个文档中的每个文档与上述历史查询词q的相关性进行打分以获取打分结果。
上述M个文档中的每个文档与上述历史查询词q的相关性包括上述历史查询词q在上述M个文档中每个文档的标题或者内容中出现的次数。
上述服务器根据上述打分结果的升序排序或者降序排序对上述M个文档进行排序以获取文档排序列表。
其中,上述排序模型为带参数的可微函数,可以f(q,d;θ)表示,q为上述查询词,d为根据查询词q获取的文档,θ为参量。
上述服务器根据上述排序模型f(q,d;θ)对在上述M个文档中的每个文档中上述历史查询词出现的频率进行打分而获取的打分结果可以集合(d1',d2',......,dM')表示,上述服务器根据上述打分结果的升序排序或者降序排序对上述M个文档进行排序而获取的目标文档排序列表可以集合(y1,y2,......,yM)。
进一步,上述过程能以σ=(y1,y2,......,yM)=sort(d1,d2,......,dM)表示。该sort函数为降序排序模型或者升序排序模型。
S203、所述服务器获取所述目标文档排序列表的排序效果评估值。
在此需要说明的是,上述排序效果评估值r越大,则上述目标文档排序列表的排序效果越好,上述用户对文档排序列表的满意度越高。
可选地,上述服务器获取上述目标文档排序列表的排序效果评估值,包括:
上述服务器根据用户行为对上述目标文档排序列表的排序效果进行评估,获取排序效果评估值。
具体地,上述服务器根据用户行为对上述目标文档排序列表(y1,y2,......,yM)的排序效果进行评估具体为:
上述服务器获取用户最后一次在上述目标文档排序列表(y1,y2,......,yM)中的点击位置k,该点击位置k∈{1,2,......,M};上述服务器根据评估函数求得上述服务器获取上述目标文档排序列表的排序效果评估值r。排序效果评估值r越大,则上述目标文档排序列表的排序效果越好。
可选地,上述评估函数可为r=1/k,也可为其他形式的函数。
可选地,上述服务器获取上述目标文档排序列表的排序效果评估值,包括:
上述服务器获取用户对上述目标文档排序列表的排序效果进行评估所给出的数值,作为所述排序效果评估值。
具体地,上述用户对上述查询词q对应的文档排序列表(y1,y2,......,yM)的排序结果的满意程度进行评估,并给出数值。上述服务器将该数值作为上述排序效果评估值r。
其中,上述排序效果评估值r越大,则上述用户对上述查询词q对应的文档排序列表(y1,y2,......,yM)的排序结果越满意。
可选地,上述服务器获取上述目标文档排序列表的排序效果评估值,包括:
上述服务器根据用户对上述目标文档排序列表中的每个文档与上述历史查询词q的相关性进行打分的结果,对上述目标文档排序列表的排序效果进行评估以获取上述排序效果评估值。
具体地,上述用户对上述目标文档排序列表(y1,y2,......,yM)中的每个文档与上述历史查询词q的相关性进行打分,并给出打分结果,该打分结果可用集合(g1,g2,......,gM)。上述打分结果gi为上述集合(g1,g2,......,gM)的第i个值,gi∈(0,1,2,3)。
其中,gσ-(i)可理解为上述集合(g1,g2,......,gM)的第i个值,即gi。
在此需要说明的是,上述DCG英文全称为Discounted Cumulative Gain,它是一个衡量服务器算法的指标。
其中,DCG@K的意义是根据搜索结果中的前K个结果来计算DCG的,K的取值与用户关注的搜索结果的个数有关。
举例说明,假设服务器为网页服务器,用户通常关注前十个(第一页)文档的排序质量,则K=10;假设服务器为问答系统服务器,用户通常关注第一个文档的好坏,则K=1。
S204、所述服务器将所述历史查询词、所述M个文档、所述目标文档列表排序和所述排序效果评估值作为一个训练样本,并放入训练样本集合中。
其中,上述训练样本可用集合(q,d1,......dM,σ,r)表示。q为上述查询词,d1,d2……dM为上述M个文档,σ为上述目标文档排序列表(y1,y2,......,yM),r为上述排序效果评估值。
进一步,当上述训练样本集合中训练样本的数量达到预设数量m时,上述服务器执行步骤S205。
S205、当所述训练样本集中的训练样本的数量大于预设数量m时,所述服务器利用强化学习算法对所述训练样本集合进行训练以获取所述排序模型。
具体地,上述服务器利用强化学习算法对上述训练样本集合进行训练以获取上述排序模型可以看成上述服务器根据上述训练样本集合训练和强化学习算法训练求解使上述期望函数取最大值的参量θ',并将所述参量θ'替代所述排序模型中的参量θ,得到一个新的排序模型。这个过程可以看成求解出的θ。
具体的,求解θ过程如下:
首先,将查询词-文档对(q,di)替换为特征变量xi,则可将上述集合(d1,d2,......,dM)替换
为特征变量集合s=(x1,......,xM),上述训练样本集合替换为排序模型f(q,di;θ)=f(xi;θ)。
其次,通过期望函数来求解θ,该θ为使期望函数值最大。该期望函数为其中,σ(i)为上述训练样本集合中第i样本中的历史查询词q对应的目标文档排序列表,s(i)为上述训练样本集合中第i样本中的历史查询词q对应的M个文档,r(i)为上述训练样本集合中第i样本中的排序效果评估值。
函数p(σ(i)|s(i);θ)为在上述M个文档s(i)的前提下得到文档列表σ(i)的概率,θ为使函数p(σ(i)|s(i);θ)取值最大的参量。
上述其中,r(i)<r(j)表示为上述训练样本集合中第j个目标文档排序列表的排序相比于第i个目标文档效果排序列表的效果较好;r(i)=r(j)表示为上述训练样本集合中第j个目标文档排序列表的排序效果与第i个目标文档排序列表的排序效果相同;r(i)>r(j)表示为上述训练样本集合中第i个目标文档排序列表的排序效果相比于第j个目标文档排序列表的排序效果较好。
上述步骤S201-S205为上述步骤S103所述的训练上述排序模型的过程。
可以看出,在本发明实施例的方案中,所述服务器获取历史查询词,并获取与所述历史查询词相对应的M个文档;所述服务器对所述M个文档进行排序以获取目标文档排序列表;所述服务器获取所述目标文档排序列表的排序效果评估值;所述服务器将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为一个训练样本,并放入训练样本集合中;当所述训练样本集中的训练样本的数量大于预设数量时,所述服务器利用强化学习算法对所述训练样本集合进行训练以获取所述排序模型。所述服务器根据所述训练样本集合和强化学习算法获取使期望函数取最大值的参量θ',并将所述参量θ'替代所述排序模型中的参量θ,所述期望函数的最大值表示排序指标的优化程度最高。与现有技术相比,将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为训练样本集合的一个训练样本,通过该训练样本集合和强化学习算法不断优化参量θ,使得期望函数的值不断增大,进而精确优化排序指标,有利于提高用户对查询词对应的目标文档排序列表的排序结果的满意度。
请参见图3,图3为本发明实施例提供的一种基于强化学习的排序学习方法的流程示意图,包括以下步骤:
S301、服务器获取所述历史查询词,并获取与所述历史查询词相关的M个文档。
S302、所述服务器对所述M个文档进行排序以获取目标文档排序列表。
S303、所述服务器获取所述目标文档排序列表的排序效果评估值。
在此需要说明的是,上述步骤S301-S303的具体描述可参照上述步骤S201-S203的相关描述,在此不再赘述。
S304、所述服务器将所述历史查询词、所述M个文档、所述目标文档排序列表和所述排序效果评估值作为一个训练样本,并放入训练样本集合中所述服务器的计数器自加1。
其中,上述训练样本可用集合(q,d1,......dM,σ,r)表示。q为上述查询词,d1,d2……dM为上述M个文档,σ为上述目标文档排序列表(y1,y2,......,yM),r为上述排序效果评估值。
上述服务器每当将一个上述训练样本放入上述训练样本集合中后,其计数器自加1。
S305、所述服务器的计数器是否计数至预设数量m。
其中,若上述服务器的计数器计数至预设数量m,则上述服务器执行步骤S306;反之,则上述服务器执行步骤S301。
其中,上述预设阈值m为大于1的整数。
S306、所述服务器利用强化学习算法对所述训练样本集合进行训练以获取所述排序模型。
其中,上述服务器将其计数器复位。
具体地,上述述服务器利用强化学习算法对上述训练样本集合进行训练以获取是述排序模型可以看成上述服务器根据上述训练样本集合训练和强化学习算法训练求解使上述期望函数取最大值的参量θ',并将所述参量θ'替代所述排序模型中的参量θ,得到一个新的排序模型。这个过程可以看成求解出的θ
具体的,求解θ过程如下:
首先,将查询词-文档对(q,di)替换为特征变量xi,则可将上述集合(d1,d2,......,dM)替换为特征变量集合s=(x1,......,xn),上述训练样本集合替换为排序模型f(q,di;θ)=f(xi;θ)。
其次,通过期望函数来求解θ,该θ为使期望函数值最大。该期望函数为其中,σ(i)为上述训练样本集合中第i样本中的历史查询词q对应的目标文档排序列表,s(i)为上述训练样本集合中第i样本中的历史查询词q对应的M个文档,r(i)为上述训练样本集合中第i样本中的排序效果评估值。
函数p(σ(i)|s(i);θ)为在上述M个文档s(i)的前提下得到文档列表σ(i)的概率,θ为使函数p(σ(i)|s(i);θ)取值最大的参量。
上述其中,r(i)<r(j)表示为上述训练样本集合中第j个目标文档排序列表的排序相比于第i个目标文档排序列表的排序效果较好;r(i)=r(j)表示为上述训练样本集合中第j个目标文档排序列表的排序效果与第i个目标文档排序列表的排序效果相同;r(i)>r(j)表示为上述训练样本集合中第i个目标文档排序列表的排序效果相比于第j个目标文档排序列表的排序效果较好。
进一步,上述服务器执行步骤S305之后,再转跳执行上述步骤S301,同时其计数器开始计数。
Referring to FIG. 4, FIG. 4 is a schematic flowchart of another reinforcement-learning-based learning-to-rank method according to an embodiment of the present invention. As shown in FIG. 4, this embodiment of the present invention provides another reinforcement-learning-based learning-to-rank method, including:
S401: The server obtains a historical query word q and retrieves M documents from a back-end database according to the historical query word q.
Optionally, the query word may be input by the user or automatically obtained by the server.
The title or content of any one of the M documents contains the historical query word q. The M documents can be expressed as a set (d1, d2, ..., dM).
S402: The server ranks the M documents according to a sorting model to obtain a document ranking list.
Specifically, the server scores the relevance between each of the M documents and the historical query word q according to the sorting model to obtain scoring results.
The relevance between each of the M documents and the historical query word q includes the number of times the historical query word q appears in the title or content of each of the M documents.
The server sorts the M documents in ascending or descending order of the scoring results to obtain a target document ranking list.
The sorting model is a differentiable function with parameters and can be expressed as f(q, d; θ), where q is the historical query word, d is a document obtained according to the historical query word q, and θ is a parameter.
The scoring results obtained by the server scoring, according to the sorting model f(q, d; θ), the frequency with which the historical query word appears in each of the M documents can be expressed as a set (d1', d2', ..., dM'), and the target document ranking list obtained by sorting the M documents in ascending or descending order of the scoring results can be expressed as a set (y1, y2, ..., yM).
Further, this process can be expressed as σ = (y1, y2, ..., yM) = sort(d1, d2, ..., dM), where the sort function is a descending or ascending sorting model.
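As a simple illustration of steps S401-S402, the scoring-and-sorting step σ = sort(d1, ..., dM) can be sketched as follows; the linear scoring function standing in for f(q, d; θ) and the two-dimensional feature space are assumptions made only to keep the example concrete.

```python
import numpy as np

def rank_documents(theta, doc_features):
    """Score each document with f(x; theta) = theta . x and sort in descending order.

    doc_features: array of shape (M, n_features), one feature vector per
    query word-document pair (q, d_i).
    Returns the target document ranking list as document indices (best first)
    together with the scoring results.
    """
    scores = doc_features @ theta   # scoring results (d1', ..., dM')
    sigma = np.argsort(-scores)     # descending sort of the M documents
    return sigma, scores

# Example with M = 3 documents and an assumed 2-dimensional feature space.
theta = np.array([0.7, 0.3])
docs = np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]])
sigma, scores = rank_documents(theta, docs)
```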
S403: The server evaluates the target document ranking list according to user behavior and obtains a sorting effect evaluation value r.
Here, the evaluation system of the server evaluating the target document ranking list according to user behavior specifically means evaluating according to the user's click behavior on the document ranking list and giving a sorting effect evaluation value r.
It should be noted that the larger the evaluation value r, the better the sorting effect of the target document ranking list. The target document ranking list is displayed in list form on the display interface of the server.
Specifically, after receiving the historical query word q input by the user, the server obtains from the back-end database M documents containing the historical query word q, expressed as (d1, d2, ..., dM). The server scores the i-th document di of the M documents according to the sorting model f(q, d; θ) and obtains the score yi = f(q, di; θ). It then sorts the M documents in descending or ascending order of the scores yi to obtain the target document ranking list.
After obtaining the target document ranking list, the search engine displays it in the form of a page. The server obtains the position k of the user's last click on the page, k ∈ {1, 2, ..., M}. The sorting effect evaluation value r is given by the evaluation function r = 1/k. If the last click is near the top of the list, k is small, so by the evaluation function r = 1/k the evaluation value r is large, which indicates that the user is more satisfied with the query results for the historical query word q.
For example, suppose that after receiving the historical query word q input by the user, the server retrieves from the back-end database 10 documents containing the historical query word q, expressed as (d1, d2, d3, d4, d5, d6, d7, d8, d9, d10). It then scores them according to the sorting model f(q, d; θ) and sorts the 10 documents in descending order of the scores to obtain a document ranking list, expressed as σ = sort(y1, y2, y3, y4, y5, y6, y7, y8, y9, y10). After the server displays the target document ranking list in the form of a page: if it is detected that the user's last click falls on the 5th document, then k = 5 and, by the evaluation function r = 1/k, the evaluation value is 0.2; if the last click falls on the 2nd document, then k = 2 and the evaluation value is 0.5; if the user makes no click at all, then k is set to 100 and the evaluation value is 0.01. It can be seen that a higher evaluation value indicates higher user satisfaction with the query results for the historical query word q.
S404: The server collects the historical query word q, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and puts it into a training sample set.
Specifically, the training sample can be expressed as (q, d1, ..., dM, σ, r), where q is the historical query word, d1, d2, ..., dM are the M documents, σ is the target document ranking list (y1, y2, ..., yM), and r is the sorting effect evaluation value.
Optionally, m is an integer greater than or equal to 1 and may be 1, 2, 3, 5, 8, or another value.
S405: When the number of training samples in the training sample set is greater than the preset number m, the server trains on the training sample set using a reinforcement learning algorithm to obtain the sorting model.
Specifically, the server training on the training sample set with a reinforcement learning algorithm to obtain the sorting model can be viewed as the server solving, based on the training sample set and the reinforcement learning algorithm, for the parameter θ' that maximizes the expectation function, and substituting the parameter θ' for the parameter θ in the sorting model to obtain a new sorting model. In other words, this process can be regarded as solving for the θ that maximizes the expectation function.
Specifically, θ is solved as follows:
First, each query word-document pair (q, di) is replaced by a feature variable xi, so the set (d1, d2, ..., dM) is replaced by the feature variable set s = (x1, ..., xM), and the sorting model is rewritten as f(q, di; θ) = f(xi; θ).
Second, θ is solved through the expectation function, i.e. θ is the value that maximizes the expectation function. The expectation function is J(θ) = Σ_i p(σ(i) | s(i); θ) · r(i), where σ(i) is the target document ranking list corresponding to the historical query word q in the i-th sample of the training sample set, s(i) is the M documents corresponding to the historical query word q in the i-th sample, and r(i) is the sorting effect evaluation value in the i-th sample.
The function p(σ(i) | s(i); θ) is the probability of obtaining the document ranking list σ(i) given the M documents s(i).
Here, r(i) < r(j) means that the sorting effect of the j-th target document ranking list in the training sample set is better than that of the i-th; r(i) = r(j) means that their sorting effects are the same; and r(i) > r(j) means that the sorting effect of the i-th target document ranking list is better than that of the j-th.
It should be noted that the specific implementation of each step of the method shown in FIG. 4 can be found in the specific implementation described in the foregoing method and is not repeated here.
Referring to FIG. 5, FIG. 5 is a schematic flowchart of another reinforcement-learning-based learning-to-rank method according to an embodiment of the present invention. As shown in FIG. 5, this embodiment of the present invention provides another reinforcement-learning-based learning-to-rank method, including:
S501: The server obtains a historical query word q and retrieves M documents from a back-end database according to the historical query word q.
Optionally, the query word may be input by the user or automatically obtained by the server.
The M documents can be expressed as a set (d1, d2, ..., dM), and the title or content of any one of the M documents contains the historical query word q.
S502: The server ranks the M documents according to a sorting model to obtain a document ranking list.
Specifically, the server scores the relevance between each of the M documents and the historical query word q according to the sorting model to obtain scoring results.
The relevance between each of the M documents and the historical query word q includes the number of times the historical query word q appears in the title or content of each of the M documents.
The server sorts the M documents in ascending or descending order of the scoring results to obtain a document ranking list.
The sorting model is a differentiable function with parameters and can be expressed as f(q, d; θ), where q is the historical query word, d is a document obtained according to the historical query word q, and θ is a parameter.
The scoring results obtained by the server scoring, according to the sorting model f(q, d; θ), the frequency with which the historical query word appears in each of the M documents can be expressed as a set (d1', d2', ..., dM'), and the target document ranking list obtained by sorting the M documents in ascending or descending order of the scoring results can be expressed as a set (y1, y2, ..., yM).
Further, this process can be expressed as σ = (y1, y2, ..., yM) = sort(d1, d2, ..., dM), where the sort function is a descending or ascending sorting model.
S503: The server obtains the user's overall score for the target document ranking list, or the user's score for each document in the target document ranking list, and obtains the sorting effect evaluation value according to the scoring results.
It should be noted that the larger the evaluation value r, the better the sorting effect of the target document ranking list.
In one case, the server obtains the user's score for each document in the target document ranking list and obtains the evaluation value r according to these scores.
Specifically, the user scores the relevance between each document in the target document ranking list (y1, y2, ..., yM) and the historical query word q, and the scoring result can be expressed as a set (g1, g2, ..., gM), where gi, the i-th value of the set, satisfies gi ∈ {0, 1, 2, 3}.
Optionally, gi may also take values in another range, such as {0, 1, 2, 3, 4, 5} or {0, 1, 2, 3, 4, 5, 6, 7}.
The evaluation value can then be computed as r = DCG@K = Σ_{i=1}^{K} (2^{g_{σ(i)}} − 1) / log2(i + 1). It should be noted that DCG stands for Discounted Cumulative Gain, an index for measuring the quality of a server's ranking algorithm.
DCG@K means that the DCG is computed over the first K results of the search results; the value of K depends on how many search results the user pays attention to.
For example, if the server is a web server, the user usually cares about the ranking quality of the first ten documents (the first page), so K = 10; if the server is a question-answering system server, the user usually cares about the quality of the first document, so K = 1.
Optionally, the server obtains the user's overall score for the target document ranking list and obtains the sorting effect evaluation value according to that score.
For example, suppose the overall ranking quality of the ranking result is divided into 5 grades, so that r ∈ {-2, -1, 0, 1, 2}, where a score of 2 indicates that the overall ranking quality is perfect, 1 indicates that it is good, 0 indicates that it is average, -1 indicates that it is poor, and -2 indicates that it is very poor. The ranking quality of the ranking result is scored manually according to these 5 grades.
S504: The server takes the historical query word, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and puts it into a training sample set.
Specifically, the training sample can be expressed as (q, d1, ..., dM, σ, r), where q is the historical query word, d1, d2, ..., dM are the M documents, σ is the target document ranking list (y1, y2, ..., yM), and r is the sorting effect evaluation value.
Here, m is an integer greater than or equal to 1. Optionally, m may be 1, 2, 3, 5, 8, or another value.
S505: When the number of training samples in the training sample set is greater than the preset number m, the server trains on the training sample set using a reinforcement learning algorithm to obtain the sorting model.
Specifically, the server training on the training sample set with a reinforcement learning algorithm to obtain the sorting model can be viewed as the server solving, based on the training sample set and the reinforcement learning algorithm, for the parameter θ' that maximizes the expectation function, and substituting the parameter θ' for the parameter θ in the sorting model to obtain a new sorting model. In other words, this process can be regarded as solving for the θ that maximizes the expectation function.
Specifically, θ is solved as follows:
First, each query word-document pair (q, di) is replaced by a feature variable xi, so the set (d1, d2, ..., dM) is replaced by the feature variable set s = (x1, ..., xM), and the sorting model is rewritten as f(q, di; θ) = f(xi; θ).
Second, θ is solved through the expectation function, i.e. θ is the value that maximizes the expectation function. The expectation function is J(θ) = Σ_i p(σ(i) | s(i); θ) · r(i), where σ(i) is the document ranking list corresponding to the historical query word q in the i-th sample of the training sample set, s(i) is the M documents corresponding to the historical query word q in the i-th sample, and r(i) is the sorting effect evaluation value in the i-th sample.
The function p(σ(i) | s(i); θ) is the probability of obtaining the document ranking list σ(i) given the M documents s(i).
Here, r(i) < r(j) means that the sorting effect of the j-th target document ranking list in the training sample set is better than that of the i-th; r(i) = r(j) means that their sorting effects are the same; and r(i) > r(j) means that the sorting effect of the i-th target document ranking list is better than that of the j-th.
It should be noted that the gradient ascent method is common knowledge for those skilled in the art, so the process of solving, by gradient ascent, for the θ that maximizes the above function is not described here.
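Since the description only notes that gradient ascent is used without spelling it out, the following is a minimal, generic gradient-ascent sketch for maximizing an objective such as the expectation function J(θ); the numerical-gradient helper, the learning rate, and the toy objective in the usage line are illustrative assumptions.

```python
import numpy as np

def gradient_ascent(J, theta0, lr=0.05, steps=200, eps=1e-5):
    """Generic gradient ascent: theta <- theta + lr * dJ/dtheta (numerical gradient)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for j in range(theta.size):
            t_plus, t_minus = theta.copy(), theta.copy()
            t_plus[j] += eps
            t_minus[j] -= eps
            grad[j] = (J(t_plus) - J(t_minus)) / (2 * eps)
        theta = theta + lr * grad   # ascend, since J is to be maximized
    return theta

# Toy usage: maximize J(theta) = -(theta - 3)^2, whose maximizer is theta = 3.
theta_star = gradient_ascent(lambda t: -float((t[0] - 3.0) ** 2), [0.0])
```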
An embodiment of the present invention further provides a server 600, as shown in FIG. 6, including:
a receiving module 601, configured to receive a query word input by a user;
a first obtaining module 602, configured to obtain N documents matching the query word, where N is a natural number;
a first sorting module 603, configured to rank the N documents using a sorting model to obtain a document ranking list, where the sorting model is trained according to a reinforcement learning algorithm, historical query words, historical documents corresponding to the historical query words, document ranking lists corresponding to the historical query words, and sorting effect evaluation values;
Optionally, before the first sorting module 603 ranks the N documents using the sorting model to obtain the document ranking list, the server 600 further includes:
a second obtaining module 605, configured to obtain the historical query word and obtain M documents related to the historical query word;
a second sorting module 606, configured to rank the M documents according to a sorting model to obtain a target document ranking list.
The second sorting module 606 includes:
a scoring unit 6061, configured to score the relevance between each of the M documents and the query word according to the sorting model to obtain scoring results;
a second sorting unit 6062, configured to sort the M documents in ascending or descending order of the scoring results to obtain the target document ranking list.
a third obtaining module 607, configured to obtain the sorting effect evaluation value of the target document ranking list.
Optionally, the third obtaining module 607 is specifically configured to evaluate the sorting effect of the target document ranking list according to user behavior, to obtain the evaluation value of the sorting effect.
Optionally, the third obtaining module 607 is specifically configured to obtain a numerical value given by the user when evaluating the sorting effect of the target document ranking list, as the sorting effect evaluation value.
Optionally, the third obtaining module 607 is specifically configured to evaluate the sorting effect of the target document ranking list according to the scores that the user assigns to the relevance between each document in the target document ranking list and the historical query word, to obtain the sorting effect evaluation value.
a collecting module 608, configured to take the query word, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and put it into a training sample set;
a training module 609, configured to, when the number of training samples in the training sample set is greater than a preset number, train on the training sample set using a reinforcement learning algorithm to obtain the sorting model;
a display module 604, configured to present the document ranking list to the user.
It should be noted that the above modules (the receiving module 601, first obtaining module 602, first sorting module 603, display module 604, second obtaining module 605, second sorting module 606, third obtaining module 607, collecting module 608, and training module 609) are configured to perform the relevant steps of the foregoing methods.
In this embodiment, the server 600 is presented in the form of modules. A "module" here may refer to an application-specific integrated circuit (ASIC), a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions. In addition, the receiving module 601, first obtaining module 602, first sorting module 603, display module 604, second obtaining module 605, second sorting module 606, third obtaining module 607, collecting module 608, and training module 609 above may be implemented by the processor 801 of the device shown in FIG. 8.
As shown in FIG. 8, a server 800 can be implemented with the structure in FIG. 8. The server 800 includes at least one processor 801, at least one memory 802, and at least one communication interface 803. The processor 801, the memory 802, and the communication interface 803 are connected through a communication bus and communicate with each other.
The processor 801 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to control the execution of the programs of the above solutions.
The communication interface 803 is configured to communicate with other devices or communication networks, such as an Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 802 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory may exist independently and be connected to the processor through the bus, or the memory may be integrated with the processor.
The memory 802 is configured to store the application program code for executing the above solutions, and the execution is controlled by the processor 801. The processor 801 is configured to execute the application program code stored in the memory 802.
The code stored in the memory 802 can execute the reinforcement-learning-based learning-to-rank method provided above, for example: the server obtains a historical query word and obtains M documents corresponding to the historical query word; the server ranks the M documents to obtain a target document ranking list; the server obtains the sorting effect evaluation value of the target document ranking list; the server takes the historical query word, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and puts it into a training sample set; and when the number of training samples in the training sample set is greater than a preset number, the server trains on the training sample set using a reinforcement learning algorithm to obtain the sorting model.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, some or all of the steps of any of the reinforcement-learning-based learning-to-rank methods described in the foregoing method embodiments are performed.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described sequence of actions, because according to the present invention, some steps may be performed in other sequences or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art may understand that all or some of the steps in the various methods of the foregoing embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable memory, and the memory may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and core idea of the present invention. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation on the present invention.
Claims (12)
- A reinforcement-learning-based learning-to-rank method, comprising: receiving, by a server, a query word input by a user; obtaining, by the server, N documents matching the query word, wherein N is a natural number; ranking, by the server, the N documents using a sorting model to obtain a document ranking list, wherein the sorting model is trained according to a reinforcement learning algorithm, historical query words, historical documents corresponding to the historical query words, document ranking lists corresponding to the historical query words, and sorting effect evaluation values; and presenting, by the server, the document ranking list to the user.
- The method according to claim 1, wherein before the ranking, by the server, the N documents using the sorting model to obtain the document ranking list, the method further comprises: obtaining, by the server, a historical query word, and obtaining M documents corresponding to the historical query word; ranking, by the server, the M documents to obtain a target document ranking list; obtaining, by the server, a sorting effect evaluation value of the target document ranking list; taking, by the server, the historical query word, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and putting it into a training sample set; and when the number of training samples in the training sample set is greater than a preset number, training, by the server, on the training sample set using a reinforcement learning algorithm to obtain the sorting model.
- The method according to claim 2, wherein the ranking, by the server, the M documents to obtain the target document ranking list comprises: scoring, by the server, the relevance between each of the M documents and the historical query word and obtaining scoring results; and sorting, by the server, the M documents in ascending or descending order of the scoring results to obtain the target document ranking list.
- The method according to claim 2 or 3, wherein the obtaining, by the server, the sorting effect evaluation value of the target document ranking list comprises: evaluating, by the server, the sorting effect of the target document ranking list according to user behavior, and obtaining the evaluation value of the sorting effect.
- The method according to claim 2 or 3, wherein the obtaining, by the server, the sorting effect evaluation value of the target document ranking list further comprises: using, by the server, a numerical value given by the user when evaluating the sorting effect of the target document ranking list as the sorting effect evaluation value.
- The method according to claim 2 or 3, wherein the obtaining, by the server, the sorting effect evaluation value of the target document ranking list further comprises: obtaining, by the server, scoring results of the user scoring the relevance between each document in the target document ranking list and the query word; and evaluating, by the server, the sorting effect of the target document ranking list according to the scoring results to obtain the sorting effect evaluation value.
- A server, comprising: a receiving module, configured to receive a query word input by a user; a first obtaining module, configured to obtain N documents matching the query word, wherein N is a natural number; a first sorting module, configured to rank the N documents using a sorting model to obtain a document ranking list, wherein the sorting model is trained according to a reinforcement learning algorithm, historical query words, historical documents corresponding to the historical query words, document ranking lists corresponding to the historical query words, and sorting effect evaluation values; and a display module, configured to present the document ranking list to the user.
- The server according to claim 7, further comprising: a second obtaining module, configured to obtain a historical query word and obtain M documents corresponding to the historical query word; a second sorting module, configured to rank the M documents to obtain a target document ranking list; a third obtaining module, configured to obtain a sorting effect evaluation value of the target document ranking list; a collecting module, configured to take the historical query word, the M documents, the target document ranking list, and the sorting effect evaluation value as one training sample and put it into a training sample set; and a training module, configured to, when the number of training samples in the training sample set is greater than a preset number, train on the training sample set using a reinforcement learning algorithm to obtain the sorting model.
- The server according to claim 8, wherein the second sorting module comprises: a scoring unit, configured to score the relevance between each of the M documents and the query word according to the sorting model to obtain scoring results; and a sorting unit, configured to sort the M documents in ascending or descending order of the scoring results to obtain the target document ranking list.
- The server according to claim 7 or 8, wherein the third obtaining module is specifically configured to: evaluate the sorting effect of the target document ranking list according to user behavior, and obtain the evaluation value of the sorting effect.
- The server according to claim 7 or 8, wherein the third obtaining module is specifically configured to: obtain a numerical value given by the user when evaluating the sorting effect of the target document ranking list, and use the numerical value as the sorting effect evaluation value.
- The server according to claim 7 or 8, wherein the third obtaining module is specifically configured to: obtain scoring results of the user scoring the relevance between each document in the target document ranking list and the query word; and evaluate the sorting effect of the target document ranking list according to the scoring results to obtain the sorting effect evaluation value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/538,174 US11500954B2 (en) | 2017-02-28 | 2019-08-12 | Learning-to-rank method based on reinforcement learning and server |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710114414.0A CN108509461A (zh) | 2017-02-28 | 2017-02-28 | 一种基于强化学习的排序学习方法及服务器 |
CN201710114414.0 | 2017-02-28 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/538,174 Continuation US11500954B2 (en) | 2017-02-28 | 2019-08-12 | Learning-to-rank method based on reinforcement learning and server |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018157625A1 true WO2018157625A1 (zh) | 2018-09-07 |
Family
ID=63369794
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/111319 WO2018157625A1 (zh) | 2017-02-28 | 2017-11-16 | 基于强化学习的排序学习方法及服务器 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11500954B2 (zh) |
CN (1) | CN108509461A (zh) |
WO (1) | WO2018157625A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061968A (zh) * | 2019-11-15 | 2020-04-24 | 北京三快在线科技有限公司 | 排序方法、装置、电子设备及可读存储介质 |
CN111563158A (zh) * | 2020-04-26 | 2020-08-21 | 腾讯科技(深圳)有限公司 | 文本排序方法、排序装置、服务器和计算机可读存储介质 |
CN113485132A (zh) * | 2021-06-21 | 2021-10-08 | 青岛海尔科技有限公司 | 用于智慧家庭系统的管理方法、装置和智慧家庭系统 |
CN115494844A (zh) * | 2022-09-26 | 2022-12-20 | 成都朴为科技有限公司 | 一种多机器人搜索方法及系统 |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109436834B (zh) * | 2018-09-25 | 2021-07-06 | 北京金茂绿建科技有限公司 | 一种选取漏斗的方法及装置 |
CN111177585A (zh) * | 2018-11-13 | 2020-05-19 | 北京四维图新科技股份有限公司 | 地图poi反馈方法及装置 |
CN111222931B (zh) * | 2018-11-23 | 2023-05-05 | 阿里巴巴集团控股有限公司 | 一种产品推荐方法及系统 |
CN109933715B (zh) * | 2019-03-18 | 2021-05-28 | 杭州电子科技大学 | 一种基于listwise算法在线学习排序方法 |
CN110909147B (zh) * | 2019-12-02 | 2022-06-21 | 支付宝(杭州)信息技术有限公司 | 一种训练排序结果选择模型输出标准问法的方法和系统 |
CN111310023B (zh) * | 2020-01-15 | 2023-06-30 | 中国人民大学 | 基于记忆网络的个性化搜索方法及系统 |
CN111581545B (zh) * | 2020-05-12 | 2023-09-19 | 腾讯科技(深圳)有限公司 | 一种召回文档的排序方法及相关设备 |
CN112231545B (zh) * | 2020-09-30 | 2023-12-22 | 北京三快在线科技有限公司 | 聚块集合的排序方法、装置、设备及存储介质 |
CN112231546B (zh) * | 2020-09-30 | 2024-04-19 | 北京三快在线科技有限公司 | 异构文档的排序方法、异构文档排序模型训练方法及装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120130994A1 (en) * | 2010-11-22 | 2012-05-24 | Microsoft Corporation | Matching funnel for large document index |
CN102768679A (zh) * | 2012-06-25 | 2012-11-07 | 深圳市汉络计算机技术有限公司 | 一种搜索方法及搜索系统 |
CN102779193A (zh) * | 2012-07-16 | 2012-11-14 | 哈尔滨工业大学 | 自适应个性化信息检索系统及方法 |
CN103020164A (zh) * | 2012-11-26 | 2013-04-03 | 华北电力大学 | 一种基于多语义分析和个性化排序的语义检索方法 |
CN104750819A (zh) * | 2015-03-31 | 2015-07-01 | 大连理工大学 | 一种基于词分组排序算法的生物医学文献检索方法及系统 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8112421B2 (en) * | 2007-07-20 | 2012-02-07 | Microsoft Corporation | Query selection for effectively learning ranking functions |
US20130246383A1 (en) * | 2012-03-18 | 2013-09-19 | Microsoft Corporation | Cursor Activity Evaluation For Search Result Enhancement |
CN104077306B (zh) | 2013-03-28 | 2018-05-11 | 阿里巴巴集团控股有限公司 | 一种搜索引擎的结果排序方法及系统 |
CN105224959B (zh) * | 2015-11-02 | 2019-03-26 | 北京奇艺世纪科技有限公司 | 排序模型的训练方法和装置 |
CN105893523B (zh) * | 2016-03-31 | 2019-05-17 | 华东师范大学 | 利用答案相关性排序的评估度量来计算问题相似度的方法 |
US20180052664A1 (en) * | 2016-08-16 | 2018-02-22 | Rulai, Inc. | Method and system for developing, training, and deploying effective intelligent virtual agent |
US10409852B2 (en) * | 2016-12-30 | 2019-09-10 | Atlassian Pty Ltd | Method, apparatus, and computer program product for user-specific contextual integration for a searchable enterprise platform |
- 2017-02-28 CN CN201710114414.0A patent/CN108509461A/zh active Pending
- 2017-11-16 WO PCT/CN2017/111319 patent/WO2018157625A1/zh active Application Filing
- 2019-08-12 US US16/538,174 patent/US11500954B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120130994A1 (en) * | 2010-11-22 | 2012-05-24 | Microsoft Corporation | Matching funnel for large document index |
CN102768679A (zh) * | 2012-06-25 | 2012-11-07 | 深圳市汉络计算机技术有限公司 | 一种搜索方法及搜索系统 |
CN102779193A (zh) * | 2012-07-16 | 2012-11-14 | 哈尔滨工业大学 | 自适应个性化信息检索系统及方法 |
CN103020164A (zh) * | 2012-11-26 | 2013-04-03 | 华北电力大学 | 一种基于多语义分析和个性化排序的语义检索方法 |
CN104750819A (zh) * | 2015-03-31 | 2015-07-01 | 大连理工大学 | 一种基于词分组排序算法的生物医学文献检索方法及系统 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061968A (zh) * | 2019-11-15 | 2020-04-24 | 北京三快在线科技有限公司 | 排序方法、装置、电子设备及可读存储介质 |
CN111061968B (zh) * | 2019-11-15 | 2023-05-30 | 北京三快在线科技有限公司 | 排序方法、装置、电子设备及可读存储介质 |
CN111563158A (zh) * | 2020-04-26 | 2020-08-21 | 腾讯科技(深圳)有限公司 | 文本排序方法、排序装置、服务器和计算机可读存储介质 |
CN111563158B (zh) * | 2020-04-26 | 2023-08-29 | 腾讯科技(深圳)有限公司 | 文本排序方法、排序装置、服务器和计算机可读存储介质 |
CN113485132A (zh) * | 2021-06-21 | 2021-10-08 | 青岛海尔科技有限公司 | 用于智慧家庭系统的管理方法、装置和智慧家庭系统 |
CN113485132B (zh) * | 2021-06-21 | 2024-03-22 | 青岛海尔科技有限公司 | 用于智慧家庭系统的管理方法、装置和智慧家庭系统 |
CN115494844A (zh) * | 2022-09-26 | 2022-12-20 | 成都朴为科技有限公司 | 一种多机器人搜索方法及系统 |
Also Published As
Publication number | Publication date |
---|---|
US11500954B2 (en) | 2022-11-15 |
US20190370304A1 (en) | 2019-12-05 |
CN108509461A (zh) | 2018-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018157625A1 (zh) | 基于强化学习的排序学习方法及服务器 | |
WO2019214245A1 (zh) | 一种信息推送方法、装置、终端设备及存储介质 | |
US10515424B2 (en) | Machine learned query generation on inverted indices | |
US20210056571A1 (en) | Determining of summary of user-generated content and recommendation of user-generated content | |
US9348900B2 (en) | Generating an answer from multiple pipelines using clustering | |
CN104090890B (zh) | 关键词相似度获取方法、装置及服务器 | |
US7853599B2 (en) | Feature selection for ranking | |
CN102799591B (zh) | 一种提供推荐词的方法及装置 | |
US20130110829A1 (en) | Method and Apparatus of Ranking Search Results, and Search Method and Apparatus | |
US20100191686A1 (en) | Answer Ranking In Community Question-Answering Sites | |
US9286379B2 (en) | Document quality measurement | |
CN110334356A (zh) | 文章质量的确定方法、文章筛选方法、以及相应的装置 | |
CN111090771B (zh) | 歌曲搜索方法、装置及计算机存储介质 | |
EP3608799A1 (en) | Search method and apparatus, and non-temporary computer-readable storage medium | |
US8825641B2 (en) | Measuring duplication in search results | |
CN107943910B (zh) | 一种基于组合算法的个性化图书推荐方法 | |
CN111026868B (zh) | 一种多维度舆情危机预测方法、终端设备及存储介质 | |
CN111522886A (zh) | 一种信息推荐方法、终端及存储介质 | |
CN106407316B (zh) | 基于主题模型的软件问答推荐方法和装置 | |
EP2613275A1 (en) | Search device, search method, search program, and computer-readable memory medium for recording search program | |
CN111160699A (zh) | 一种专家推荐方法及系统 | |
CN116610853A (zh) | 搜索推荐方法、搜索推荐系统、计算机设备及存储介质 | |
CN110347821A (zh) | 一种文本类别标注的方法、电子设备和可读存储介质 | |
CN111930949B (zh) | 搜索串处理方法、装置、计算机可读介质及电子设备 | |
CN113282831A (zh) | 一种搜索信息的推荐方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17899184 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17899184 Country of ref document: EP Kind code of ref document: A1 |