CN114722086A - Method and device for determining search rearrangement model - Google Patents

Info

Publication number
CN114722086A
Authority
CN
China
Prior art keywords
ranking
initial
model
scoring
search result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210367936.2A
Other languages
Chinese (zh)
Inventor
张志钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202210367936.2A
Publication of CN114722086A
Legal status: Pending (current)

Classifications

    • G06F 16/24578 — Information retrieval; query processing with adaptation to user needs using ranking
    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/217 — Pattern recognition; validation; performance evaluation; active pattern learning techniques
    • G06N 3/045 — Neural networks; architecture; combinations of networks
    • G06N 3/088 — Neural networks; learning methods; non-supervised learning, e.g. competitive learning

Abstract

The application discloses a method and a device for determining a search rearrangement model. An initial scoring ranking model assigns initial ranking scores to a plurality of search result entries of a target search term; a scoring evaluation model then evaluates these initial ranking scores against a desired ranking and determines a reward score for each entry. A loss function for the search result entries is determined from the initial ranking scores and the reward scores, and the initial scoring ranking model is trained with this loss function to obtain a target scoring ranking model used to re-rank the search result entries. Because the reward scores are determined from the desired ranking, training the initial scoring ranking model with a loss function derived from the initial ranking scores and the reward scores optimizes the model toward outputting the desired ranking, yielding a target scoring ranking model and improving the accuracy of the ranking results.

Description

Method and device for determining search rearrangement model
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for determining a search rearrangement model.
Background
With the rapid development of information technology, online search has become one of the main ways people obtain information. Specifically, after a user enters a search term, the search system recalls a large number of search result entries, performs initial ranking and fine ranking on them, and finally displays a subset of the entries to the user.
In a search system, the quality of the search ranking largely determines page quality, i.e., how well the search results displayed to the user match the user's search expectations. When page quality is poor, the user is presented with a large amount of redundant information and obtains information less efficiently; when page quality is good, the page better matches the content the user expects and information is obtained more efficiently.
Therefore, improving the accuracy of the ranking of the search result entries to be ranked is important for improving the user's search experience.
Disclosure of Invention
To solve this technical problem, the application provides a method and a device for determining a search rearrangement model, improving the accuracy of the ranking of search results for a target search term.
The embodiment of the application discloses the following technical scheme:
In one aspect, an embodiment of the present application provides a method for determining a search rearrangement model, where the method includes:
acquiring a training sample set corresponding to a target search term; the training sample set comprises the target search term and a plurality of search result entries for the target search term;
inputting the training sample set into an initial scoring ranking model, and determining the initial ranking score of each of the plurality of search result entries through the initial scoring ranking model;
inputting the training sample set, including a target ranking label, and the initial ranking scores into a scoring evaluation model, and determining, by the scoring evaluation model, a reward score for each of the initial ranking scores based on the target ranking label and the initial ranking scores, wherein the target ranking label is used to identify a desired ranking of the plurality of search result entries in the training sample set;
determining a loss function for the plurality of search result entries based on the initial ranking score and the reward score;
and performing ranking model training on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for re-ranking the plurality of search result entries.
In another aspect, an embodiment of the present application provides a device for determining a search rearrangement model, where the device includes an obtaining unit, a determining unit, and a training unit:
the acquisition unit is used for acquiring a training sample set corresponding to the target search term; the training sample set comprises the target search term and a plurality of search result entries of the target search term;
the determining unit is configured to input the training sample set into an initial scoring ranking model, and determine the initial ranking score of each of the plurality of search result entries through the initial scoring ranking model;
the determining unit is further configured to input the training sample set, including a target ranking label, and the initial ranking scores into a scoring evaluation model, and determine, by the scoring evaluation model, a reward score for each of the initial ranking scores based on the target ranking label and the initial ranking scores, wherein the target ranking label is used to identify a desired ranking of the plurality of search result entries in the training sample set;
the determining unit is further configured to determine a loss function of the plurality of search result entries based on the initial ranking score and the reward score;
and the training unit is configured to perform ranking model training on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for re-ranking the plurality of search result entries.
According to the technical scheme, the initial scoring ranking model assigns initial ranking scores to the plurality of search result entries of the target search term; the scoring evaluation model evaluates these initial ranking scores against the desired ranking of the entries and determines a reward score for each. A loss function for the entries is then determined from the initial ranking scores and the reward scores, and the initial scoring ranking model is trained with it to obtain a target scoring ranking model for re-ranking the search result entries. Having the scoring evaluation model determine reward scores from the initial ranking scores output by the initial scoring ranking model enables unsupervised training of the initial scoring ranking model. Because the reward scores are based on the desired ranking, training with a loss function determined from the initial ranking scores and the reward scores optimizes the initial scoring ranking model toward outputting the desired ranking. The target scoring ranking model finally obtained improves the accuracy of the ranking results and is used to re-rank the plurality of search result entries, so that the relevant search results of the target search term can be displayed to the user in the re-ranked order.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in describing them are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for determining a search rearrangement model according to an embodiment of the present application;
Fig. 2 is a schematic framework diagram of the method for determining a search rearrangement model according to an embodiment of the present application;
Fig. 3 is a structural diagram of a device for determining a search rearrangement model according to an embodiment of the present application.
Detailed Description
To make the technical solutions of the present application better understood, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
Fig. 1 is a flowchart of a method for determining a search rearrangement model according to an embodiment of the present application, where the method includes S101 to S105:
s101: and acquiring a training sample set corresponding to the target search term.
Wherein the training sample set includes the target search term and a plurality of search result entries for the target search term. Specifically, the training sample set may be constructed as follows:
according to a series of search result items obtained by a single search of a target search word by a user, a certain number of single List samples in a training sample set are selected. For example, for a target search term query a, M search result entries are obtained through a single search, and N search result entries and query a are selected to serve as a List sample of query a together, where N is equal to or less than M.
In order to prevent the dirty data from affecting the accurate learning of other data of the training sample set, in the process of constructing the training sample set, the length of the List may be truncated, and specifically, for a series of target search term queries and corresponding search result entries thereof, the search term queries with the number of search result entries smaller than the preset List length are discarded. For example, if N is 10 and the number of search result entries of query B is 5, the set of data of query B is discarded. It should be noted that, the preset List length may be set according to the actual training sample requirement, which is not limited in this application. Further, for a training sample set, multiple List samples may be included.
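The List-sample construction and truncation described above can be sketched as follows (a minimal illustration; the function and field names are assumptions, not taken from the patent):

```python
# Keep at most `list_length` entries per query and discard queries that
# return fewer than the preset List length, keeping dirty data out of the set.
def build_list_samples(search_logs, list_length=10):
    """search_logs: mapping of query -> list of search result entries."""
    samples = []
    for query, entries in search_logs.items():
        if len(entries) < list_length:
            continue  # too few results: discard this query's data
        samples.append({"query": query, "entries": entries[:list_length]})
    return samples
```

With `list_length=10`, a query returning 12 entries yields one List sample of 10 entries, while a query returning only 5 entries is dropped entirely.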
S102: Input the training sample set into an initial scoring ranking model, and determine the initial ranking score of each of the plurality of search result entries through the initial scoring ranking model.
The initial scoring ranking model scores the plurality of search result entries of the target search term in the training sample set to obtain their initial ranking scores; the initial ranking of the entries is determined from these scores.
As for how the training sample set is input into the initial scoring ranking model, the List samples may be fed to the model one at a time, or they may first be sorted by learning difficulty and then fed in from easy to hard; the input mode may also be set according to the actual training requirements, which is not limited in this application.
In one possible implementation, S102 includes the following steps:
s1021: inputting the training sample set into an initial scoring and sorting model, and determining feature vectors of a plurality of search result items in the training sample set through a feature extraction layer of the initial scoring and sorting model;
s1022: determining an initial ranking score for each of the plurality of search result entries based on the feature vector.
In one possible implementation, the feature vector includes a base feature vector and a key statistical feature vector.
Specifically, after the training sample set is input into the initial scoring ranking model, the feature extraction layer extracts base features of the plurality of search result entries of the target search term, including the fine-ranking score, relevance score, cold-start score, and strategy/rule scores, and vectorizes them to obtain base feature vectors; it also extracts key statistical features, such as historical click-through rate and historical browsing or playback duration, and vectorizes them to obtain key statistical feature vectors. Together these form contextual features comprising the base feature vector and the key statistical feature vector. In addition, the ranking positions of the search result entries can be extracted as a position vector and fed to the ranking-model training layer, further enriching the contextual features of the entries.
It should be noted that the categories of base features and key statistical features may be set according to the specific training requirements, which is not limited in this application. For example, when the target search term and its result entries are videos, historical click-through rate and historical playback duration may be chosen as key statistical features; when they are text, historical click-through rate and historical browsing duration may be chosen instead, so that the target scoring ranking model finally trained better fits the specific usage scenario.
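As an illustration of this step, the following sketch concatenates a base feature vector and a key statistical feature vector into one contextual feature vector per entry; the concrete feature names are assumptions for illustration only:

```python
import numpy as np

# Each entry contributes a base feature vector (scores from upstream ranking
# stages) and a key statistical feature vector; concatenating them gives one
# contextual feature vector per search result entry.
def extract_features(entries):
    vectors = []
    for e in entries:
        base = [e["fine_rank_score"], e["relevance"], e["cold_start"], e["rule_score"]]
        stats = [e["ctr"], e["watch_time"]]
        vectors.append(np.array(base + stats, dtype=np.float32))
    return np.stack(vectors)  # shape: (num_entries, feature_dim)
```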
S103: Input the training sample set, including the target ranking label, and the initial ranking scores into a scoring evaluation model, and determine the reward score of each initial ranking score through the scoring evaluation model based on the target ranking label and the initial ranking scores.
The target ranking label identifies the desired ranking of the plurality of search result entries in the training sample set and characterizes the optimal ranking of those entries for the current target search term. Specifically, the key statistical metric of greatest concern in the specific usage scenario may serve as the basis for the target ranking; on this basis, the target ranking labels of the plurality of search result entries of the target search term are determined to identify their desired ranking.
The scoring evaluation model evaluates the initial ranking scores output by the initial scoring ranking model; specifically, it determines a reward score for the initial ranking score of each of the plurality of search result entries according to the target ranking label.
To enable the scoring evaluation model to evaluate the initial ranking scores output by the initial scoring ranking model more accurately, in one possible implementation the scoring evaluation model is trained using the temporal-difference (TD) method, which trains more efficiently than alternative methods.
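For reference, the core of a temporal-difference update looks like the following generic TD(0) sketch; this is standard TD learning over a tabular value estimate, not the patent's exact training procedure:

```python
# One TD(0) update for a critic's value estimate: move value[state] toward
# the bootstrapped target reward + gamma * value[next_state].
def td_update(value, state, next_state, reward, lr=0.1, gamma=0.99):
    """value: dict mapping state -> estimated value (tabular critic)."""
    td_target = reward + gamma * value.get(next_state, 0.0)
    td_error = td_target - value.get(state, 0.0)
    value[state] = value.get(state, 0.0) + lr * td_error
    return td_error
```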
In one possible implementation, the scoring evaluation model matches the initial ranking of the plurality of search result entries against their desired ranking and determines a reward score for each initial ranking score; the initial ranking of the entries is determined from the initial ranking scores. For example, for the List sample of query A, the desired ranking of the N search result entries is determined from the target ranking labels and their initial ranking from the initial ranking scores; a reward score is then determined for each entry according to how the initial ranking matches the desired ranking. It should be noted that the reward rules for determining the reward scores may also be set in conjunction with the online policy.
In one possible implementation, the desired ranking is divided into a first ranking region and a second ranking region according to a preset rule, and the initial ranking is divided into a first ranking region and a second ranking region according to the same rule; the matching, by the scoring evaluation model, of the initial ranking of the plurality of search result entries against their desired ranking to determine a reward score for each initial ranking score then comprises:
determining a first reward score for search result entries in the first rank region of the initial rank that are consistent with the first rank region of the desired rank;
determining a second reward score for search result entries in the first rank region of the initial rank that are inconsistent with the first rank region of the desired rank;
determining a third reward score for search result entries in the second ranking region of the initial ranking that are consistent with the second ranking region of the desired ranking;
determining a fourth reward score for search result entries in the second ranking region of the initial ranking that are inconsistent with the second ranking region of the desired ranking;
wherein the first and third reward points are positive numbers and the second and fourth reward points are negative numbers.
Specifically, the ranking result may be divided into a first and a second ranking region according to a preset rule such as the number of entries displayed; for example, when N is 10, positions Top 1-5 may form the first ranking region and Top 6-10 the second, where each region holds the number of entries that can be shown to the user at once. The first ranking region is the head region that receives the most attention, and the accuracy of its ranking strongly affects the user's search experience, so the initial ranking within it can be evaluated first: a first reward score, which may be a higher reward, is determined for entries in the first ranking region of the initial ranking that match the first ranking region of the desired ranking; a second reward score, which may be a penalty, is determined for entries in the first ranking region of the initial ranking that fail to match. The initial ranking within the second region is evaluated in the same way.
The reward magnitudes set by the evaluation rules for the first and second ranking regions may be the same or different; that is, the first and third reward scores may be equal or not, and likewise the second and fourth. They may be set according to the actual training requirements, which is not limited in this application.
It can be understood that, besides dividing ranking regions and evaluating by region, the reward score of each initial ranking score can be determined position by position, from 1 to N, according to the degree of match between the initial ranking and the desired ranking; alternatively, N distinct reward scores can be assigned to the desired positions 1 through N. The details may be set according to the actual training requirements, which is not limited in this application.
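The region-based reward scheme above can be sketched as follows. The concrete reward values are illustrative; the patent only requires the first and third reward scores to be positive and the second and fourth negative:

```python
# Assign region-based rewards: entries correctly placed in a region of the
# initial ranking (relative to the desired ranking) get a positive reward,
# misplaced entries get a negative one.
def region_rewards(initial_rank, desired_rank, head_size,
                   r1=1.0, r2=-1.0, r3=1.0, r4=-1.0):
    """initial_rank, desired_rank: lists of entry ids, best first."""
    head_desired = set(desired_rank[:head_size])   # first ranking region
    tail_desired = set(desired_rank[head_size:])   # second ranking region
    rewards = {}
    for pos, entry in enumerate(initial_rank):
        if pos < head_size:  # entry sits in the initial ranking's head region
            rewards[entry] = r1 if entry in head_desired else r2
        else:                # entry sits in the initial ranking's tail region
            rewards[entry] = r3 if entry in tail_desired else r4
    return rewards
```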
S104: Determine a loss function for the plurality of search result entries based on the initial ranking scores and the reward scores.
Since the reward score is determined from the desired ranking, it reflects how well the initial ranking score matches the desired ranking, i.e., how accurate the output of the initial scoring ranking model is. A loss function is therefore determined from the initial ranking scores and the reward scores, so that the initial scoring ranking model can be trained with it.
In one possible implementation, the initial ranking scores are normalized to obtain an initial ranking loss for the plurality of search result entries, and the loss function of the entries is then determined by multiplying the initial ranking loss by the reward scores.
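One way to realize "normalize, then weight by the reward score" is a softmax normalization with a REINFORCE-style weighting, sketched below; the choice of softmax is an assumption, as the patent does not specify the normalization:

```python
import numpy as np

# Normalize the initial ranking scores with a softmax, take the negative log
# as a per-entry ranking loss, and weight each term by its reward score.
def ranking_loss(scores, rewards):
    scores = np.asarray(scores, dtype=np.float64)
    probs = np.exp(scores - scores.max())  # shift for numerical stability
    probs /= probs.sum()
    return float(np.sum(-np.log(probs) * np.asarray(rewards)))
```

A positive reward keeps the usual gradient direction (raise the score of well-placed entries), while a negative reward reverses it, pushing misplaced entries down.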
S105: Perform ranking model training on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for re-ranking the plurality of search result entries.
Performing ranking model training on the initial scoring ranking model with a loss function determined from the initial ranking scores and the reward scores optimizes the model toward outputting the desired ranking. The target scoring ranking model finally obtained improves the accuracy of the ranking results and is used to re-rank the plurality of search result entries.
It is understood that, in the final online reasoning stage, the plurality of search result entries of the target search term can be re-ranked using the target scoring ranking model; therefore, in one possible implementation, S105 is followed by the following steps:
S11: scoring and ranking the plurality of search result entries of the target search term using the target scoring ranking model to obtain the target ranking of the target search term;
S12: displaying the plurality of search result entries according to the target ranking.
It can be seen that online reasoning uses only the target scoring ranking model, whereas the scoring evaluation model is needed only during the offline training phase.
Specifically, after the target scoring ranking model is obtained, it is used to score and rank the plurality of search result entries of the target search term to obtain the target ranking, and the entries are displayed based on that ranking.
In one possible implementation, after the target scoring ranking model is obtained, the scoring evaluation model can be deleted, and only the target scoring ranking model is retained for the subsequent online reasoning stage.
When performing online reasoning, to limit the time cost, in one possible implementation only the head search result entries are re-ranked, while the remaining entries keep their incoming order unchanged.
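A head-only re-ranking step of this kind might look like the following sketch, where `score_fn` stands in for the trained target scoring ranking model:

```python
# Re-rank only the top-K entries with the model's scores and leave the tail
# order untouched, trading a little accuracy for lower inference latency.
def rerank_head(entries, score_fn, head_size):
    head = sorted(entries[:head_size], key=score_fn, reverse=True)
    return head + entries[head_size:]
```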
It can be understood that having the scoring evaluation model determine reward scores, against the desired ranking, for the initial ranking scores output by the initial scoring ranking model achieves unsupervised training of the initial scoring ranking model; this combination of scoring evaluation model and initial scoring ranking model is a training method based on reinforcement learning. In one possible implementation, the initial scoring ranking model and the scoring evaluation model form an Actor-Critic model in which both the Actor and the Critic use a Transformer structure. The target scoring ranking model determined by this training method is a re-ranking model; it can effectively mitigate the mutual exclusion of fused strategies and the lack of global awareness, substantially improve the user's search experience and overall playback duration, and speed up strategy iteration on the re-ranking side.
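A minimal Actor/Critic pair of this shape could be sketched as below (an assumed architecture using PyTorch, not the patent's exact network): both models encode the list of per-entry feature vectors with a Transformer encoder and emit one scalar per entry, a ranking score for the Actor and a value estimate for the Critic.

```python
import torch
import torch.nn as nn

# List-wise scorer: project per-entry features, let a Transformer encoder
# attend across the whole list, then map each entry to one scalar.
class ListScorer(nn.Module):
    def __init__(self, feat_dim, d_model=32, nhead=4, layers=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):               # x: (batch, list_len, feat_dim)
        h = self.encoder(self.proj(x))
        return self.head(h).squeeze(-1)  # (batch, list_len)

actor = ListScorer(feat_dim=6)   # scoring ranking model
critic = ListScorer(feat_dim=6)  # scoring evaluation model
```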
In summary, the initial scoring ranking model assigns initial ranking scores to the plurality of search result entries of the target search term, and the scoring evaluation model evaluates these scores against the desired ranking of the entries and determines a reward score for each. A loss function of the entries is then determined from the initial ranking scores and the reward scores, and the initial scoring ranking model is trained with it to obtain a target scoring ranking model for re-ranking the entries. Having the scoring evaluation model determine reward scores from the initial ranking scores output by the initial scoring ranking model enables unsupervised training of that model; because the reward scores are based on the desired ranking, training with the resulting loss function optimizes the initial scoring ranking model toward outputting the desired ranking. The target scoring ranking model finally obtained improves ranking accuracy and re-ranks the plurality of search result entries, so that the relevant search results of the target search term are displayed to the user in the re-ranked order.
Fig. 2 is a schematic framework diagram of the method for determining a search rearrangement model according to an embodiment of the present application. The framework implements the determination method described in the foregoing embodiment and consists of two parts, as follows:
The first part is training, in which the Actor and the Critic are trained with the training sample set; the second part uses the Actor for online reasoning. The Actor is the scoring ranking model: ranking model training is performed on the initial scoring ranking model to obtain the target scoring ranking model that is finally used for online reasoning. The Critic is the scoring evaluation model: it evaluates the initial ranking scores output by the initial scoring ranking model and determines the corresponding reward scores, so that the initial scoring ranking model can be trained with the loss function determined from the initial ranking scores and the reward scores.
On this basis, scoring and ranking are completed by the Actor; the Critic determines reward scores, against the desired ranking, for the initial ranking scores output by the Actor, achieving unsupervised training of the Actor. The trained Actor serves as the target scoring ranking model and, in the subsequent online reasoning stage, re-ranks the plurality of search result entries of the target search term, so that the relevant search results are displayed to the user in the re-ranked order.
Fig. 3 is a block diagram of an apparatus for determining a search rearrangement model according to an embodiment of the present application, where the apparatus includes an obtaining unit 301, a determining unit 302, and a training unit 303:
the obtaining unit 301 is configured to obtain a training sample set corresponding to a target search term; the training sample set comprises the target search term and a plurality of search result entries for the target search term;
the determining unit 302 is configured to input the training sample set into an initial scoring and ranking model, and determine an initial ranking score of each of the plurality of search result items through the initial scoring and ranking model;
the determining unit 302 is further configured to input the training sample set including a target ranking label and the initial ranking score into a scoring evaluation model, and determine, by the scoring evaluation model, a reward score of each of the initial ranking scores based on the target ranking label and the initial ranking score; wherein the target rank label is to identify a desired rank of the plurality of search result entries in the training sample set;
the determining unit 302 is further configured to determine a loss function of the plurality of search result items based on the initial ranking score and the reward score;
the training unit 303 is configured to perform ranking model training on the initial scoring ranking model according to the loss function, so as to obtain a target scoring ranking model for performing search and rearrangement on the plurality of search result items.
In a possible implementation manner, the determining unit is further configured to:
inputting the training sample set into the initial scoring ranking model, and determining feature vectors of the plurality of search result items in the training sample set through a feature extraction layer of the initial scoring ranking model;
determining an initial ranking score for each of the plurality of search result entries based on the feature vector.
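As a hedged illustration of the feature-extraction step described above, the sketch below turns a (search term, result entry) pair into a fixed-length feature vector before scoring. The concrete features (term overlap, length ratio, prefix match) are assumptions chosen for readability; the patent does not specify the features.

```python
def extract_features(query: str, entry: str) -> list:
    """Toy feature vector for a (query, entry) pair:
    [token overlap, length ratio, leading-token prefix match]."""
    q_tokens, e_tokens = set(query.split()), set(entry.split())
    overlap = len(q_tokens & e_tokens) / max(len(q_tokens), 1)
    length_ratio = min(len(entry), len(query)) / max(len(entry), len(query), 1)
    prefix = 1.0 if entry.startswith(query.split()[0]) else 0.0
    return [overlap, length_ratio, prefix]

features = extract_features("deep learning", "deep learning tutorial")
print(features)   # overlap 1.0, length ratio 13/22, prefix match 1.0
```

Each feature vector would then be fed to the scoring layer to produce that entry's initial ranking score.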
In a possible implementation manner, the determining unit is further configured to:
matching, by the scoring evaluation model, the initial ranking of the plurality of search result items with the desired ranking of the plurality of search result items, and determining a reward score for each of the initial ranking scores; wherein the initial ranking of the plurality of search result items is determined based on the initial ranking scores.
In a possible implementation manner, the determining unit is further configured to:
dividing the desired ranking into a first ranking region and a second ranking region according to a preset rule, and dividing the initial ranking into a first ranking region and a second ranking region according to the same preset rule;
determining a first reward score for search result items in the first ranking region of the initial ranking that are consistent with the first ranking region of the desired ranking;
determining a second reward score for search result items in the first ranking region of the initial ranking that are inconsistent with the first ranking region of the desired ranking;
determining a third reward score for search result items in the second ranking region of the initial ranking that are consistent with the second ranking region of the desired ranking;
determining a fourth reward score for search result items in the second ranking region of the initial ranking that are inconsistent with the second ranking region of the desired ranking;
wherein the first and third reward scores are positive numbers and the second and fourth reward scores are negative numbers.
In a possible implementation manner, the determining unit is further configured to:
normalizing the initial ranking score to obtain an initial ranking loss for the plurality of search result entries;
determining a loss function for the plurality of search result entries based on the initial ranking loss multiplied by the reward score.
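The loss construction above — normalized initial ranking scores multiplied by reward scores — can be sketched as below. The softmax normalization and the negative sign are assumptions (one plausible REINFORCE-style choice); the patent only states that the normalized initial ranking loss is multiplied by the reward score.

```python
import math

def ranking_loss(scores, rewards):
    """Loss = -sum_i softmax(scores)_i * reward_i; minimizing it pushes
    probability mass toward entries with positive rewards."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]        # normalized initial ranking terms
    return -sum(p * r for p, r in zip(probs, rewards))

loss = ranking_loss(scores=[2.0, 1.0, 0.5], rewards=[1.0, -1.0, 0.5])
print(round(loss, 4))
```

A gradient step on this loss raises the scores of positively rewarded entries and lowers those of negatively rewarded ones, which is what drives the actor toward the desired ranking.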
In one possible implementation, the training unit is further configured to:
performing evaluation model training on the scoring evaluation model by using a temporal-difference method.
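A minimal tabular sketch of the temporal-difference (time-difference) training mentioned above: the value estimate of the current state is nudged toward reward plus the discounted value of the next state. The dict-backed value table, state names, and hyperparameters are illustrative assumptions; the patent's critic would be a learned network rather than a table.

```python
def td_update(value, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s)).
    Returns the TD error used for the update."""
    target = reward + gamma * value.get(next_state, 0.0)
    error = target - value.get(state, 0.0)
    value[state] = value.get(state, 0.0) + alpha * error
    return error

value = {}   # value table, initially empty (all states valued 0.0)
err = td_update(value, state="rank_step_0", next_state="rank_step_1",
                reward=1.0)
print(round(value["rank_step_0"], 3), round(err, 3))   # → 0.1 1.0
```

Repeating such updates over training episodes lets the critic's value estimates converge, so its reward signals to the actor become more reliable.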
In a possible implementation manner, the determining unit is further configured to:
scoring and ranking, by the target scoring ranking model, the plurality of search result items of the target search term to obtain a target ranking of the target search term;
displaying the plurality of search result items according to the target ranking.
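The online-inference step above can be sketched as follows. The linear scorer and its weights are placeholder assumptions standing in for the trained target scoring ranking model; only the score-then-display-descending flow reflects the text.

```python
def rerank_and_display(weights, entries):
    """Score (entry_id, features) pairs with the trained model and
    return entry ids in best-first display order."""
    scored = [(sum(w * f for w, f in zip(weights, feats)), eid)
              for eid, feats in entries]
    scored.sort(reverse=True)            # highest score displayed first
    return [eid for _, eid in scored]

entries = [("video_a", [0.1, 0.7]),
           ("video_b", [0.9, 0.2]),
           ("video_c", [0.4, 0.4])]
order = rerank_and_display(weights=[1.0, 0.5], entries=entries)
print(order)   # → ['video_b', 'video_c', 'video_a']
```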
In one possible implementation manner, the initial scoring ranking model and the scoring evaluation model adopt an Actor-Critic model.
Therefore, the plurality of search result items of the target search term are initially ranked and scored by the initial scoring ranking model to obtain their respective initial ranking scores; the scoring evaluation model then evaluates the initial ranking scores output by the initial scoring ranking model against the desired ranking of the plurality of search result items and determines a reward score for each. Further, a loss function of the plurality of search result items is determined according to the initial ranking scores and the reward scores, and ranking model training is performed on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for search re-ranking of the plurality of search result items. Having the scoring evaluation model determine reward scores from the initial ranking scores output by the initial scoring ranking model thus achieves unsupervised training of the initial scoring ranking model; because the reward scores are determined based on the desired ranking, training with the loss function determined from the initial ranking scores and the reward scores optimizes the initial scoring ranking model in the direction of outputting the desired ranking, improving the accuracy of the ranking result, and the resulting target scoring ranking model is used to search re-rank the plurality of search result items so that relevant search results for the target search term are displayed to the user according to the re-ranked result.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method and the apparatus for determining a search rearrangement model provided in the embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. Meanwhile, variations in the specific implementations and application scope will occur to those skilled in the art according to the ideas of the present application.
In summary, the content of this specification should not be construed as limiting the present application, and any changes or substitutions that can be easily conceived by one skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present application. Moreover, the implementations provided by the above aspects can be further combined to provide more implementations.

Claims (9)

1. A method of determining a search rearrangement model, the method comprising:
acquiring a training sample set corresponding to a target search term; the training sample set comprises the target search term and a plurality of search result entries for the target search term;
inputting the training sample set into an initial scoring ranking model, and determining an initial ranking score of each of the plurality of search result entries through the initial scoring ranking model;
inputting the training sample set including a target ranking label and the initial ranking score into a scoring evaluation model, and determining the respective reward scores of the initial ranking scores through the scoring evaluation model based on the target ranking label and the initial ranking score; wherein the target rank label is to identify a desired rank of the plurality of search result entries in the training sample set;
determining a loss function for the plurality of search result entries based on the initial ranking score and the reward score;
performing ranking model training on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for search re-ranking of the plurality of search result entries.
2. The method of claim 1, wherein inputting the training sample set into an initial scoring ranking model by which to determine an initial ranking score for each of the plurality of search result entries comprises:
inputting the training sample set into the initial scoring ranking model, and determining feature vectors of the plurality of search result entries in the training sample set through a feature extraction layer of the initial scoring ranking model;
determining an initial ranking score for each of the plurality of search result entries based on the feature vector.
3. The method of claim 1, wherein determining, by the scoring evaluation model, a reward score for each of the initial ranking scores based on the target ranking label and the initial ranking score comprises:
matching, by the scoring evaluation model, the initial ranking of the plurality of search result entries with the desired ranking of the plurality of search result entries, and determining a reward score for each of the initial ranking scores; wherein the initial ranking of the plurality of search result entries is determined based on the initial ranking scores.
4. The method of claim 3, further comprising:
dividing the desired ranking into a first ranking region and a second ranking region according to a preset rule, and dividing the initial ranking into a first ranking region and a second ranking region according to the same preset rule;
said determining, by the scoring evaluation model, a reward score for each of the initial ranking scores by matching the initial ranking of the plurality of search result entries with the desired ranking of the plurality of search result entries comprises:
determining a first reward score for search result entries in the first ranking region of the initial ranking that are consistent with the first ranking region of the desired ranking;
determining a second reward score for search result entries in the first ranking region of the initial ranking that are inconsistent with the first ranking region of the desired ranking;
determining a third reward score for search result entries in the second ranking region of the initial ranking that are consistent with the second ranking region of the desired ranking;
determining a fourth reward score for search result entries in the second ranking region of the initial ranking that are inconsistent with the second ranking region of the desired ranking;
wherein the first and third reward scores are positive numbers and the second and fourth reward scores are negative numbers.
5. The method of claim 1, further comprising:
normalizing the initial ranking scores to obtain initial ranking losses of the plurality of search result entries;
then, said determining a loss function for the plurality of search result entries based on the initial ranking score and the reward score comprises:
determining a loss function for the plurality of search result entries based on the initial ranking loss multiplied by the reward score.
6. The method according to any one of claims 1-5, further comprising:
performing evaluation model training on the scoring evaluation model by using a temporal-difference method.
7. The method according to any of claims 1-5, further comprising, after said obtaining a target scoring ranking model for search re-ranking said plurality of search result items:
scoring and ranking, by the target scoring ranking model, the plurality of search result entries of the target search term to obtain a target ranking of the target search term;
displaying the plurality of search result entries according to the target ranking.
8. The method according to any one of claims 1 to 5, wherein the initial scoring ranking model and the scoring evaluation model employ an Actor-Critic model.
9. An apparatus for determining a search rearrangement model, characterized in that the apparatus comprises an acquisition unit, a determination unit and a training unit:
the acquisition unit is used for acquiring a training sample set corresponding to the target search term; the training sample set comprises the target search term and a plurality of search result entries for the target search term;
the determination unit is configured to input the training sample set into an initial scoring ranking model, and determine an initial ranking score of each of the plurality of search result entries through the initial scoring ranking model;
the determination unit is further configured to input the training sample set including a target ranking label and the initial ranking score into a scoring evaluation model, and determine, by the scoring evaluation model, a reward score of each of the initial ranking scores based on the target ranking label and the initial ranking score; wherein the target rank label is to identify a desired rank of the plurality of search result entries in the training sample set;
the determining unit is further configured to determine a loss function of the plurality of search result entries based on the initial ranking score and the reward score;
the training unit is configured to perform ranking model training on the initial scoring ranking model according to the loss function to obtain a target scoring ranking model for search re-ranking of the plurality of search result entries.
CN202210367936.2A 2022-04-08 2022-04-08 Method and device for determining search rearrangement model Pending CN114722086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210367936.2A CN114722086A (en) 2022-04-08 2022-04-08 Method and device for determining search rearrangement model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210367936.2A CN114722086A (en) 2022-04-08 2022-04-08 Method and device for determining search rearrangement model

Publications (1)

Publication Number Publication Date
CN114722086A true CN114722086A (en) 2022-07-08

Family

ID=82240947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210367936.2A Pending CN114722086A (en) 2022-04-08 2022-04-08 Method and device for determining search rearrangement model

Country Status (1)

Country Link
CN (1) CN114722086A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680481A (en) * 2023-08-03 2023-09-01 腾讯科技(深圳)有限公司 Search ranking method, apparatus, device, storage medium and computer program product
CN116680481B (en) * 2023-08-03 2024-01-12 腾讯科技(深圳)有限公司 Search ranking method, apparatus, device, storage medium and computer program product

Similar Documents

Publication Publication Date Title
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
US8346701B2 (en) Answer ranking in community question-answering sites
CN109635083B (en) Document retrieval method for searching topic type query in TED (tele) lecture
WO2018157625A1 (en) Reinforcement learning-based method for learning to rank and server
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN112214670A (en) Online course recommendation method and device, electronic equipment and storage medium
CN110674312B (en) Method, device and medium for constructing knowledge graph and electronic equipment
CN112417105B (en) Question-answering processing method and device, storage medium and electronic equipment
CN112463944B (en) Search type intelligent question-answering method and device based on multi-model fusion
CN110532351A (en) Recommend word methods of exhibiting, device, equipment and computer readable storage medium
US20220107980A1 (en) Providing an object-based response to a natural language query
CN112784141A (en) Search result quality determination method and device, storage medium and computer equipment
CN112380421A (en) Resume searching method and device, electronic equipment and computer storage medium
CN114722086A (en) Method and device for determining search rearrangement model
CN112148994B (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN110825930A (en) Method for automatically identifying correct answers in community question-answering forum based on artificial intelligence
CN108170665A (en) Keyword expanding method and device based on comprehensive similarity
CN114707510A (en) Resource recommendation information pushing method and device, computer equipment and storage medium
CN116501950A (en) Recall model optimization method, recall model optimization device, recall model optimization equipment and recall model storage medium
CN108021641B (en) The method and apparatus that the association keyword of application is expanded
CN112883281B (en) User clustering search system based on deep learning
CN116775813B (en) Service searching method, device, electronic equipment and readable storage medium
CN110851560A (en) Information retrieval method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination