WO2023142042A1 - Ranking model training method, apparatus, and storage medium - Google Patents

Ranking model training method, apparatus, and storage medium

Info

Publication number
WO2023142042A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
correlation
training set
query
model
Application number
PCT/CN2022/074999
Other languages
English (en)
French (fr)
Inventor
蔡国豪
张小莲
赵海源
董振华
徐君
何秀强
Original Assignee
Huawei Technologies Co., Ltd.
Renmin University of China
Application filed by Huawei Technologies Co., Ltd. and Renmin University of China
Priority to PCT/CN2022/074999
Publication of WO2023142042A1



Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Definitions

  • The present application relates to the field of Internet technologies, and in particular to a ranking model training method, apparatus, and storage medium.
  • the information retrieval system provides fast and efficient information acquisition services for Internet users, and is widely used in e-commerce, information flow, smart transportation and other fields.
  • the search system usually records the user's implicit feedback data and uses the user's implicit feedback data as training data to train the core ranking model in the search system.
  • the user's implicit feedback data used to train the ranking model cannot reflect the user's real search intention. If the user's implicit feedback data is directly used as positive and negative samples to train the ranking model, the obtained ranking model will be biased, and as the ranking model is continuously updated, a Matthew effect will be formed, causing the ranking model to become more and more biased.
  • the existing bias correction technology cannot eliminate the influence of selection bias and position bias on the ranking model at the same time.
  • In a first aspect, an embodiment of the present application provides a ranking model training method. The method includes: determining a first training set according to log data, where the first training set includes a plurality of reference query words, a first sample corresponding to each reference query word, position information of the first sample, and a first reference correlation between the first sample and the corresponding reference query word, the first sample including query results that have been observed by the user; determining an inverse propensity score of each first sample according to the position information of each first sample in the first training set; determining a second training set according to the query results corresponding to the respective reference query words that have not been observed by the user, where the second training set includes a second sample corresponding to each reference query word and a second reference correlation between the second sample and the corresponding reference query word; and training a ranking model according to the first training set, the inverse propensity scores, and the second training set, where the ranking model is used to predict the correlation between query words and query results.
  • In this way, the ranking model is trained using the first training set, the second training set, and the inverse propensity score of each sample in the first training set, so that the influence of selection bias and position bias on the ranking model can be eliminated at the same time and an unbiased ranking model is obtained, thereby improving the accuracy with which the ranking model predicts the correlation between query words and query results and improving user experience.
  • In a possible implementation, the determining the second training set according to the query results corresponding to the respective reference query words that have not been observed by the user includes: for any query result, corresponding to any reference query word, that has not been observed by the user, determining, through an interpolation model, a correlation estimate between the query result that has not been observed by the user and the reference query word; determining a confidence of the correlation estimate according to the output features of each layer of the interpolation model; when the confidence is greater than or equal to a preset confidence threshold, determining a second sample corresponding to the reference query word according to the query result that has not been observed by the user, and determining the correlation estimate as the second reference correlation between the second sample and the reference query word; and adding the second sample and the second reference correlation to the second training set.
  • In this way, the correlation estimate between a query result not observed by the user and the corresponding reference query word can be determined through the interpolation model, and the confidence of the correlation estimate can be determined according to the output features of each layer of the interpolation model. According to this confidence, query results whose correlation estimates have high confidence are selected from the query results that have not been observed by the user, the corresponding second samples and second reference correlations are determined, and both are added to the second training set. Establishing the second training set in this way improves its reliability.
  • In a possible implementation, the determining the confidence of the correlation estimate according to the output features of each layer of the interpolation model includes: performing feature fusion on the output features of each layer of the interpolation model respectively to obtain fusion features corresponding to the respective layers; and determining the confidence of the correlation estimate according to preset weights and the fusion features.
  • In this way, in the process of determining the correlation estimate, feature fusion can be performed on the output features of each layer of the interpolation model to obtain the fusion features corresponding to each layer, and the confidence of the correlation estimate can then be determined according to the preset weights and the fusion features corresponding to each layer, which improves the accuracy of the confidence of the correlation estimate.
  • In a possible implementation, the training the ranking model includes: for any reference query word, inputting the reference query word and the first sample corresponding to the reference query word into the ranking model to obtain a first correlation; inputting the reference query word and the second sample corresponding to the reference query word into the ranking model to obtain a second correlation; determining a first loss corresponding to the reference query word according to the first correlation, the second correlation, the first reference correlation corresponding to the first sample, the second reference correlation corresponding to the second sample, and the inverse propensity score of the first sample; and adjusting the ranking model according to the first loss.
  • In this way, the first reference correlation, the inverse propensity score, and the second reference correlation in the second training set are used as an unbiased supervision signal to train the ranking model, so that the ranking model jointly learns from the observed samples and the unobserved samples, thereby eliminating the influence of selection bias and position bias on the ranking model and obtaining an unbiased ranking model.
  • In a possible implementation, the determining the first loss corresponding to the reference query word according to the first correlation, the second correlation, the first reference correlation corresponding to the first sample, the second reference correlation corresponding to the second sample, and the inverse propensity score of the first sample includes: determining a second loss according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample; determining a third loss according to the second correlation and the second reference correlation corresponding to the second sample; and determining the first loss corresponding to the reference query word according to the second loss and the third loss.
  • In this way, the second loss can be determined according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample, the third loss can be determined according to the second correlation and the second reference correlation corresponding to the second sample, and the first loss can then be determined according to the second loss and the third loss. Thus, when determining the first loss corresponding to the reference query word, both the loss corresponding to the observed sample (the second loss, for the first sample) and the loss corresponding to the unobserved sample (the third loss, for the second sample) are considered, which improves the accuracy of the first loss.
  • In a possible implementation, the method further includes: training the interpolation model according to the first training set and the inverse propensity scores, that is, according to the first training set and the inverse propensity score of each first sample in the first training set.
  • the unbiased first training set can be used to train the interpolation model, so that the influence of position bias on the interpolation model can be eliminated, thereby improving the estimation accuracy of the interpolation model.
  • In a possible implementation, the training the interpolation model includes: for the first sample corresponding to any reference query word in the first training set, determining, through the interpolation model, a third correlation between the first sample and the reference query word; calculating an uncertainty of the first sample according to the output features of each layer of the interpolation model, where the uncertainty is used to indicate the learning difficulty of the first sample; determining a fourth loss according to the third correlation, the first reference correlation between the first sample and the reference query word, the inverse propensity score of the first sample, and the uncertainty; and adjusting the interpolation model according to the fourth loss.
  • In this way, the uncertainty of the first sample can be calculated during the training of the interpolation model and taken into account when determining the overall loss of the interpolation model (i.e., the fourth loss), so that the interpolation model learns the first samples with high learning difficulty better, thereby improving the estimation accuracy of the interpolation model.
  • An embodiment of the present application further provides a ranking model training apparatus. The apparatus includes: a first training set determination module, configured to determine a first training set according to log data, where the first training set includes a plurality of reference query words, a first sample corresponding to each reference query word, position information of the first sample, and a first reference correlation between the first sample and the corresponding reference query word, the first sample including query results that have been observed by the user; an inverse propensity score determination module, configured to determine the inverse propensity score of each first sample according to the position information of each first sample in the first training set; a second training set determination module, configured to determine a second training set according to the query results corresponding to the respective reference query words that have not been observed by the user, where the second training set includes a second sample corresponding to each reference query word and a second reference correlation between the second sample and the corresponding reference query word; and a first training module, configured to train a ranking model according to the first training set, the inverse propensity scores, and the second training set, where the ranking model is used to predict the correlation between query words and query results.
  • In this way, the ranking model is trained using the first training set, the second training set, and the inverse propensity score of each sample in the first training set, so that the influence of selection bias and position bias on the ranking model can be eliminated at the same time and an unbiased ranking model is obtained, thereby improving the accuracy with which the ranking model predicts the correlation between query words and query results and improving user experience.
  • In a possible implementation, the second training set determination module includes: a correlation estimate determination submodule, configured to determine, through the interpolation model, for any query result corresponding to any reference query word that has not been observed by the user, a correlation estimate between the query result that has not been observed by the user and the reference query word; a confidence determination submodule, configured to determine a confidence of the correlation estimate according to the output features of each layer of the interpolation model; a sample determination submodule, configured to, when the confidence is greater than or equal to a preset confidence threshold, determine a second sample corresponding to the reference query word according to the query result that has not been observed by the user, and determine the correlation estimate as the second reference correlation between the second sample and the reference query word; and a sample adding submodule, configured to add the second sample and the second reference correlation to the second training set.
  • In this way, the correlation estimate between a query result not observed by the user and the corresponding reference query word can be determined through the interpolation model, and the confidence of the correlation estimate can be determined according to the output features of each layer of the interpolation model. According to this confidence, query results whose correlation estimates have high confidence are selected from the query results that have not been observed by the user, the corresponding second samples and second reference correlations are determined, and both are added to the second training set. Establishing the second training set in this way improves its reliability.
  • In a possible implementation, the confidence determination submodule is configured to: perform feature fusion on the output features of each layer of the interpolation model respectively to obtain fusion features corresponding to the respective layers; and determine the confidence of the correlation estimate according to preset weights and the fusion features.
  • In this way, in the process of determining the correlation estimate, feature fusion can be performed on the output features of each layer of the interpolation model to obtain the fusion features corresponding to each layer, and the confidence of the correlation estimate can then be determined according to the preset weights and the fusion features corresponding to each layer, which improves the accuracy of the confidence of the correlation estimate.
  • In a possible implementation, the first training module includes: a first correlation determination submodule, configured to, for any reference query word, input the reference query word and the first sample corresponding to the reference query word into the ranking model to obtain a first correlation; a second correlation determination submodule, configured to input the reference query word and the second sample corresponding to the reference query word into the ranking model to obtain a second correlation; a first loss determination submodule, configured to determine a first loss corresponding to the reference query word according to the first correlation, the second correlation, the first reference correlation corresponding to the first sample, the second reference correlation corresponding to the second sample, and the inverse propensity score of the first sample; and a first adjustment submodule, configured to adjust the ranking model according to the first loss.
  • In this way, the first reference correlation, the inverse propensity score, and the second reference correlation in the second training set are used as an unbiased supervision signal to train the ranking model, so that the ranking model jointly learns from the observed samples and the unobserved samples, thereby eliminating the influence of selection bias and position bias on the ranking model and obtaining an unbiased ranking model.
  • In a possible implementation, the first loss determination submodule is configured to: determine a second loss according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample; determine a third loss according to the second correlation and the second reference correlation corresponding to the second sample; and determine the first loss corresponding to the reference query word according to the second loss and the third loss.
  • In this way, the second loss can be determined according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample, the third loss can be determined according to the second correlation and the second reference correlation corresponding to the second sample, and the first loss can then be determined according to the second loss and the third loss. Thus, when determining the first loss corresponding to the reference query word, both the loss corresponding to the observed sample (the second loss, for the first sample) and the loss corresponding to the unobserved sample (the third loss, for the second sample) are considered, which improves the accuracy of the first loss.
  • In a possible implementation, the apparatus further includes: a second training module, configured to train the interpolation model according to the first training set and the inverse propensity scores, that is, according to the first training set and the inverse propensity score of each first sample in the first training set.
  • the unbiased first training set can be used to train the interpolation model, so that the influence of position bias on the interpolation model can be eliminated, thereby improving the estimation accuracy of the interpolation model.
  • In a possible implementation, the second training module includes: a third correlation determination submodule, configured to, for the first sample corresponding to any reference query word in the first training set, determine a third correlation between the first sample and the reference query word through the interpolation model; an uncertainty calculation submodule, configured to calculate an uncertainty of the first sample according to the output features of each layer of the interpolation model, where the uncertainty is used to indicate the learning difficulty of the first sample; a second loss determination submodule, configured to determine a fourth loss according to the third correlation, the first reference correlation between the first sample and the reference query word, the inverse propensity score of the first sample, and the uncertainty; and a second adjustment submodule, configured to adjust the interpolation model according to the fourth loss.
  • In this way, the uncertainty of the first sample can be calculated during the training of the interpolation model and taken into account when determining the overall loss of the interpolation model (i.e., the fourth loss), so that the interpolation model learns the first samples with high learning difficulty better, thereby improving the estimation accuracy of the interpolation model.
  • An embodiment of the present application further provides a ranking model training apparatus, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to implement, when executing the instructions, the ranking model training method of the first aspect or of one or more of the possible implementations of the first aspect.
  • In this way, the ranking model is trained using the first training set, the second training set, and the inverse propensity score of each sample in the first training set, so that the influence of selection bias and position bias on the ranking model can be eliminated at the same time and an unbiased ranking model is obtained, thereby improving the accuracy with which the ranking model predicts the correlation between query words and query results and improving user experience.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the ranking model training method of the first aspect or of one or more of the possible implementations of the first aspect.
  • In this way, the ranking model is trained using the first training set, the second training set, and the inverse propensity score of each sample in the first training set, so that the influence of selection bias and position bias on the ranking model can be eliminated at the same time and an unbiased ranking model is obtained, thereby improving the accuracy with which the ranking model predicts the correlation between query words and query results and improving user experience.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium bearing computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the ranking model training method of the first aspect or of one or more of the possible implementations of the first aspect.
  • In this way, the ranking model is trained using the first training set, the second training set, and the inverse propensity score of each sample in the first training set, so that the influence of selection bias and position bias on the ranking model can be eliminated at the same time and an unbiased ranking model is obtained, thereby improving the accuracy with which the ranking model predicts the correlation between query words and query results and improving user experience.
  • Fig. 1 shows a schematic diagram of a search system according to an embodiment of the present application.
  • Fig. 2 shows a flowchart of a ranking model training method according to an embodiment of the present application.
  • Fig. 3 shows a schematic diagram of a calculation process of uncertainty of a first sample according to an embodiment of the present application.
  • Fig. 4 shows a schematic diagram of a search page of a mobile phone application market according to an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of a training process of a ranking model training method according to an embodiment of the present application.
  • Fig. 6 shows a block diagram of a ranking model training apparatus according to an embodiment of the present application.
  • search systems usually record user implicit feedback data.
  • User implicit feedback data (also referred to as user historical behavior data) refers to the data recorded by the search system about the user's query, browsing, clicking, downloading, and other behaviors after the user enters the search system.
  • the user implicit feedback data can be regarded as the interactive information between the user and the search system.
  • User implicit feedback data is usually recorded in the form of logs.
  • User implicit feedback data is usually used as training data to train the core ranking model in the search system.
  • Although the user's implicit feedback data has a large volume and strong timeliness, due to the selection bias and position bias in the search scenario it cannot reflect the user's real search intention.
  • Selection bias means: after the user enters the search system and enters query words in the search box to search, the search system displays the query results on an interactive interface (such as a screen or an interactive window). Because the size of the interactive interface is limited, if the user does not actively perform operations such as turning pages, selection bias arises: query results related to the query words entered by the user are not observed by the user and therefore cannot be used as positive samples to participate in the training of the ranking model. Here, not being observed by the user means that a query result related to the query word input by the user is not displayed on the interactive interface (is not exposed) and is therefore not seen by the user. Correspondingly, being observed by the user means that the query result is displayed on the interactive interface (has been exposed) and has therefore been seen by the user.
  • Position bias means that the user's attention differs with the position of a query result in the query result list: the user tends to interact with query results ranked higher in the list, and this tendency has nothing to do with whether the query results reflect the user's real search intention.
  • Therefore, if the user's implicit feedback data is directly used as positive and negative samples, the obtained ranking model will be biased, and with the continuous updating of the ranking model a Matthew effect will form, causing the ranking model to become more and more biased.
  • the existing bias correction technology cannot eliminate the influence of selection bias and position bias on the ranking model at the same time.
  • In one related technology, the inverse propensity score (IPS) of each position in the query result list is obtained first; then, when the loss is calculated during ranking model training, each sample in a biased data set (such as user implicit feedback data) is weighted by dividing by its inverse propensity score IPS, so as to eliminate the influence of position bias on the ranking model.
  • However, this method is only suitable for the situation where the training data includes only samples that have been observed by the user, and it can only eliminate the influence of position bias on the ranking model; it cannot eliminate the influence of selection bias on the ranking model.
  • In another related technology, estimated correlations are obtained for the samples that have not been observed by the user, and the ranking model is then trained on both the samples that have been observed by the user (i.e., the exposed samples) and the samples that have not been observed by the user (i.e., the unexposed samples). However, this method can only eliminate the influence of selection bias on the ranking model; it cannot eliminate the influence of position bias on the ranking model.
  • In view of this, the present application provides a ranking model training method. A first training set can be determined according to log data, where the first training set includes a plurality of reference query words, the first sample corresponding to each reference query word, the position information of the first sample, and the first reference correlation between the first sample and the corresponding reference query word, the first sample including query results that have been observed by the user; the inverse propensity score of each first sample is determined according to the position information of each first sample in the first training set; a second training set is determined according to the query results, corresponding to each reference query word, that have not been observed by the user; and the ranking model is trained according to the first training set, the inverse propensity scores, and the second training set.
  • In other words, the ranking model training method of the embodiments of the present application determines the first training set corresponding to the query results observed by the user, the inverse propensity score of each sample in the first training set, and the second training set corresponding to the query results not observed by the user, and trains the ranking model using the first training set, the second training set, and the inverse propensity scores, so that the influence of selection bias and position bias can be eliminated at the same time, an unbiased ranking model can be obtained, the accuracy with which the ranking model predicts the correlation between query words and query results is improved, and user experience is improved.
  • the ranking model training method of the embodiment of the present application can be used to train ranking models in various search systems.
  • The search system may be, for example, a product search system in e-commerce, an application (APP) search system in an application market, or a search system in a content (such as video or news) search scenario; the application scenario is not limited here.
  • the influence of selection bias and position bias on the ranking model in the search system can be eliminated at the same time, so that the query results fed back by the search system can meet the user's real search intention.
  • In addition, for other prediction models affected by selection bias and position bias, the method can likewise eliminate the effects of selection bias and position bias. It should be noted that this application does not limit the specific application scenarios of the ranking model training method.
  • Fig. 1 shows a schematic diagram of a search system according to an embodiment of the present application.
  • the search system 100 includes an interactive interface 110, a ranking model 120, an offline training module 130, log data 140 and query results 150 not observed by users.
  • the interactive interface 110 is an interactive interface between the search system 100 and the user, such as a screen, an interactive window, and the like.
  • the interactive interface 110 can be used to provide a search box, so that the user can input query words for query.
  • the interactive interface 110 can also be used to display the query results of the search system 100 for users to view.
  • the ranking model 120 can be used to predict the correlation between the query words input by the user and the query results, so as to realize online inference.
  • the ranking model 120 can also sort the query results based on the predicted relevance, and display the sorted query results through the interactive interface 110 .
  • the ranking model 120 can predict the correlation between the query words input by the user and the query results according to the user's personal information, historical data, and context (such as current time, festivals, solar terms, etc.), so as to give personalized search results.
  • The offline training module 130 is used to perform offline training on the ranking model 120, through the ranking model training method described in the embodiments of the present application, according to the log data 140 stored in the search system 100 and the query results 150 not observed by the user, so as to eliminate the influence of selection bias and position bias on the ranking model 120 and thereby improve the prediction accuracy of the ranking model 120.
  • For example, suppose the search system 100 is a commodity search system in e-commerce. When using the search system 100, the user can enter it through the interactive interface 110 and input query words in the search box of the interactive interface 110 to trigger a query request. The query request and related information (such as the user's personal information, historical data, and context) are input into the ranking model 120 for online prediction, and the ranking model 120 predicts, according to the query request and related information, the correlation between the query words entered by the user and the candidate query results. The search system 100 can record the user's actual operation behavior in the log data 140 and store the query results 150 not observed by the user. The search system 100 can then perform offline training on the ranking model 120 through the offline training module 130 according to a preset cycle, the log data 140, and the query results 150 not observed by the user, and periodically update the model parameters of the ranking model 120 to improve the prediction accuracy of the ranking model 120.
  • For another example, suppose the search system 100 is an application (APP) search system in a mobile phone application market. When the user opens the mobile phone application market and uses the search system 100, the user can enter a query word in the search box provided by the interactive interface 110 to trigger a query request. The query request and related information (such as the user's historical download records, click records, application features, current time, and location) are input into the ranking model 120 for online prediction, and the ranking model 120 predicts, according to the query request and related information, the probability that the user will download each given candidate application and arranges the applications in descending order of the predicted probability (candidate applications that are most likely to be downloaded are ranked toward the top, and candidate applications that are unlikely to be downloaded are ranked toward the bottom), so as to obtain the query results and display them to the user, thereby increasing the probability of application downloads. The search system 100 can record the user's actual operation behavior in the log data 140 and store the query results 150 that are not observed by the user. The search system 100 can then perform offline training on the ranking model 120 through the offline training module 130 according to a preset cycle, the log data 140, and the query results 150 not observed by the user, and periodically update the model parameters of the ranking model 120 to improve the prediction accuracy of the ranking model 120.
  • Fig. 2 shows a flowchart of a ranking model training method according to an embodiment of the present application.
  • the ranking model training method includes:
  • Step S210: determine the first training set according to the log data.
  • the log data is the log data recorded by the search system to which the ranking model to be trained belongs, and the log data includes user implicit feedback data.
  • the first training set includes a plurality of reference query words, a first sample corresponding to each reference query word, position information of the first sample, and a first reference correlation between the first sample and the corresponding reference query word .
  • the query words entered by the user recorded in the log data may be used as reference query words.
  • For any reference query word, a plurality of query results corresponding to the reference query word can be determined from the log data. For each query result, its related information (such as the personal information of the user who entered the query word, user historical data, and context information such as query time and location), its position in the query result list, and the query result itself are determined; the query result observed by the user, together with its related information, forms the first sample corresponding to the reference query word, and the position of the query result in the query result list is used as the position information of the first sample.
  • The first reference correlation between the first sample and the reference query word is determined according to the user implicit feedback data corresponding to the query result. For example, assuming that in the user implicit feedback data corresponding to the query result the click indicator is 1, indicating that the user clicked, the first reference correlation between the first sample and the corresponding reference query word can be determined to be 1; assuming that the click indicator is 0, indicating that the user did not click, the first reference correlation between the first sample and the corresponding reference query word can be determined to be 0.
  • Since the first sample includes query results that have been observed by the user, the first sample can be regarded as an observed sample.
  • the first samples in the first training set are all observed samples, that is to say, the first training set is an observed sample set.
  • Step S220: determine the inverse propensity score of each first sample according to the position information of each first sample in the first training set.
  • the inverse propensity score IPS of each position in the query result list can be calculated through existing related technologies.
  • For example, the inverse propensity score IPS of each position in the query result list can be calculated through random traffic: query results are randomly displayed at different positions in the query result list, the user's click-through rate at each position is then calculated, and the inverse propensity score IPS of each position is determined based on these click-through rates.
  • Then, the inverse propensity score IPS of each first sample can be determined according to the position information of the first sample and the inverse propensity score IPS of each position. For example, assuming that the position information of a first sample is position 9, meaning that the query result included in the first sample is at the ninth position in the query result list, and that the inverse propensity score IPS corresponding to position 9 is 0.15, the inverse propensity score IPS of this first sample can be determined to be 0.15.
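  • As an illustration only, the sketch below shows one way the per-position scores from random traffic and the per-sample scores might be computed; the function names and the normalization by the click-through rate of the top position are assumptions, chosen so that a low position receives a small score (for example, 0.15 for position 9) by which that sample's loss is later divided.

```python
from collections import defaultdict

def position_scores_from_random_traffic(random_logs):
    """random_logs: iterable of (position, clicked) pairs collected while query
    results were shown at random positions in the query result list."""
    shows, clicks = defaultdict(int), defaultdict(int)
    for position, clicked in random_logs:
        shows[position] += 1
        clicks[position] += int(clicked)
    ctr = {p: clicks[p] / shows[p] for p in shows}
    top_ctr = ctr.get(1, max(ctr.values()))
    # Normalize so that position 1 has score 1.0 and lower positions get smaller scores.
    return {p: ctr[p] / top_ctr for p in ctr}

def sample_score(position_info, position_scores):
    """Look up the score of a first sample from its recorded position information."""
    return position_scores[position_info]

# For example, if position_scores[9] == 0.15, a first sample observed at position 9
# later contributes its loss divided by 0.15 when the ranking model is trained.
```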
  • In a possible implementation, the first training set and the inverse propensity score of each first sample in the first training set can be used to train an interpolation model. The interpolation model can be used to determine the correlation estimate between a query result not observed by the user and the corresponding reference query word; that is, the interpolation model can be used to predict the correlation between query results not observed by users and the corresponding reference query words.
  • the interpolation model may include an input layer, at least one intermediate layer and an output layer.
  • the unbiased first training set can be used to train the interpolation model, so that the influence of position bias on the interpolation model can be eliminated, thereby improving the estimation accuracy of the interpolation model.
  • the third correlation between the first sample and the corresponding reference query word may be determined through an interpolation model.
  • the third correlation is an estimate.
  • the uncertainty of the first sample can be calculated according to the output characteristics of each layer of the interpolation model (including the input layer, the intermediate layer and the output layer).
  • the uncertainty of the first sample can be used to indicate how easy the first sample is to learn. The larger the value of the uncertainty of the first sample, the higher the learning difficulty of the first sample; the smaller the value of the uncertainty of the first sample, the lower the learning difficulty of the first sample.
  • Specifically, when the interpolation model processes the first sample to obtain the third correlation, the output features of each layer of the interpolation model can be obtained separately, and feature fusion (such as linear mapping) can be performed on the output features of each layer to obtain the fusion features corresponding to each layer; the uncertainty of the first sample is then obtained from the preset weights and the fusion features corresponding to each layer, for example by weighted summation.
  • Fig. 3 shows a schematic diagram of a calculation process of uncertainty of a first sample according to an embodiment of the present application.
  • the interpolation model 300 includes an input layer 301 , an intermediate layer 302 and an output layer 303 .
  • the interpolation model 300 in FIG. 3 is taken as an example, and it only includes one intermediate layer 302. In practical applications, there may be multiple intermediate layers of the interpolation model. The specific number is not limited.
  • As shown in Fig. 3, during the process in which the interpolation model 300 processes the first sample 304 to obtain the third correlation 305, the output features of the input layer 301, the intermediate layer 302, and the output layer 303 can be obtained respectively, and feature fusion can be performed on them to obtain the fusion features corresponding to each layer: feature fusion can be performed on the output features of the input layer 301 to obtain the fusion feature 309 corresponding to the input layer 301; the output features of the input layer 301 and the output features of the intermediate layer 302 can be fused to obtain the fusion feature 308 corresponding to the intermediate layer 302; and feature fusion can be performed on the output features of the output layer 303 to obtain the fusion feature 307 corresponding to the output layer 303.
  • Then, the uncertainty of the first sample 304 can be obtained by weighted summation: according to the preset weights (not shown in the figure), the fusion feature 307, the fusion feature 308, and the fusion feature 309 are weighted and summed to obtain the uncertainty 310 of the first sample 304.
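  • As an illustration only, the following sketch shows how such a per-layer fusion followed by a preset weighted sum might be implemented. The choice of a linear mapping as the fusion step, the reduction of each fusion feature to a scalar via its norm, and fusing each layer's output on its own (rather than together with the preceding layer, as Fig. 3 does for the intermediate layer) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    """Turns the per-layer output features of the interpolation model into a scalar
    uncertainty via per-layer feature fusion and a preset weighted summation."""

    def __init__(self, layer_dims, fused_dim, layer_weights):
        super().__init__()
        # One linear mapping per layer plays the role of the feature-fusion step.
        self.fusers = nn.ModuleList(nn.Linear(dim, fused_dim) for dim in layer_dims)
        self.register_buffer("layer_weights", torch.tensor(layer_weights))

    def forward(self, layer_outputs):
        # layer_outputs: one tensor per layer (input layer, intermediate layer(s), output layer).
        fused = [fuser(features) for fuser, features in zip(self.fusers, layer_outputs)]
        # Reduce each fusion feature to a scalar, then combine with the preset weights.
        scalars = torch.stack([f.norm(dim=-1) for f in fused], dim=-1)
        return (scalars * self.layer_weights).sum(dim=-1)  # one uncertainty per sample
```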
  • Then, the fourth loss can be determined according to the third correlation, the first reference correlation between the first sample and the corresponding reference query word, the inverse propensity score, and the uncertainty.
  • Specifically, the correlation estimation loss of the interpolation model can be determined according to the third correlation, the first reference correlation between the first sample and the corresponding reference query word, and the inverse propensity score, and the sum of the correlation estimation loss and the uncertainty of the first sample can be determined as the fourth loss. That is, the uncertainty of the first sample is added to the fourth loss as a regularization term to force the interpolation model to learn the more difficult first samples better.
  • the interpolation model can be adjusted according to the fourth loss.
  • When the fourth loss meets a preset first training end condition (for example, the fourth loss is less than a preset first loss threshold, or the fourth loss converges within a certain interval), the training ends and the trained interpolation model is obtained.
  • In this way, the uncertainty of the first sample can be calculated during the training of the interpolation model and taken into account when determining the overall loss of the interpolation model (i.e., the fourth loss), so that the interpolation model learns the first samples with high learning difficulty better, thereby improving the estimation accuracy of the interpolation model.
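  • To make the composition of the fourth loss concrete, here is a hedged sketch that combines an IPS-weighted correlation estimation loss with the uncertainty regularization term; the squared-error form of the estimation loss and the equal weighting of the two terms are assumptions, since the text does not specify them.

```python
def interpolation_model_loss(third_corr, first_ref_corr, ips, uncertainty):
    """Fourth loss for a batch of observed (first) samples, given as tensors.

    third_corr:     correlations predicted by the interpolation model
    first_ref_corr: first reference correlations from the first training set
    ips:            per-sample position scores (the loss is divided by them)
    uncertainty:    per-sample uncertainties from the layer-wise fusion
    """
    # IPS-weighted correlation estimation loss (squared error assumed here).
    estimation_loss = ((third_corr - first_ref_corr) ** 2 / ips).mean()
    # The uncertainty acts as a regularization term added to the loss.
    return estimation_loss + uncertainty.mean()
```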
  • Step S230: determine a second training set according to the query results corresponding to the respective reference query words that have not been observed by the user.
  • the second training set may include a second sample corresponding to each reference query word, and a second reference correlation between the second sample and the corresponding reference query word.
  • The unobserved query results stored by the search system to which the ranking model belongs correspond to the log data. Therefore, from the unobserved query results stored in the search system, the query results that correspond to the reference query words in the first training set but have not been observed by the user can be determined.
  • For any query result, corresponding to any reference query word, that has not been observed by the user, a correlation estimate between the query result not observed by the user and the reference query word is determined through the interpolation model. The confidence of this correlation estimate can then be determined according to the output features of each layer (including the input layer, the intermediate layer, and the output layer) of the interpolation model.
  • the confidence level for the correlation estimate can be used to indicate the reliability of the correlation estimate. A higher confidence level for the correlation estimate indicates a higher reliability of the correlation estimate. A lower confidence level for the correlation estimate indicates a less reliable correlation estimate.
  • Specifically, during the process of determining the correlation estimate, the output features of each layer of the interpolation model can be obtained respectively, and feature fusion (for example, linear mapping) can be performed on the output features of each layer to obtain the fusion features corresponding to each layer; the confidence of the correlation estimate is then determined from the preset weights and the fusion features corresponding to each layer, for example by weighted summation. In this way, the accuracy of the confidence of the correlation estimate can be improved.
  • The confidence of the correlation estimate is calculated in a way similar to the uncertainty of the first sample in the interpolation model training process. The difference is that, in the training process of the interpolation model, feature fusion of the output features of each layer is used to calculate the uncertainty of the first sample, which indicates the learning difficulty of the first sample, whereas in the prediction (interpolation) process of the interpolation model, feature fusion of the output features of each layer is used to calculate the confidence of the correlation estimate, which indicates the reliability of the correlation estimate.
  • After the confidence of the correlation estimate is determined, its relationship with a preset confidence threshold may be determined. When the confidence is greater than or equal to the preset confidence threshold, the second sample corresponding to the reference query word may be determined according to the query result not observed by the user, and the correlation estimate may be determined as the second reference correlation between the second sample and the reference query word.
  • the second reference correlation here is an estimated value.
  • Specifically, the query result that has not been observed by the user and its related information are determined as the second sample corresponding to the reference query word, and the correlation estimate is determined as the second reference correlation between the second sample and the reference query word.
  • the second sample and the second reference correlation between the second sample and the reference query word can be added to the second training set.
  • Since the second sample includes query results that have not been observed by the user, the second sample can be regarded as an unobserved sample.
  • the second samples in the second training set are all unobserved samples, that is to say, the second training set is an unobserved sample set.
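  • A minimal sketch of this filtering step is shown below; the helper estimate_with_confidence (wrapping the trained interpolation model) and the structure of the returned samples are assumptions introduced only for illustration.

```python
def build_second_training_set(unobserved, interpolation_model, confidence_threshold):
    """unobserved: iterable of (reference_query, query_result) pairs that were
    never shown to the user. Returns the second training set."""
    second_training_set = []
    for reference_query, query_result in unobserved:
        # The interpolation model returns a correlation estimate together with a
        # confidence derived from its per-layer fusion features.
        estimate, confidence = interpolation_model.estimate_with_confidence(
            reference_query, query_result)
        if confidence >= confidence_threshold:
            # The unobserved result (with its related information) becomes the second
            # sample; the estimate becomes the second reference correlation.
            second_training_set.append((reference_query, query_result, estimate))
    return second_training_set
```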
  • Step S240: train the ranking model according to the first training set, the inverse propensity scores, and the second training set.
  • the ranking model can be used to predict the correlation between query words and query results.
  • When training the ranking model according to the first training set, the inverse propensity scores, and the second training set, for any reference query word, the reference query word and its corresponding first sample are input into the ranking model to obtain the first correlation, and the reference query word and its corresponding second sample are input into the ranking model to obtain the second correlation. Then, according to the first correlation, the second correlation, the first reference correlation corresponding to the first sample, the second reference correlation corresponding to the second sample, and the inverse propensity score of the first sample, the first loss corresponding to the reference query word can be determined.
  • Specifically, the second loss can be determined according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample; the third loss can be determined according to the second correlation and the second reference correlation corresponding to the second sample; and the first loss can then be determined according to the second loss and the third loss. In this way, the first loss corresponding to the reference query word considers not only the loss corresponding to the observed sample (the second loss, for the first sample) but also the loss corresponding to the unobserved sample (the third loss, for the second sample), which improves the accuracy of the first loss.
  • For example, the first loss L_q corresponding to the reference query word q can be determined by formula (1), which combines the second loss and the third loss.
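  • Formula (1) itself is not reproduced in this text. Purely as an illustration consistent with the surrounding description (an IPS-weighted loss over the observed first samples plus a loss over the unobserved second samples), one possible form is:

```latex
L_q \;=\; \underbrace{\sum_{x_i \in \Omega_q} \frac{\ell\big(f(q, x_i),\, r_i\big)}{\mathrm{IPS}(x_i)}}_{\text{second loss: observed first samples}}
\;+\; \underbrace{\sum_{x_j \in \overline{\Omega}_q} \ell\big(f(q, x_j),\, \hat{r}_j\big)}_{\text{third loss: unobserved second samples}}
```

  • Here f denotes the ranking model, \Omega_q and \overline{\Omega}_q the sets of first and second samples for the reference query word q, r_i and \hat{r}_j the first and second reference correlations, IPS(x_i) the per-sample position score, and \ell a pointwise loss; all of this notation is introduced here for illustration only and is not taken from the original formula.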
  • the ranking model may be adjusted according to the first loss.
  • When the first loss meets a preset second training end condition (for example, the first loss is less than a preset second loss threshold, or the first loss converges within a certain interval), the training ends and the trained ranking model is obtained.
  • In this way, when the ranking model is trained according to the first training set, the inverse propensity scores, and the second training set, the first reference correlation in the first training set, the inverse propensity scores, and the second reference correlation in the second training set are used as an unbiased supervision signal, so that the ranking model jointly learns from the observed samples and the unobserved samples, thereby eliminating the influence of selection bias and position bias on the ranking model and obtaining an unbiased ranking model.
  • the ranking model may be alternately trained by using the first sample in the first training set and the second sample in the second training set, so as to improve the prediction accuracy of the ranking model.
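  • The sketch below illustrates one training step of the ranking model consistent with the description above: an IPS-weighted second loss on a batch of observed first samples plus a third loss on a batch of unobserved second samples, summed into the first loss. The squared-error losses, the call signature of the ranking model, and the optimizer usage are assumptions; batches of first and second samples could equally well be presented alternately.

```python
def ranking_model_step(ranking_model, optimizer, observed_batch, unobserved_batch):
    """One update step combining observed (first) and unobserved (second) samples."""
    query_1, first_sample, first_ref_corr, ips = observed_batch
    query_2, second_sample, second_ref_corr = unobserved_batch

    optimizer.zero_grad()
    # Second loss: IPS-weighted loss on the observed samples.
    first_corr = ranking_model(query_1, first_sample)
    second_loss = ((first_corr - first_ref_corr) ** 2 / ips).mean()
    # Third loss: loss on the unobserved samples against the imputed correlations.
    second_corr = ranking_model(query_2, second_sample)
    third_loss = ((second_corr - second_ref_corr) ** 2).mean()
    # First loss: the combination of the two, used to adjust the ranking model.
    first_loss = second_loss + third_loss
    first_loss.backward()
    optimizer.step()
    return first_loss.item()
```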
  • the ranking model training method of the embodiment of the present application will be exemplarily described below in conjunction with a specific example of a mobile phone application market and FIG. 4 and FIG. 5 .
  • Generally, the ranking model in the search system of the mobile application market can predict the user's click probability for each application in the application candidate set according to the user's personal information, the application candidate set, and context information, and sort the applications in the candidate set in descending order of click probability, so that the applications most likely to be downloaded by the user are ranked at the top.
  • The ranking model in the search system of the mobile application market can also predict the correlation between the query words entered by the user and the applications in the application candidate set based on the query words input by the user, the user's personal information, the application candidate set, and the context information, and sort the applications in the candidate set in descending order of correlation, so that the application most correlated with the query word input by the user is ranked at the top.
  • Fig. 4 shows a schematic diagram of a search page of a mobile phone application market according to an embodiment of the present application.
  • The search page 400 of the mobile application market can be regarded as an interactive interface between the search system of the mobile application market and the user; it includes a search box 410 at the top (the current query word is "basketball") and a query result display sub-page 420.
  • the query result display sub-page 420 shows the ranking of relevant application APPs displayed by the search system of the mobile application market for the user according to the query word "basketball", which are application 421, application 422, application 423, application 424 and application 425.
  • On this page, the user can perform operations such as browsing, clicking, and downloading according to personal interest, and these operations are recorded in the form of logs by the search system of the mobile application market. The search system also stores the query results that are not displayed in the query result display sub-page 420 (that is, the query results that have not been observed by the user), so that the ranking model in the search system of the mobile application market can be trained according to the log data and the query results that have not been observed by the user.
  • The ranking model training method of the embodiments of the present application can be used to train the ranking model in the search system of the mobile application market, so as to eliminate the influence of selection bias and position bias on the ranking model at the same time, thereby improving the prediction accuracy of the ranking model in the search system of the mobile application market and improving user experience.
  • Fig. 5 shows a schematic diagram of a training process of a ranking model training method according to an embodiment of the present application. As shown in Figure 5, when training the ranking model in the search system of the mobile application market, the training process is as follows:
  • In step S510, the first training set is determined according to the log data, where the log data is the log data recorded by the search system of the mobile phone application market, and the first training set includes a plurality of reference query words, the first sample corresponding to each reference query word, the position information of the first sample, and the first reference correlation between the first sample and the corresponding reference query word;
  • in step S520, the inverse propensity score of each first sample is determined according to the position information of each first sample in the first training set;
  • in step S530, the interpolation model is trained according to the first training set and the inverse propensity score of each first sample, to obtain the trained interpolation model;
  • in step S540, for any query result, corresponding to any reference query word, that has not been observed by the user, the correlation estimate between that query result and the reference query word is determined through the trained interpolation model;
  • in step S550, the confidence of the correlation estimate is determined according to the output features of each layer of the interpolation model;
  • in step S560, it is judged whether the confidence of the correlation estimate is greater than or equal to the preset confidence threshold;
  • if so, step S570 is executed: the second sample corresponding to the reference query word is determined according to the query result not observed by the user, and the correlation estimate is determined as the second reference correlation between the second sample and the reference query word;
  • in step S580, the second sample and the second reference correlation are added to the second training set;
  • in step S590, the ranking model is trained according to the first training set, the inverse propensity score of each first sample, and the second training set.
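  • An end-to-end outline of steps S510 to S590, using hypothetical helper functions named after the steps above (including the sketches shown earlier), might look as follows; it is a schematic of the flow rather than an implementation from the original disclosure.

```python
def train_ranking_pipeline(log_data, unobserved_results, position_scores, confidence_threshold):
    """position_scores: per-position scores, e.g. from random traffic as sketched earlier."""
    # S510: build the first training set (observed samples) from the log data.
    first_training_set = build_first_training_set(log_data)
    # S520: look up the inverse propensity score of each first sample from its position.
    ips = {sample.id: position_scores[sample.position] for sample in first_training_set}
    # S530: train the interpolation model on the IPS-weighted first training set.
    interpolation_model = train_interpolation_model(first_training_set, ips)
    # S540 to S580: impute correlations for unobserved results, keeping only confident ones.
    second_training_set = build_second_training_set(
        unobserved_results, interpolation_model, confidence_threshold)
    # S590: train the ranking model on both training sets with the IPS weights.
    return train_ranking_model(first_training_set, ips, second_training_set)
```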
  • In this way, training the ranking model in the search system of the mobile phone application market can eliminate the influence of selection bias and position bias on the ranking model at the same time and yield an unbiased ranking model, thereby improving the prediction accuracy of the ranking model in the search system of the mobile phone application market and improving the user experience of the mobile application market.
  • Experimental verification shows that, compared with the existing search system, the search system using the ranking model trained by the training method of the embodiment of the present application improves the mean average precision (MAP) index by 3.5%; on the precision indices, the improvement is 3.2% on Precision@5 and 0.1% on Precision@10; on the normalized discounted cumulative gain (NDCG) indices, the improvement is 4.3% on NDCG@5 and 2.8% on NDCG@10; and on the mean reciprocal rank (MRR) indices, the improvement is 3.9% on MRR@5 and 2.8% on MRR@10.
  • Therefore, compared with the existing search system, the search performance of the search system using the ranking model trained by the training method of the embodiment of the present application is significantly improved on the above four typical indicators.
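  • For reference, the sketch below shows how Precision@k, NDCG@k and MRR@k are conventionally computed over a single ranked list of binary relevance labels; it follows the standard definitions of these metrics and is not code from the embodiment (MAP is the mean of average precision over queries).

    import math

    def precision_at_k(labels, k):
        # labels: relevance labels (1 = relevant, 0 = not) in ranked order.
        return sum(labels[:k]) / k

    def ndcg_at_k(labels, k):
        dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(labels[:k]))
        ideal = sorted(labels, reverse=True)
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
        return dcg / idcg if idcg > 0 else 0.0

    def mrr_at_k(labels, k):
        for i, rel in enumerate(labels[:k]):
            if rel:
                return 1.0 / (i + 1)
        return 0.0

    # Example: a ranked list where the 2nd and 4th results are relevant.
    labels = [0, 1, 0, 1, 0]
    print(precision_at_k(labels, 5), ndcg_at_k(labels, 5), mrr_at_k(labels, 5))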
  • Fig. 6 shows a block diagram of a ranking model training device according to an embodiment of the present application.
  • As shown in Fig. 6, the ranking model training device includes:
  • the first training set determining module 610, configured to determine a first training set according to the log data, where the first training set includes a plurality of reference query words, a first sample corresponding to each reference query word, position information of the first sample, and a first reference correlation between the first sample and the corresponding reference query word, and the first sample includes query results that have been observed by the user;
  • the inverse propensity score determining module 620, configured to determine the inverse propensity score of each first sample according to the position information of each first sample in the first training set (one common way to obtain per-position propensities is sketched after this list);
  • the second training set determining module 630, configured to determine a second training set according to the query results, corresponding to the respective reference query words, that have not been observed by the user, where the second training set includes a second sample corresponding to each reference query word and a second reference correlation between the second sample and the corresponding reference query word;
  • the first training module 640, configured to train a ranking model according to the first training set, the inverse propensity scores and the second training set, where the ranking model is used to predict the correlation between query words and query results.
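  • The inverse propensity score determining module 620 only requires a propensity value per display position. A common way to obtain such values, consistent with the intervention experiments (for example, random traffic) mentioned in the description, is to estimate per-position click-through rates under randomly shuffled result lists; the estimator below is a hedged illustration and not the embodiment's exact procedure.

    from collections import defaultdict

    def position_propensities(random_traffic_logs):
        # random_traffic_logs: iterable of (position, clicked) pairs collected while
        # query results were displayed in random order.
        shown = defaultdict(int)
        clicked = defaultdict(int)
        for position, click in random_traffic_logs:
            shown[position] += 1
            clicked[position] += int(click)
        ctr = {pos: clicked[pos] / shown[pos] for pos in shown}
        top = max(ctr.values())
        # Normalizing by the top position gives a relative examination propensity;
        # a sample's loss can then be weighted by the reciprocal of its position's value.
        return {pos: rate / top for pos, rate in ctr.items()}

    logs = [(1, 1), (1, 0), (2, 1), (2, 0), (2, 0), (3, 1), (3, 0), (3, 0), (3, 0)]
    print(position_propensities(logs))   # e.g. {1: 1.0, 2: 0.67, 3: 0.5}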
  • The second training set determining module 630 includes: a correlation estimate determining submodule, configured to, for a query result that has not been observed by the user and that corresponds to any reference query word, determine, through the imputation model, a correlation estimate between the unobserved query result and the reference query word; a confidence determining submodule, configured to determine the confidence of the correlation estimate according to the output features of each layer of the imputation model; a sample determining submodule, configured to, when the confidence is greater than or equal to a preset confidence threshold, determine the second sample corresponding to the reference query word according to the unobserved query result, and determine the correlation estimate as the second reference correlation between the second sample and the reference query word; and a sample adding submodule, configured to add the second sample and the second reference correlation to the second training set.
  • The confidence determining submodule is configured to: perform feature fusion on the output features of each layer of the imputation model respectively, to obtain fusion features corresponding to each layer; and determine the confidence of the correlation estimate according to preset weights and the fusion features, as sketched below.
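  • A minimal numerical sketch of this fusion follows, assuming the per-layer fusion is a simple linear mapping to a scalar and the preset weights combine one fused value per layer; the projection vectors, weights and final squashing are illustrative assumptions. The same kind of weighted fusion over layer outputs can also produce the sample uncertainty used when training the imputation model.

    import numpy as np

    def fuse(features, projection):
        # Feature fusion sketched as a linear mapping of one layer's output to a scalar.
        return float(np.tanh(features @ projection))

    def confidence_from_layers(layer_outputs, projections, preset_weights):
        # One fused value per layer, combined with the preset weights.
        fused = [fuse(h, w) for h, w in zip(layer_outputs, projections)]
        score = sum(a * f for a, f in zip(preset_weights, fused))
        return 1.0 / (1.0 + np.exp(-score))   # squash to (0, 1)

    rng = np.random.default_rng(0)
    layer_outputs = [rng.normal(size=16), rng.normal(size=8), rng.normal(size=4)]   # input, middle, output layers
    projections   = [rng.normal(size=16), rng.normal(size=8), rng.normal(size=4)]
    print(confidence_from_layers(layer_outputs, projections, preset_weights=[0.2, 0.3, 0.5]))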
  • The first training module 640 includes: a first correlation determining submodule, configured to, for any reference query word, input the reference query word and the first sample corresponding to the reference query word into the ranking model to obtain a first correlation; a second correlation determining submodule, configured to input the reference query word and the second sample corresponding to the reference query word into the ranking model to obtain a second correlation; a first loss determining submodule, configured to determine the first loss corresponding to the reference query word according to the first correlation, the second correlation, the first reference correlation corresponding to the first sample, the second reference correlation corresponding to the second sample, and the inverse propensity score of the first sample; and a first adjusting submodule, configured to adjust the ranking model according to the first loss.
  • The first loss determining submodule is configured to: determine a second loss according to the first correlation, the first reference correlation corresponding to the first sample, and the inverse propensity score of the first sample; determine a third loss according to the second correlation and the second reference correlation corresponding to the second sample; and determine the first loss corresponding to the reference query word according to the second loss and the third loss, as in the sketch below.
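  • Under the assumption that the per-sample ranking loss is a binary cross-entropy between the model's predicted correlation and the reference correlation, the per-query first loss can be sketched as follows. The weighting of observed samples by their reference correlation over their propensity, and of unobserved samples by their imputed correlation, mirrors the description; the concrete loss function and the example numbers are illustrative choices.

    import math

    def bce(pred, target):
        # Binary cross-entropy used here as the per-sample ranking loss (an assumption).
        eps = 1e-7
        pred = min(max(pred, eps), 1.0 - eps)
        return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

    def first_loss(observed, unobserved):
        # observed:   (first correlation, first reference correlation c, propensity P) per first sample
        # unobserved: (second correlation, second reference correlation R) per second sample
        second_loss = sum(c / p * bce(pred, c) for pred, c, p in observed)   # IPS-weighted observed loss
        third_loss = sum(r * bce(pred, r) for pred, r in unobserved)         # imputed-label loss
        return second_loss + third_loss                                      # first loss

    observed = [(0.9, 1.0, 0.8), (0.3, 0.0, 0.5)]
    unobserved = [(0.6, 0.7), (0.2, 0.1)]
    print(first_loss(observed, unobserved))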
  • The device further includes: a second training module, configured to train the imputation model according to the first training set and the inverse propensity scores.
  • The second training module includes: a second correlation determining submodule, configured to, for a first sample corresponding to any reference query word in the first training set, determine, through the imputation model, a third correlation between the first sample and the reference query word; an uncertainty calculating submodule, configured to calculate the uncertainty of the first sample according to the output features of each layer of the imputation model, where the uncertainty is used to indicate how difficult the first sample is to learn; a second loss determining submodule, configured to determine a fourth loss according to the third correlation, the first reference correlation between the first sample and the reference query word, the inverse propensity score of the first sample, and the uncertainty; and a second adjusting submodule, configured to adjust the imputation model according to the fourth loss. A sketch of this objective is given below.
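  • A corresponding sketch of the imputation model's fourth loss is below: an IPS-weighted correlation-estimation loss over the observed first samples plus the per-sample uncertainty added as a regularization term. The use of a squared error and of a plain sum over samples are assumptions for illustration.

    def fourth_loss(first_samples):
        # first_samples: (third correlation r3, first reference correlation c,
        #                 propensity P, uncertainty u) per observed first sample.
        total = 0.0
        for r3, c, p, u in first_samples:
            estimation_loss = (r3 - c) ** 2 / p   # IPS-weighted correlation-estimation loss
            total += estimation_loss + u          # uncertainty as a regularization term
        return total

    samples = [(0.8, 1.0, 0.7, 0.05), (0.4, 0.0, 0.3, 0.20)]
    print(fourth_loss(samples))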
  • An embodiment of the present application provides a ranking model training device, including: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions.
  • An embodiment of the present application provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • An embodiment of the present application provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punched card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • Computer readable program instructions or codes described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, local area network, wide area network, and/or wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGA) or programmable logic arrays (PLA), are personalized by utilizing the state information of the computer-readable program instructions; these electronic circuits can execute the computer-readable program instructions, thereby realizing various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that when the instructions are executed by the processor of the computer or other programmable data processing apparatus, an apparatus is produced that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function.
  • In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two consecutive blocks may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware that performs the corresponding function or action (such as a circuit or an ASIC (Application Specific Integrated Circuit)), or can be implemented by a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a ranking model training method, device and storage medium. The method includes: determining a first training set according to log data; determining the inverse propensity score of each first sample according to the position information of each first sample in the first training set; determining a second training set according to the query results, corresponding to the respective reference query words, that have not been observed by the user; and training a ranking model according to the first training set, the inverse propensity scores and the second training set, where the ranking model is used to predict the correlation between query words and query results. Embodiments of the present application can simultaneously eliminate the influence of selection bias and position bias on the ranking model, thereby obtaining an unbiased ranking model.

Description

排序模型训练方法、装置及存储介质 技术领域
本申请涉及互联网技术领域,尤其涉及一种排序模型训练方法、装置及存储介质。
背景技术
搜索(information retrieval)系统为互联网用户提供快速、高效的信息获取服务,被广泛应用于电子商务、信息流、智慧交通等领域。在搜索场景下,搜索系统通常会记录用户隐式反馈数据,并将用户隐式反馈数据作为训练数据,来训练搜索系统中最为核心的排序模型。
由于搜索场景中存在选择偏置(selection-bias)及位置偏置(position-bias),用于训练排序模型的用户隐式反馈数据,并不能反映用户的真实搜索意图。如果直接将用户隐式反馈数据作为正负样本对排序模型进行训练,得到的排序模型会存在偏差,而且会随着排序模型的不断更新,形成马太效应,导致排序模型越来越偏。然而,现有的纠偏技术并不能同时消除选择偏置及位置偏置对排序模型的影响。
发明内容
有鉴于此,提出了一种排序模型训练方法、装置及存储介质。
第一方面,本申请的实施例提供了一种排序模型训练方法,所述方法包括:根据日志数据,确定第一训练集,所述第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、所述第一样本的位置信息、及所述第一样本与对应的参考查询词之间的第一参考相关性,所述第一样本包括已被用户观测的查询结果;根据所述第一训练集中各个第一样本的位置信息,确定所述各个第一样本的逆倾向性得分;根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,所述第二训练集包括与所述各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,所述排序模型用于预测查询词与查询结果之间的相关性。
在本申请实施例中,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
根据第一方面,在所述排序模型训练方法的第一种可能的实现方式中,所述根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,包括:对于与任一参考查询词对应的未被用户观测的查询结果,通过插补模型,确定所述未被 用户观测的查询结果与所述参考查询词之间的相关性估计值;根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度;在所述置信度大于或等于预设的置信度阈值的情况下,根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将所述相关性估计值确定为所述第二样本与所述参考查询词之间的第二参考相关性;将所述第二样本及所述第二参考相关性,加入第二训练集。
本申请的实施例,能够通过插补模型,确定未被用户观测的查询结果与对应的参考查询词之间的相关性估计值,并根据插补模型的各个层的输出特征,确定相关性估计值的置信度,然后根据相关性估计值的置信度,从未被用户观测的查询结果中,选取出相关性估计值的置信度高的未被用户观测的查询结果,进而确定第二样本及第二参考相关性,并将第二样本及第二参考相关性,加入第二训练集。通过这种方式,建立第二训练集,能够提高第二训练集的可靠性。
根据第一方面的第一种可能的实现方式,在所述排序模型训练方法的第二种可能的实现方式中,所述根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度,包括:分别对所述插补模型的各个层的输出特征进行特征融合,得到与所述各个层对应的融合特征;根据预设权重及所述融合特征,确定所述相关性估计值的置信度。
在本申请的实施例中,能够在确定相关性估计值的过程中,分别对插补模型的各个层的输出特征进行特征融合,得到与各个层对应的融合特征,并根据预设权重及与各个层对应的融合特征,确定该相关性估计值的置信度,从而能够提高相关性估计值的置信度的准确性。
根据第一方面、第一方面的第一种可能的实现方式或者第一方面的第二种可能的实现方式,在所述排序模型训练方法的第三种可能的实现方式中,所述根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,包括:对于任一参考查询词,将所述参考查询词及与所述参考查询词对应的第一样本,输入排序模型,得到第一相关性;将所述参考查询词及与所述参考查询词对应的第二样本,输入所述排序模型,得到第二相关性;根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失;根据所述第一损失,调整所述排序模型。
在本申请的实施例中,在根据第一训练集、逆倾向性得分及第二训练集,训练排序模型时,能够将第一训练集中的第一参考相关性、逆倾向性得分及第二训练集中的第二参考相关性作为无偏的监督信号,对排序模型进行训练,使得排序模型能够对已观测样本及未观测样本进行联合学习,从而消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型。
根据第一方面的第三种可能的实现方式,在所述排序模型训练方法的第四种可能的实现方式中,所述根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失,包括:根据所述第一相关性、与所述第一样本对应的第一参考相关性及所述第一样本的逆倾向性得分,确定第二损失;根 据所述第二相关性及与所述第二样本对应的第二参考相关性,确定第三损失;根据所述第二损失及所述第三损失,确定与所述参考查询词对应的第一损失。
在本申请的实施例中,能够根据第一相关性、与第一样本对应的第一参考相关性及第一样本的逆倾向性得分,确定第二损失,并根据第二相关性及与第二样本对应的第二参考相关性,确定第三损失,进而根据第二损失及第三损失,确定第一损失,使得在确定与参考查询词对应的第一损失时,既考虑与已观测样本(第一样本)对应的损失(第二损失),也考虑与未观测样本(第二样本)对应的损失(第三损失),从而能够提高第一损失的准确性。
根据第一方面的第一种可能的实现方式或者第一方面的第二种可能的实现方式,在所述排序模型训练方法的第五种可能的实现方式中,所述方法还包括:根据所述第一训练集及所述逆倾向性得分,训练所述插补模型。
在本申请的实施例中,根据第一训练集及第一训练集中的各个第一样本的逆倾向性得分,来训练插补模型。通过这种方式,能够使用无偏的第一训练集来训练插补模型,从而能够消除位置偏置对插补模型的影响,进而提高插补模型的估计准确性。
根据第一方面的第五种可能的实现方式,在所述排序模型训练方法的第六种可能的实现方式中,所述根据所述第一训练集及所述逆倾向性得分,训练所述插补模型,包括:对于所述第一训练集中与任一参考查询词对应的第一样本,通过所述插补模型,确定所述第一样本与所述参考查询词之间的第三相关性;根据所述插补模型的各个层的输出特征,计算所述第一样本的不确定性,所述不确定性用于指示所述第一样本的学习难易程度;根据所述第三相关性、所述第一样本与所述参考查询词之间的第一参考相关性、所述第一样本的逆倾向性得分及所述不确定性,确定第四损失;根据所述第四损失,调整所述插补模型。
在本申请的实施例中,训练插补模型时,对于第一训练集中与任一参考查询词对应的第一样本,通过插补模型,确定第一样本与参考查询词之间的第三相关性,并根据插补模型的各个层的输出特征,计算第一样本的不确定性;然后根据第三相关性、第一样本与参考查询词之间的第一参考相关性、第一样本的逆倾向性得分及不确定性,确定第四损失,并根据第四损失,调整插补模型。通过这种方式,能够在插补模型的训练过程中,计算第一样本的不确定性,并在确定插补模型的总体损失(即第四损失)时,考虑第一样本的不确定性,使得插补模型能够对学习难度较高的第一样本进行更好地学习,从而能够提高插补模型的估计准确性。
第二方面,本申请的实施例提供了一种排序模型训练装置,所述装置包括:第一训练集确定模块,用于根据日志数据,确定第一训练集,所述第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、所述第一样本的位置信息、及所述第一样本与对应的参考查询词之间的第一参考相关性,所述第一样本包括已被用户观测的查询结果;逆倾向性得分确定模块,根据所述第一训练集中各个第一样本的位置信息,确定所述各个第一样本的逆倾向性得分;第二训练集确定模块,用于根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,所述第二训练集包括与所述各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;第一训练模块,用于根据所述第一训练集、所述逆倾向性得分及所 述第二训练集,训练排序模型,所述排序模型用于预测查询词与查询结果之间的相关性。
在本申请实施例中,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
根据第二方面,在所述排序模型训练装置的第一种可能的实现方式中,所述第二训练集确定模块,包括:相关性估计值确定子模块,对于与任一参考查询词对应的未被用户观测的查询结果,通过插补模型,确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值;置信度确定子模块,用于根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度;样本确定子模块,用于在所述置信度大于或等于预设的置信度阈值的情况下,根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将所述相关性估计值确定为所述第二样本与所述参考查询词之间的第二参考相关性;样本加入子模块,用于将所述第二样本及所述第二参考相关性,加入第二训练集。
本申请的实施例,能够通过插补模型,确定未被用户观测的查询结果与对应的参考查询词之间的相关性估计值,并根据插补模型的各个层的输出特征,确定相关性估计值的置信度,然后根据相关性估计值的置信度,从未被用户观测的查询结果中,选取出相关性估计值的置信度高的未被用户观测的查询结果,进而确定第二样本及第二参考相关性,并将第二样本及第二参考相关性,加入第二训练集。通过这种方式,建立第二训练集,能够提高第二训练集的可靠性。
根据第二方面的第一种可能的实现方式,在所述排序模型训练装置的第二种可能的实现方式中,所述置信度确定子模块,用于:分别对所述插补模型的各个层的输出特征进行特征融合,得到与所述各个层对应的融合特征;根据预设权重及所述融合特征,确定所述相关性估计值的置信度。
在本申请的实施例中,能够在确定相关性估计值的过程中,分别对插补模型的各个层的输出特征进行特征融合,得到与各个层对应的融合特征,并根据预设权重及与各个层对应的融合特征,确定该相关性估计值的置信度,从而能够提高相关性估计值的置信度的准确性。
根据第二方面、第二方面的第一种可能的实现方式或者第二方面的第二种可能的实现方式,在所述排序模型训练装置的第三种可能的实现方式中,所述第一训练模块,包括:第一相关性确定子模块,对于任一参考查询词,将所述参考查询词及与所述参考查询词对应的第一样本,输入排序模型,得到第一相关性;第二相关性确定子模块,用于将所述参考查询词及与所述参考查询词对应的第二样本,输入所述排序模型,得到第二相关性;第一损失确定子模块,用于根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失;第一调整子模 块,用于根据所述第一损失,调整所述排序模型。
在本申请的实施例中,在根据第一训练集、逆倾向性得分及第二训练集,训练排序模型时,能够将第一训练集中的第一参考相关性、逆倾向性得分及第二训练集中的第二参考相关性作为无偏的监督信号,对排序模型进行训练,使得排序模型能够对已观测样本及未观测样本进行联合学习,从而消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型。
根据第二方面的第三种可能的实现方式,在所述排序模型训练装置的第四种可能的实现方式中,所述第一损失确定子模块,用于:根据所述第一相关性、与所述第一样本对应的第一参考相关性及所述第一样本的逆倾向性得分,确定第二损失;根据所述第二相关性及与所述第二样本对应的第二参考相关性,确定第三损失;根据所述第二损失及所述第三损失,确定与所述参考查询词对应的第一损失。
在本申请的实施例中,能够根据第一相关性、与第一样本对应的第一参考相关性及第一样本的逆倾向性得分,确定第二损失,并根据第二相关性及与第二样本对应的第二参考相关性,确定第三损失,进而根据第二损失及第三损失,确定第一损失,使得在确定与参考查询词对应的第一损失时,既考虑与已观测样本(第一样本)对应的损失(第二损失),也考虑与未观测样本(第二样本)对应的损失(第三损失),从而能够提高第一损失的准确性。
根据第二方面的第一种可能的实现方式或者第二方面的第二种可能的实现方式,在所述排序模型训练装置的第五种可能的实现方式中,所述装置还包括:第二训练模块,用于根据所述第一训练集及所述逆倾向性得分,训练所述插补模型。
在本申请的实施例中,根据第一训练集及第一训练集中的各个第一样本的逆倾向性得分,来训练插补模型。通过这种方式,能够使用无偏的第一训练集来训练插补模型,从而能够消除位置偏置对插补模型的影响,进而提高插补模型的估计准确性。
根据第二方面的第五种可能的实现方式,在所述排序模型训练装置的第六种可能的实现方式中,所述第二训练模块,包括:第二相关性确定子模块,对于所述第一训练集中与任一参考查询词对应的第一样本,通过所述插补模型,确定所述第一样本与所述参考查询词之间的第三相关性;不确定性计算子模块,用于根据所述插补模型的各个层的输出特征,计算所述第一样本的不确定性,所述不确定性用于指示所述第一样本的学习难易程度;第二损失确定子模块,用于根据所述第三相关性、所述第一样本与所述参考查询词之间的第一参考相关性、所述第一样本的逆倾向性得分及所述不确定性,确定第四损失;第二调整子模块,用于根据所述第四损失,调整所述插补模型。
在本申请的实施例中,训练插补模型时,对于第一训练集中与任一参考查询词对应的第一样本,通过插补模型,确定第一样本与参考查询词之间的第三相关性,并根据插补模型的各个层的输出特征,计算第一样本的不确定性;然后根据第三相关性、第一样本与参考查询词之间的第一参考相关性、第一样本的逆倾向性得分及不确定性,确定第四损失,并根据第四损失,调整插补模型。通过这种方式,能够在插补模型的训练过程中,计算第一样本的不确定性,并在确定插补模型的总体损失(即第四损失)时,考虑第一样本的不确定性,使得插补模型能够对学习难度较高的第一样本进行更 好地学习,从而能够提高插补模型的估计准确性。
第三方面,本申请的实施例提供了一种排序模型训练装置,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的排序模型训练方法。
在本申请实施例中,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
第四方面,本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的排序模型训练方法。
在本申请实施例中,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
第五方面,本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行上述第一方面或者第一方面的多种可能的实现方式中的一种或几种的排序模型训练方法。
在本申请实施例中,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
本申请的这些和其他方面在以下(多个)实施例的描述中会更加简明易懂。
附图说明
包含在说明书中并且构成说明书的一部分的附图与说明书一起示出了本申请的示例性实施例、特征和方面,并且用于解释本申请的原理。
图1示出根据本申请一实施例的搜索系统的示意图。
图2示出根据本申请一实施例的排序模型训练方法的流程图。
图3示出根据本申请一实施例的第一样本的不确定性的计算过程的示意图。
图4示出根据本申请一实施例的手机应用市场的搜索页面的示意图。
图5示出根据本申请一实施例的排序模型训练方法的训练过程的示意图。
图6示出根据本申请一实施例的排序模型训练装置的框图。
具体实施方式
以下将参考附图详细说明本申请的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。
另外,为了更好的说明本申请,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本申请同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本申请的主旨。
在搜索场景中,搜索系统通常会记录用户隐式反馈数据。用户隐式反馈数据(也称为用户历史行为数据)是指搜索系统记录的、用户进入搜索系统后在搜索系统内进行的查询、浏览、点击、下载等行为的相关数据。用户隐式反馈数据可以看作是用户与搜索系统之间的交互信息。用户隐式反馈数据通常以日志形式进行记录。在相关技术中,通常将用户隐式反馈数据作为训练数据,来训练搜索系统中最为核心的排序模型。
虽然用户隐式反馈数据具有数据量大、时效性强等特点,但由于搜索场景中存在选择偏置及位置偏置,用户隐式反馈数据并不能反映用户的真实搜索意图。
选择偏置是指:用户进入搜索系统后,在搜索框中输入查询词进行搜索,搜索系统将在交互界面(例如屏幕、交互窗口等)显示查询结果,由于交互界面存在大小限制,如果用户不主动进行翻页等操作,将会导致选择偏置,即与用户输入的查询词相关的查询结果未被用户观测,不能作为正样本参与排序模型的训练。其中,未被用户观测是指与用户输入的查询词相关的查询结果未在交互界面显示,得不到曝光,从而未被用户看到。相应地,已被用户观测是指与用户输入的查询词相关的查询结果在交互界面显示(即已曝光),从而已被用户看到。
位置偏置是指:在查询结果列表中,查询结果的位置不同,用户的注意力也会不同,从而导致位置偏置,即用户倾向于与查询结果列表中位置较好的查询结果进行交互,且用户的倾向性与查询结果是否能反应用户的真实搜索意图无关。
因此,如果直接将用户隐式反馈数据作为正负样本对排序模型进行训练,得到的排序模型会存在偏差,而且会随着排序模型的不断更新,形成马太效应,导致排序模型越来越偏。
然而,现有的纠偏技术并不能同时消除选择偏置及位置偏置对排序模型的影响。例如,在一些技术方案中,通过干预实验(例如随机流量等),得到查询结果列表中各个位置的逆倾向性得分(inverse propens i ty score,IPS),然后在排序模型训练过程中,在计算损失(loss)时,对有偏数据集(例如用户隐式反馈数据)中的任一样本,通过除以逆倾向性得分IPS的方式进行加权,以消除位置偏置对排序模型的影响。
但是,由于逆倾向性得分IPS仅在已被用户观测的样本(即已曝光样本)所在的位置上有估计值,未被用户观测的样本(即未曝光样本)没有相应的逆倾向性得分IPS,因此,该方式适用于训练数据仅包括已被用户观测的样本的情形,而且该方式只能消除位置偏置对排序模型的影响,并不能消除选择偏置对排序模型的影响。
在另一些技术方案中,通过已训练的插补模型(imputat ion model),例如已训练的岭山回归模型、随机森林模型等,得到未被用户观测的样本(即未曝光样本)与查询词之间的相关性估计值;然后基于已被用户观测的样本(即已曝光样本)及未被用户观测的样本(即未曝光样本)来训练排序模型。
然而,该方式只能消除选择偏置对排序模型的影响,并不能消除位置偏置对排序模型的影响。此外,未被用户观测的样本中通常存在难以学习的困难样本,通过插补模型得到的困难样本与查询词之间的相关性估计值会不可靠,而以困难样本的不可靠的相关性估计值作为监督信号来训练排序模型,会降低排序模型的可靠性。
为了解决上述技术问题,本申请提供了一种排序模型训练方法,该方法能够根据日志数据,确定第一训练集,第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、第一样本的位置信息、及第一样本与对应的参考查询词之间的第一参考相关性,第一样本包括已被用户观测的查询结果;根据第一训练集中各个第一样本的位置信息,确定各个第一样本的逆倾向性得分;根据与各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,第二训练集包括与各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;然后根据第一训练集、逆倾向性得分及第二训练集,训练排序模型,排序模型用于预测查询词与查询结果之间的相关性。
本申请实施例的排序模型训练方法,能够确定与已被用户观测的查询结果对应的第一训练集、第一训练集中各个样本的逆倾向性得分、及与未被用户观测的查询结果对应的第二训练集,并通过第一训练集、第二训练集及第一训练集中各个样本的逆倾向性得分,来对排序模型进行训练,从而能够同时消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型,进而提高排序模型对查询词与查询结果之间的相关性的预测准确度,提升用户体验。
本申请实施例的排序模型训练方法可用于对各种搜索系统中的排序模型进行训练。搜索系统可例如电子商务中的商品搜索系统、应用市场中的应用(appl icat ion,APP)搜索系统、内容(例如视频、新闻等)搜索场景中的搜索系统等,本申请对搜索系统的具体应用场景不作限制。通过本申请实施例的排序模型训练方法,能够同时消除选择偏置及位置偏置对搜索系统中的排序模型的影响,使得搜索系统反馈的查询结果能够满足用户的真实搜索意图。
在一些实施例中,其他受选择偏置及位置偏置影响的预测模型,例如广告点击率预测模型,也可以通过与本申请实施例的排序模型训练方法类似的训练方法,消除选择偏置及位置偏置的影响。需要说明的是,本申请对排序模型训练方法的具体应用场景不作限制。
图1示出根据本申请一实施例的搜索系统的示意图。如图1所示,搜索系统100包括交互界面110、排序模型120、离线训练模块130、日志数据140及未被用户观测 的查询结果150。
其中,交互界面110为搜索系统100与用户之间的交互界面,例如屏幕、交互窗口等。交互界面110可用于提供搜索框,以便用户输入查询词进行查询。交互界面110还可用于显示搜索系统100的查询结果,以便用户查看。
排序模型120可用于预测用户输入的查询词与查询结果之间的相关性,实现在线预测(online inference)。排序模型120还可基于预测的相关性,对查询结果进行排序,并通过交互界面110显示排序后的查询结果。
在一些示例中,排序模型120可根据用户的个人信息、历史数据以及上下文环境(例如当前时间、节日、节气等)等信息,预测用户输入的查询词与查询结果之间的相关性,以便给出个性化的搜索结果。
离线训练(offline training)模块130用于根据搜索系统100存储的日志数据140及未被用户观测的查询结果150,通过本申请实施例所述的排序模型训练方法,对排序模型120进行离线训练,以消除选择偏置及位置偏置对排序模型120的影响,从而提高排序模型120的预测准确性。
举例来说,假设搜索系统100为电子商务中的商品搜索系统,用户使用搜索系统100时,可通过交互界面110进入搜索系统100,并在交互界面110的搜索框中输入查询词以触发查询请求;查询请求及其相关信息(例如用户的个人信息、历史数据以及上下文环境等)会被输入排序模型120进行在线预测;排序模型120根据查询请求及其相关信息,预测用户输入的查询词与搜索系统100中的商品之间的相关性,并根据预测的相关性,对商品进行降序排列(即将相关性大的商品排在靠前的位置,将相关性不大的商品排在靠后的位置);然后通过交互界面110将不同商品按顺序展示在不同位置,从而得到查询结果并将其展示给用户。
得到查询结果后,用户可在交互界面110上对查询结果进行浏览、点击、下载等操作。搜索系统100可通过日志数据140记录用户的实际操作行为,并存储未被用户观测的查询结果150。
搜索系统100可根据预设周期、日志数据140及未被用户观测的查询结果150,通过离线训练模块130对排序模型120进行离线训练,周期性地更新排序模型120的模型参数,以提高排序模型120的预测准确性。
还例如,假设搜索系统100为手机应用市场中的应用(APP)搜索系统,用户打开手机应用市场,使用搜索系统100时,可在交互界面110提供的搜索框中输入查询词,以触发查询请求;查询请求及其相关信息(例如用户的历史下载记录、用户点击记录、应用的特征、当前时间、地点等信息)会被输入排序模型120进行在线预测;排序模型120根据查询请求及其相关信息,预测用户下载给定的各个候选应用的可能性,并根据预测的可能性,对应用进行降序排列;然后通过交互界面110将各个候选应用按顺序展示在不同位置(将更有可能下载的候选应用排在靠前的位置,将不太可能下载的候选应用排在靠后的位置),从而得到查询结果并将其展示给用户,以达到提高应用下载概率的效果。
得到查询结果后,用户可在交互界面110上对查询结果中的应用进行浏览、点击、下载等操作。搜索系统100可通过日志数据140记录用户的实际操作行为,并存储未 被用户观测的查询结果150。搜索系统100可根据预设周期、日志数据140及未被用户观测的查询结果150,通过离线训练模块130对排序模型120进行离线训练,周期性地更新排序模型120的模型参数,以提高排序模型120的预测准确性。
图2示出根据本申请一实施例的排序模型训练方法的流程图。如图2所示,该排序模型训练方法,包括:
步骤S210,根据日志数据,确定第一训练集。
其中,日志数据为待训练的排序模型所属的搜索系统所记录的日志数据,日志数据包括用户隐式反馈数据。
第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、第一样本的位置信息、及第一样本与对应的参考查询词之间的第一参考相关性。
在一种可能的实现方式中,根据日志数据确定第一训练集时,可首先将日志数据中记录的用户输入的查询词作为参考查询词。确定出参考查询词后,对于任一参考查询词,可从日志数据中,确定出与该参考查询词对应的多条查询结果;对于与该参考查询词对应的任一条查询结果,可从日志数据中,确定出该查询结果的相关信息(例如输入查询词的用户的个人信息、用户历史数据、查询时间、地点等上下文环境信息)、该查询结果在查询结果列表中所在的位置、与该查询结果对应的用户隐式反馈数据;然后将该查询结果及其相关信息,确定为与该参考查询词对应的第一样本,将该查询结果在查询结果列表中所在的位置,确定为第一样本的位置信息,并根据与该查询结果对应的用户隐式反馈数据,确定第一样本与该参考查询值之间的第一参考相关性。
例如,假设与查询结果对应的用户隐式反馈数据中,用户是否点击的取值为1,表示用户已点击,那么可将第一样本与对应的参考查询词之间的第一参考相关性确定为1;假设与查询结果对应的用户隐式反馈数据中,用户是否点击的取值为0,表示用户未点击,那么可将第一样本与对应的参考查询词之间的第一参考相关性确定为0。
由于第一样本包括了已被用户观测的查询结果,那么,可将第一样本看作已观测样本。第一训练集中的第一样本均为已观测样本,也就是说,第一训练集为已观测样本集。
步骤S220,根据所述第一训练集中各个第一样本的位置信息,确定所述各个第一样本的逆倾向性得分。
可通过现有的相关技术,计算查询结果列表中各个位置的逆倾向性得分IPS。例如,可通过随机流量,计算查询结果列表中各个位置的逆倾向性得分ISP:可将查询结果随机展示在查询结果列表中的不同位置,然后计算用户在不同位置的点击率,并根据该点击率来确定不同位置的逆倾向性得分IPS。
计算出查询结果列表中各个位置的逆倾向性得分IPS后,可根据各个第一样本的位置信息及各个位置的逆倾向性得分IPS,确定各个第一样本的逆倾向性得分IPS。例如,假设第一样本的位置信息为位置9,表示第一样本包括的查询结果在查询结果列表中的位置为第9位,位置9对应的逆倾向性得分IPS为0.15,那么,可将该第一样本的逆倾向性得分IPS确定为0.15。
需要说明的是,本领域技术人员可根据实际情况确定查询结果列表中各个位置的逆倾向性得分的具体计算方式,本申请对此不作限制。
在一种可能的实现方式中,确定出第一训练集中的各个第一样本的逆倾向性得分后,可根据第一训练集及第一训练集中的各个第一样本的逆倾向性得分,训练插补模型。插补模型可用于确定未被用户观测的查询结果与对应的参考查询词之间的相关性估计值。即插补模型可用于预测未被用户观测的查询结果与对应的参考查询词之间的相关性。插补模型可包括一个输入层、至少一个中间层及一个输出层。
通过这种方式,能够使用无偏的第一训练集来训练插补模型,从而能够消除位置偏置对插补模型的影响,进而提高插补模型的估计准确性。
在一种可能的实现方式中,根据第一训练集及第一训练集中的各个第一样本的逆倾向性得分,训练插补模型时,对于第一训练集中与任一参考查询词对应的第一样本,可首先通过插补模型,确定该第一样本与对应的参考查询词之间的第三相关性。第三相关性为估计值。
在确定第三相关性的过程中,可根据插补模型的各个层(包括输入层、中间层及输出层)的输出特征,计算该第一样本的不确定性。第一样本的不确定性可用于指示第一样本的学习难易程度。第一样本的不确定性的值越大,表示第一样本的学习难度越高;第一样本的不确定性的值越小,表示第一样本的学习难度越低。
在一种可能的实现方式中,计算第一样本的不确定性时,可在插补模型对第一样本进行处理以得到第三相关性的过程中,分别获取插补模型的各个层的输出特征,并分别对插补模型的各个层的输出特征进行特征融合(例如线性映射等),得到与各个层对应的融合特征,然后根据预设权重及与各个层对应的融合特征,通过加权求和等方式,得到第一样本的不确定性。
图3示出根据本申请一实施例的第一样本的不确定性的计算过程的示意图。如图3所示,插补模型300包括输入层301、中间层302及输出层303。需要说明的是,图3中的插补模型300作为示例,其仅包括1个中间层302,在实际应用中,插补模型的中间层可以为多个,本申请对插补模型的中间层的具体数量不作限制。
计算第一样本304的不确定性时,可在插补模型300对第一样本304进行处理以得到第三相关性305的过程中,可分别获取输入层301、中间层302及输出层303的输出特征,并分别对输入层301、中间层302及输出层303的输出特征进行特征融合,得到与各个层对应的融合特征:可对输入层301的输出特征进行特征融合,得到与输入层301对应的融合特征309;可对输入层301的输出特征及中间层302的输出特征进行特征融合,得到与中间层302对应的融合特征308;可对输出层303中的输出特征进行特征融合,得到与输出层303对应的融合特征307。
然后可根据预设权重及与各个层对应的融合特征,通过加权求和的方式,得到第一样本304的不确定性:可根据预设权重(图中未示出),对融合特征307、融合特征308及融合特征309进行加权求和,得到第一样本304的不确定性310。
需要说明的是,图3中作为示例,对输入层301、输出层303的输出特征进行融合时,仅使用了对应层的输出特征,对中间层302的输出特征进行特征融合时,不仅使用了中间层302的输出特征,还使用了输入层301的输出特征。以上仅作为示例对特征融合的过程进行示例性地说明,本领域技术人员可根据实际情况设置特征融合时使用的输出特征,本申请对此不作限制。
得到第一样本的不确定性后,可根据第三相关性、第一样本与对应的参考查询词之间的第一参考相关性、逆倾向性得分及不确定性,确定第四损失。例如,可根据第三相关性、第一样本与对应的参考查询词之间的第一参考相关性及逆倾向性得分,确定插补模型的相关性估计损失,然后将相关性估计损失与第一样本的不确定性之和,确定为第四损失,即将第一样本的不确定性作为正则项,加入第四损失中,以强迫插补模型对学习难度较高的第一样本进行更好地学习。
确定出第四损失后,可根据第四损失,调整插补模型。在第四损失满足预设的第一训练结束条件(第一训练结束条件可例如第四损失小于预设的第一损失阈值,或第四损失收敛于一定的区间内)时,则结束训练,得到已训练的插补模型。
通过这种方式,能够在插补模型的训练过程中,计算第一样本的不确定性,并在确定插补模型的总体损失(即第四损失)时,考虑第一样本的不确定性,使得插补模型能够对学习难度较高的第一样本进行更好地学习,从而能够提高插补模型的估计准确性。
步骤S230,根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集。
其中,第二训练集可包括与各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性。
排序模型所属的搜索系统存储的未被用户观测的查询结果与日志数据相对应,因此,可以从该搜索系统存储的未被用户观测的查询结果中,确定出与第一训练集中的各个参考查询词对应的未被用户观测的查询结果。
在一种可能的实现方式中,根据与各个参考查询词对应的未被用户观测的查询结果,确定第二训练集时,对于与任一参考查询词对应的未被用户观测的查询结果,可通过插补模型,确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值。
在确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值的过程中,可根据插补模型的各个层(包括输入层、中间层及输出层)的输出特征,确定该相关性估计值的置信度。相关性估计值的置信度可用于指示相关性估计值的可靠性。相关性估计值的置信度越高,表示相关性估计值的可靠性越高。相关性估计值的置信度越低,表示相关性估计值的可靠性越低。
在一种可能的实现方式中,确定相关性估计值的置信度时,可在确定相关性估计值的过程中,分别获取插补模型的各个层的输出特征,并分别对插补模型的各个层的输出特征进行特征融合(例如进行线性映射等),得到与各个层对应的融合特征,然后根据预设权重及与各个层对应的融合特征,通过加权求和等方式,确定相关性估计值的置信度。通过这种方式,能够提高相关性估计值的置信度的准确性。
需要说明的是,相关性估计值的置信度的计算方式与插补模型训练过程中的第一样本的不确定性的计算方式类似。在插补模型的训练过程中,对插补模型的各个层的输出特征进行特征融合,是为了计算第一样本的不确定性,用于指示第一样本的学习难易程度;而在插补模型的预测/插补过程中,对插补模型的各个层的输出特征进行特征融合,是为了计算相关性估计值的置信度,用于指示相关性估计值的可靠性。
在一种可能的实现方式中,确定出相关性估计值的置信度后,可判断相关性估计值的置信度与预设的置信度阈值的大小关系。在相关性估计值的置信度大于或等于置信度阈值的情况下,可根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将相关性估计值确定为第二样本与所述参考查询词之间的第二参考相关性。这里的第二参考相关性为估计值。
例如,在相关性估计值的置信度大于或等于置信度阈值的情况下,可将所述未被用户观测的查询结果及其相关信息(例如输入查询词的用户的个人信息、用户历史数据、查询时间、地点等上下文环境信息),确定为与所述参考查询词对应的第二样本,并将相关性估计值确定为第二样本与所述参考查询词之间的第二参考相关性。
确定出第二样本及第二样本与所述参考查询词之间的第二参考相关性后,可将该第二样本及该第二参考相关性,加入第二训练集。
由于第二样本包括了未被用户观测的查询结果,那么,可将第二样本看作未观测样本。第二训练集中的第二样本均为未观测样本,也就是说,第二训练集为未观测样本集。
步骤S240,根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型。
其中,排序模型可用于预测查询词与查询结果之间的相关性。
在一种可能的实现方式中,根据第一训练集、逆倾向性得分及第二训练集,训练排序模型时,对于任一参考查询词,可将所述参考查询词及与其对应的第一样本,输入排序模型,得到第一相关性,以及将所述参考查询词及与其对应的第二样本,输入排序模型,得到第二相关性。
然后可根据第一相关性、第二相关性、与第一样本对应的第一参考相关性、与第二样本对应的第二参考相关性及第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失。
在一种可能的实现方式中,确定与所述参考查询词对应的第一损失时,可根据第一相关性、与第一样本对应的第一参考相关性及第一样本的逆倾向性得分,确定第二损失;根据第二相关性及与第二样本对应的第二参考相关性,确定第三损失;然后根据第二损失及第三损失,确定第一损失。通过这种方式,能够在确定与参考查询词对应的第一损失时,既考虑与已观测样本(第一样本)对应的损失(第二损失),也考虑与未观测样本(第二样本)对应的损失(第三损失),从而能够提高第一损失的准确性。
在一种可能的实现方式中，可通过下述公式（1），确定与参考查询词q对应的第一损失L_q：
L_q = \sum_{d_1 \in D_1} \frac{c(d_1)}{P(d_1)}\,\lambda(d_1) + \sum_{d_2 \in D_2} R(d_2)\,\lambda(d_2) \quad (1)
公式（1）中，D_1表示与参考查询词q对应的第一样本的集合；d_1表示D_1中的任一第一样本；c(d_1)表示与d_1对应的第一参考相关性；P(d_1)表示d_1的逆倾向性得分；λ(d_1)表示d_1的排序损失，是根据与d_1对应的第一相关性及与d_1对应的第一参考相关性确定的；D_2表示与参考查询词q对应的第二样本的集合；d_2表示D_2中的任一第二样本；R(d_2)表示与d_2对应的第二参考相关性；λ(d_2)表示d_2的排序损失，是根据与d_2对应的第二相关性及与d_2对应的第二参考相关性确定的。
在一种可能的实现方式中,确定出与参考查询词对应的第一损失后,可根据该第一损失,调整排序模型。在第一损失满足预设的第二训练结束条件(第二训练结束条件可例如第一损失小于预设的第二损失阈值,或第一损失收敛于一定的区间内)时,则结束训练,得到已训练的排序模型。
通过这种方式,能够在根据第一训练集、逆倾向性得分及第二训练集,训练排序模型时,将第一训练集中的第一参考相关性、逆倾向性得分及第二训练集中的第二参考相关性作为无偏的监督信号,对排序模型进行训练,使得排序模型能够对已观测样本及未观测样本进行联合学习,从而消除选择偏置及位置偏置对排序模型的影响,得到无偏的排序模型。
在一种可能的实现方式中,可使用第一训练集中的第一样本及第二训练集中的第二样本,对排序模型进行交替训练,从而能够提高排序模型的预测准确性。
下面将结合手机应用市场的具体示例以及图4和图5,对本申请实施例的排序模型训练方法进行示例性地说明。
手机应用市场的搜索系统中的排序模型,能够根据用户的个人信息、应用候选集及上下文环境信息,预测用户对应用候选集中的各个应用的点击概率,并按照点击概率,将应用候选集中的各个应用进行降序排列,将最可能被用户下载的应用排在最靠前的位置。
手机应用市场的搜索系统中的排序模型,还能够根据用户输入的查询词、用户的个人信息、应用候选集及上下文环境信息,预测用户输入的查询词与应用候选集中的各个应用的相关性,并按照相关性,将应用候选集中的各个应用进行降序排列,将与用户输入的查询词相关性最大的应用排在最靠前的位置。
下面将以手机应用市场中的用户搜索场景为例进行示例性地说明,手机应用市场中个性化推荐场景与此类似,此处不再赘述。
图4示出根据本申请一实施例的手机应用市场的搜索页面的示意图。如图4所示,手机应用市场的搜索页面400可以看作是手机应用市场的搜索系统与用户的交互界面,其包括位于最上方的搜索框410(当前查询词为“篮球”)以及查询结果显示子页面420。查询结果显示子页面420中,显示了手机应用市场的搜索系统针对查询词“篮球”为用户展示的相关应用APP排序,从上到下依次为应用421、应用422、应用423、应用424及应用425。
用户看到查询结果显示子页面420显示的查询结果后,可根据个人兴趣,进行浏览、点击、下载等操作,该操作会被手机应用市场的搜索系统以日志形式进行记录,同时手机应用市场的搜索系统还会将未在查询结果显示子页面420中显示的查询结果(即未被用户观测的查询结果)进行存储,以便于根据日志数据及未被用户观测的查询结果,对手机应用市场的搜索系统中的排序模型进行训练。
由于手机应用市场的搜索场景中存在选择偏置与位置偏置,可使用本申请实施例的排序模型训练方法,对手机应用市场的搜索系统中的排序模型进行训练,以同时消除选择偏置及位置偏置对该排序模型的影响,从而提高手机应用市场的搜索系统中的排序模型的预测准确性,进而提升用户体验。
图5示出根据本申请一实施例的排序模型训练方法的训练过程的示意图。如图5所示,对手机应用市场的搜索系统中的排序模型进行训练时,其训练过程如下:
在步骤S510中,根据日志数据,确定第一训练集,其中,日志数据为手机应用市场的搜索系统记录的日志数据,第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、第一样本的位置信息、及第一样本与对应的参考查询词之间的第一参考相关性;
在步骤S520中,根据第一训练集中各个第一样本的位置信息,确定各个第一样本的逆倾向性得分;
在步骤S530中,根据第一训练集及各个第一样本的逆倾向性得分,训练插补模型,得到已训练的插补模型;
在步骤S540中,对于与任一参考查询词对应的未被用户观测的查询结果,通过已训练的插补模型,确定未被用户观测的查询结果与参考查询词之间的相关性估计值;
在步骤S550中,根据插补模型的各个层的输出特征,确定相关性估计值的置信度;
在步骤S560中,判断相关性估计值的置信度是否大于或等于预设的置信度阈值;
在相关性估计值的置信度大于或等于预设的置信度阈值的情况下,执行步骤S570,根据未被用户观测的查询结果,确定与参考查询词对应的第二样本,并将相关性估计值确定为第二样本与参考查询词之间的第二参考相关性;
在步骤S580中,将第二样本及第二参考相关性,加入第二训练集;
在步骤S590中,根据第一训练集、各个第一样本的逆倾向性得分及第二训练集,训练排序模型。
通过上述方式,对手机应用市场的搜索系统中的排序模型进行训练,能够同时消除选择偏置及位置偏置该排序模型的影响,得到无偏的排序模型,从而能够提高手机应用市场的搜索系统中的排序模型的预测准确性,进而提升手机应用市场的用户体验。
经实验验证,与现有的搜索系统相比,使用了本申请实施例的训练方法训练的排序模型的搜索系统,其在平均准确率(mean average precision,MAP)指标上的提升为3.5%;在准确率(precision)指标上的提升为:在Precision@5上的提升为3.2%,在Precision@10上的提升为0.1%;在归一化折损累计增益(normalized discounted cumulative gain,NDCG)指标上的提升为:在NDCG@5指标上的提升为4.3%,在NDCG@10上的提升我2.8%;在平均倒数排名(mean reciprocal rank,MRR)指标上的提升为:在MRR@5指标上的提升为3.9%,在MRR@10指标上的提升为2.8%。
因此,与现有的搜索系统相比,使用了本申请实施例的训练方法训练的排序模型的搜索系统的搜索性能,在上述四个典型指标上均得到了显著提升。
图6示出根据本申请一实施例的排序模型训练装置的框图。如图6所示,该排序模型训练装置,包括:
第一训练集确定模块610,用于根据日志数据,确定第一训练集,所述第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、所述第一样本的位置信息、及所述第一样本与对应的参考查询词之间的第一参考相关性,所述第一样本包括已被用户观测的查询结果;
逆倾向性得分确定模块620,用于根据所述第一训练集中各个第一样本的位置信 息,确定所述各个第一样本的逆倾向性得分;
第二训练集确定模块630,用于根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,所述第二训练集包括与所述各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;
第一训练模块640,用于根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,所述排序模型用于预测查询词与查询结果之间的相关性。
在一种可能的实现方式中,所述第二训练集确定模块630,包括:相关性估计值确定子模块,对于与任一参考查询词对应的未被用户观测的查询结果,通过插补模型,确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值;置信度确定子模块,用于根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度;样本确定子模块,用于在所述置信度大于或等于预设的置信度阈值的情况下,根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将所述相关性估计值确定为所述第二样本与所述参考查询词之间的第二参考相关性;样本加入子模块,用于将所述第二样本及所述第二参考相关性,加入第二训练集。
在一种可能的实现方式中,所述置信度确定子模块,用于:分别对所述插补模型的各个层的输出特征进行特征融合,得到与所述各个层对应的融合特征;根据预设权重及所述融合特征,确定所述相关性估计值的置信度。
在一种可能的实现方式中,所述第一训练模块640,包括:第一相关性确定子模块,对于任一参考查询词,将所述参考查询词及与所述参考查询词对应的第一样本,输入排序模型,得到第一相关性;第二相关性确定子模块,用于将所述参考查询词及与所述参考查询词对应的第二样本,输入所述排序模型,得到第二相关性;第一损失确定子模块,用于根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失;第一调整子模块,用于根据所述第一损失,调整所述排序模型。
在一种可能的实现方式中,所述第一损失确定子模块,用于:根据所述第一相关性、与所述第一样本对应的第一参考相关性及所述第一样本的逆倾向性得分,确定第二损失;根据所述第二相关性及与所述第二样本对应的第二参考相关性,确定第三损失;根据所述第二损失及所述第三损失,确定与所述参考查询词对应的第一损失。
在一种可能的实现方式中,所述装置还包括:第二训练模块,用于根据所述第一训练集及所述逆倾向性得分,训练所述插补模型。
在一种可能的实现方式中,所述第二训练模块,包括:第二相关性确定子模块,对于所述第一训练集中与任一参考查询词对应的第一样本,通过所述插补模型,确定所述第一样本与所述参考查询词之间的第三相关性;不确定性计算子模块,用于根据所述插补模型的各个层的输出特征,计算所述第一样本的不确定性,所述不确定性用于指示所述第一样本的学习难易程度;第二损失确定子模块,用于根据所述第三相关性、所述第一样本与所述参考查询词之间的第一参考相关性、所述第一样本的逆倾向性得分及所述不确定性,确定第四损失;第二调整子模块,用于根据所述第四损失,调整所述插补模型。
本申请的实施例提供了一种排序模型训练装置,包括:处理器以及用于存储处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令时实现上述方法。
本申请的实施例提供了一种非易失性计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现上述方法。
本申请的实施例提供了一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备的处理器中运行时,所述电子设备中的处理器执行上述方法。
计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:便携式计算机盘、硬盘、随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)、可擦式可编程只读存储器(Electrically Programmable Read-Only-Memory,EPROM或闪存)、静态随机存取存储器(Static Random-Access Memory,SRAM)、便携式压缩盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、数字多功能盘(Digital Video Disc,DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。
这里所描述的计算机可读程序指令或代码可以从计算机可读存储介质下载到各个计算/处理设备,或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令,并转发该计算机可读程序指令,以供存储在各个计算/处理设备中的计算机可读存储介质中。
用于执行本申请操作的计算机程序指令可以是汇编指令、指令集架构(Instruction Set Architecture,ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码,所述编程语言包括面向对象的编程语言—诸如Smalltalk、C++等,以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络—包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中,通过利用计算机可读程序指令的状态信息来个性化定制电子电路,例如可编程逻辑电路、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或可编程逻辑阵列(Programmable Logic Array,PLA),该电子电路可以执行计算机可读程序指令,从而实现本申请的各个方面。
这里参照根据本申请实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本申请的各个方面。应当理解,流程图和/或框图的每个方框以及流程图 和/或框图中各方框的组合,都可以由计算机可读程序指令实现。
这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器,从而生产出一种机器,使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时,产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中,这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作,从而,存储有指令的计算机可读介质则包括一个制造品,其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。
也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上,使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤,以产生计算机实现的过程,从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。
附图中的流程图和框图显示了根据本申请的多个实施例的装置、系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分,所述模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。
也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行相应的功能或动作的硬件(例如电路或ASIC(Application Specific Integrated Circuit,专用集成电路))来实现,或者可以用硬件和软件的组合,如固件等来实现。
尽管在此结合各实施例对本发明进行了描述,然而,在实施所要求保护的本发明过程中,本领域技术人员通过查看所述附图、公开内容、以及所附权利要求书,可理解并实现所述公开实施例的其它变化。在权利要求中,“包括”(comprising)一词不排除其他组成部分或步骤,“一”或“一个”不排除多个的情况。单个处理器或其它单元可以实现权利要求中列举的若干项功能。相互不同的从属权利要求中记载了某些措施,但这并不表示这些措施不能组合起来产生良好的效果。
以上已经描述了本申请的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (17)

  1. 一种排序模型训练方法,其特征在于,所述方法包括:
    根据日志数据,确定第一训练集,所述第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、所述第一样本的位置信息、及所述第一样本与对应的参考查询词之间的第一参考相关性,所述第一样本包括已被用户观测的查询结果;
    根据所述第一训练集中各个第一样本的位置信息,确定所述各个第一样本的逆倾向性得分;
    根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,所述第二训练集包括与所述各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;
    根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,所述排序模型用于预测查询词与查询结果之间的相关性。
  2. 根据权利要求1所述的方法,其特征在于,所述根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,包括:
    对于与任一参考查询词对应的未被用户观测的查询结果,通过插补模型,确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值;
    根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度;
    在所述置信度大于或等于预设的置信度阈值的情况下,根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将所述相关性估计值确定为所述第二样本与所述参考查询词之间的第二参考相关性;
    将所述第二样本及所述第二参考相关性,加入第二训练集。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度,包括:
    分别对所述插补模型的各个层的输出特征进行特征融合,得到与所述各个层对应的融合特征;
    根据预设权重及所述融合特征,确定所述相关性估计值的置信度。
  4. 根据权利要求1-3中任意一项所述的方法,其特征在于,所述根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,包括:
    对于任一参考查询词,将所述参考查询词及与所述参考查询词对应的第一样本, 输入排序模型,得到第一相关性;
    将所述参考查询词及与所述参考查询词对应的第二样本,输入所述排序模型,得到第二相关性;
    根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失;
    根据所述第一损失,调整所述排序模型。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失,包括:
    根据所述第一相关性、与所述第一样本对应的第一参考相关性及所述第一样本的逆倾向性得分,确定第二损失;
    根据所述第二相关性及与所述第二样本对应的第二参考相关性,确定第三损失;
    根据所述第二损失及所述第三损失,确定与所述参考查询词对应的第一损失。
  6. 根据权利要求2或3所述的方法,其特征在于,所述方法还包括:
    根据所述第一训练集及所述逆倾向性得分,训练所述插补模型。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述第一训练集及所述逆倾向性得分,训练所述插补模型,包括:
    对于所述第一训练集中与任一参考查询词对应的第一样本,通过所述插补模型,确定所述第一样本与所述参考查询词之间的第三相关性;
    根据所述插补模型的各个层的输出特征,计算所述第一样本的不确定性,所述不确定性用于指示所述第一样本的学习难易程度;
    根据所述第三相关性、所述第一样本与所述参考查询词之间的第一参考相关性、所述第一样本的逆倾向性得分及所述不确定性,确定第四损失;
    根据所述第四损失,调整所述插补模型。
  8. 一种排序模型训练装置,其特征在于,所述装置包括:
    第一训练集确定模块,用于根据日志数据,确定第一训练集,所述第一训练集包括多个参考查询词、与各个参考查询词对应的第一样本、所述第一样本的位置信息、 及所述第一样本与对应的参考查询词之间的第一参考相关性,所述第一样本包括已被用户观测的查询结果;
    逆倾向性得分确定模块,根据所述第一训练集中各个第一样本的位置信息,确定所述各个第一样本的逆倾向性得分;
    第二训练集确定模块,用于根据与所述各个参考查询词对应的未被用户观测的查询结果,确定第二训练集,所述第二训练集包括与所述各个参考查询词对应的第二样本、及第二样本与对应的参考查询词之间的第二参考相关性;
    第一训练模块,用于根据所述第一训练集、所述逆倾向性得分及所述第二训练集,训练排序模型,所述排序模型用于预测查询词与查询结果之间的相关性。
  9. 根据权利要求8所述的装置,其特征在于,所述第二训练集确定模块,包括:
    相关性估计值确定子模块,对于与任一参考查询词对应的未被用户观测的查询结果,通过插补模型,确定所述未被用户观测的查询结果与所述参考查询词之间的相关性估计值;
    置信度确定子模块,用于根据所述插补模型的各个层的输出特征,确定所述相关性估计值的置信度;
    样本确定子模块,用于在所述置信度大于或等于预设的置信度阈值的情况下,根据所述未被用户观测的查询结果,确定与所述参考查询词对应的第二样本,并将所述相关性估计值确定为所述第二样本与所述参考查询词之间的第二参考相关性;
    样本加入子模块,用于将所述第二样本及所述第二参考相关性,加入第二训练集。
  10. 根据权利要求9所述的装置,其特征在于,所述置信度确定子模块,用于:
    分别对所述插补模型的各个层的输出特征进行特征融合,得到与所述各个层对应的融合特征;
    根据预设权重及所述融合特征,确定所述相关性估计值的置信度。
  11. 根据权利要求8-10中任意一项所述的装置,其特征在于,所述第一训练模块,包括:
    第一相关性确定子模块,对于任一参考查询词,将所述参考查询词及与所述参考查询词对应的第一样本,输入排序模型,得到第一相关性;
    第二相关性确定子模块,用于将所述参考查询词及与所述参考查询词对应的第二样本,输入所述排序模型,得到第二相关性;
    第一损失确定子模块,用于根据所述第一相关性、所述第二相关性、与所述第一样本对应的第一参考相关性、与所述第二样本对应的第二参考相关性及所述第一样本的逆倾向性得分,确定与所述参考查询词对应的第一损失;
    第一调整子模块,用于根据所述第一损失,调整所述排序模型。
  12. 根据权利要求11所述的装置,其特征在于,所述第一损失确定子模块,用于:
    根据所述第一相关性、与所述第一样本对应的第一参考相关性及所述第一样本的逆倾向性得分,确定第二损失;
    根据所述第二相关性及与所述第二样本对应的第二参考相关性,确定第三损失;
    根据所述第二损失及所述第三损失,确定与所述参考查询词对应的第一损失。
  13. 根据权利要求9或10所述的装置,其特征在于,所述装置还包括:
    第二训练模块,用于根据所述第一训练集及所述逆倾向性得分,训练所述插补模型。
  14. 根据权利要求13所述的装置,其特征在于,所述第二训练模块,包括:
    第二相关性确定子模块,对于所述第一训练集中与任一参考查询词对应的第一样本,通过所述插补模型,确定所述第一样本与所述参考查询词之间的第三相关性;
    不确定性计算子模块,用于根据所述插补模型的各个层的输出特征,计算所述第一样本的不确定性,所述不确定性用于指示所述第一样本的学习难易程度;
    第二损失确定子模块,用于根据所述第三相关性、所述第一样本与所述参考查询词之间的第一参考相关性、所述第一样本的逆倾向性得分及所述不确定性,确定第四损失;
    第二调整子模块,用于根据所述第四损失,调整所述插补模型。
  15. 一种排序模型训练装置,其特征在于,包括:
    处理器;
    用于存储处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令时实现权利要求1-7中任意一项所述的方法。
  16. 一种非易失性计算机可读存储介质,其上存储有计算机程序指令,其特征在于,所述计算机程序指令被处理器执行时实现权利要求1-7中任意一项所述的方法。
  17. 一种计算机程序产品,包括计算机可读代码,或者承载有计算机可读代码的非易失性计算机可读存储介质,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行权利要求1-7中任意一项所述的方法。
PCT/CN2022/074999 2022-01-29 2022-01-29 排序模型训练方法、装置及存储介质 WO2023142042A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/074999 WO2023142042A1 (zh) 2022-01-29 2022-01-29 排序模型训练方法、装置及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/074999 WO2023142042A1 (zh) 2022-01-29 2022-01-29 排序模型训练方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2023142042A1 true WO2023142042A1 (zh) 2023-08-03

Family

ID=87470152

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074999 WO2023142042A1 (zh) 2022-01-29 2022-01-29 排序模型训练方法、装置及存储介质

Country Status (1)

Country Link
WO (1) WO2023142042A1 (zh)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077306A (zh) * 2013-03-28 2014-10-01 阿里巴巴集团控股有限公司 一种搜索引擎的结果排序方法及系统
US20190391982A1 (en) * 2018-06-21 2019-12-26 Yandex Europe Ag Method of and system for ranking search results using machine learning algorithm
CN112084435A (zh) * 2020-08-07 2020-12-15 北京三快在线科技有限公司 搜索排序模型训练方法及装置、搜索排序方法及装置

Similar Documents

Publication Publication Date Title
US10402703B2 (en) Training image-recognition systems using a joint embedding model on online social networks
US10552488B2 (en) Social media user recommendation system and method
US11763145B2 (en) Article recommendation method and apparatus, computer device, and storage medium
US8762318B2 (en) Supplementing a trained model using incremental data in making item recommendations
RU2725659C2 (ru) Способ и система для оценивания данных о взаимодействиях пользователь-элемент
US10147037B1 (en) Method and system for determining a level of popularity of submission content, prior to publicizing the submission content with a question and answer support system
US8412648B2 (en) Systems and methods of making content-based demographics predictions for website cross-reference to related applications
US10083379B2 (en) Training image-recognition systems based on search queries on online social networks
US20160259857A1 (en) User recommendation using a multi-view deep learning framework
CN110717099B (zh) 一种推荐影片的方法及终端
CN110874436B (zh) 用于基于第三方内容的上下文课程推荐的网络系统
US11443005B2 (en) Unsupervised clustering of browser history using web navigational activities
CN114036398B (zh) 内容推荐和排序模型训练方法、装置、设备以及存储介质
EP4092545A1 (en) Content recommendation method and device
CN113569129A (zh) 点击率预测模型处理方法、内容推荐方法、装置及设备
CN113221019A (zh) 基于即时学习的个性化推荐方法和系统
US11308146B2 (en) Content fragments aligned to content criteria
CN112328889A (zh) 推荐搜索词确定方法、装置、可读介质及电子设备
WO2017112053A1 (en) Prediction using a data structure
US10997254B1 (en) 1307458USCON1 search engine optimization in social question and answer systems
US20210264480A1 (en) Text processing based interface accelerating
WO2023142042A1 (zh) 排序模型训练方法、装置及存储介质
CN112231546B (zh) 异构文档的排序方法、异构文档排序模型训练方法及装置
CN112948681B (zh) 一种融合多维度特征的时间序列数据推荐方法
CN112115365B (zh) 模型协同优化的方法、装置、介质和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22922873

Country of ref document: EP

Kind code of ref document: A1