WO2021081962A1 - Training method for recommendation model, recommendation method, device, and computer-readable medium - Google Patents

Training method for recommendation model, recommendation method, device, and computer-readable medium

Info

Publication number
WO2021081962A1
Authority
WO
WIPO (PCT)
Prior art keywords
training sample
training
model
user
recommendation
Prior art date
Application number
PCT/CN2019/114897
Other languages
English (en)
French (fr)
Inventor
张智尧
祝宏
董振华
何秀强
原博文
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2019/114897 priority Critical patent/WO2021081962A1/zh
Priority to CN201980093319.4A priority patent/CN113508378A/zh
Priority to EP19949553.2A priority patent/EP3862893A4/en
Priority to US17/242,588 priority patent/US20210248651A1/en
Publication of WO2021081962A1 publication Critical patent/WO2021081962A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0282 Rating or review of business operators or products

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence, and in particular to a training method, recommendation method, device, and computer-readable medium of a recommendation model.
  • The prediction of the commodity selection rate refers to predicting the probability that a user selects a certain commodity in a specific environment. For example, selection rate prediction plays a key role in the recommendation systems of applications such as application stores and online advertising. Because the number of products that can be displayed is much smaller than the total number of products, the recommendation system usually selects products from the candidate products for display based on the predicted selection rate.
  • Position bias is a common problem in recommendation and search scenarios.
  • Position bias refers to the bias in the collected training data caused by products being displayed at different positions. For example, the same application (application, APP) displayed at different positions may have different probabilities of being selected by the user.
  • Selection bias refers to the bias in the collected training data due to the different probabilities of the product being displayed. The ideal training data is obtained when the product is shown to the user with the same display probability. In reality, the products displayed to users are determined based on the selection rate predicted by the previous recommendation model, and the opportunities for the products to be displayed are not the same.
  • a top-ranked APP will increase the user's tendency to download.
  • As a result, the selection rate of a top-ranked APP calculated by the recommendation model may be higher than that of other APPs, causing the APP to keep being ranked before other APPs. This exacerbates the impact of the bias problem, produces a Matthew effect, and aggravates the long-tail problem.
  • This application provides a training method, a recommendation method, a device, and a computer-readable medium for a recommendation model, so as to improve the accuracy of the recommendation model.
  • According to a first aspect, a method for training a recommendation model is provided. The method includes: obtaining at least one first training sample, where the first training sample includes attribute information of a first user and information of a first recommended object; processing the attribute information of the first user and the information of the first recommended object by using an interpolation model to obtain an interpolation prediction label of the first training sample, where the interpolation prediction label is used to indicate a prediction of whether the first user performs an operation action on the first recommended object when the first recommended object is recommended to the first user; and using the attribute information of the first user and the information of the first recommended object of the first training sample as an input of the recommendation model, and using the interpolation prediction label of the first training sample as a target output value of the recommendation model, to train the recommendation model and obtain the trained recommendation model. The model parameters of the interpolation model are obtained by training based on at least one second training sample, where the at least one second training sample includes attribute information of a second user, information of a second recommended object, and a sample label of the second training sample; the sample label of the second training sample is used to indicate whether the second user performs an operation action on the second recommended object; and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • the first recommended object and the second recommended object may be recommended applications in the application market of the terminal device; or, the first recommended object and the second recommended object may be search terms recommended by the system in the search scenario.
  • the first recommended object and the second recommended object may be information recommended by the recommendation system for the user, and the application does not make any limitation on the specific implementation of the first recommended object and the second recommended object.
  • The attribute information of the user includes some personalized attributes of the user, for example, the gender of the user, the age of the user, the occupation of the user, the income of the user, the hobbies of the user, and the education of the user.
  • the attribute information of the first user may include one or more of the aforementioned attribute information of the user.
  • the attribute information of the second user may include one or more of the aforementioned attribute information of the user.
  • the information of the recommended object includes a recommended object identifier, such as a recommended object ID.
  • the information of the recommended object also includes some attributes of the recommended object, for example, the name of the recommended object, the type of the recommended object, and so on.
  • the information of the first recommended object may include one or more of the above-mentioned information of the recommended object.
  • the information of the second recommended object may include one or more of the above-mentioned information of the recommended object.
  • the user's operation actions on the recommended object may include the user's click behavior, the user's download behavior, the user's purchase behavior, the user's browsing behavior, and the user's negative review behavior.
  • the interpolation model can be used to predict whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user.
  • the imputed prediction label can indicate the result of the prediction.
  • the interpolation prediction label may be 0 or 1, that is, 0 or 1 is used to indicate whether the first user has an operation action on the first recommended object.
  • the interpolation prediction label may also be a probability value, that is, the probability value is used to indicate the probability that the first user has an operation action on the first recommended object. This application does not impose any restrictions on the form of the imputed prediction label.
  • the interpolation model may be an ad average click-through rate model, a logistic regression model, a domain-aware factorization machine, or a deep neural network.
  • the recommendation model may be a matrix factorization model, a factorization machine, or a domain-aware factorization machine.
  • the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • Such a training sample is therefore unbiased. Using the second training sample to train the interpolation model can avoid the influence of the bias problem on the training of the interpolation model and improve the accuracy of the interpolation model, so that the obtained interpolation prediction labels are more accurate; using the more accurate interpolation prediction labels to train the recommendation model in turn improves the accuracy of the recommendation model.
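  • A minimal sketch of this imputation step is shown below. It assumes a simple feature matrix and a logistic-regression imputation model; the feature layout, the model choice, and all array names are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: train an imputation ("interpolation") model on unbiased
# second training samples collected under random display, then impute labels
# for first training samples that were never displayed to their users.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Second training samples: user attributes + recommended-object information,
# gathered while objects were displayed uniformly at random, with observed labels.
X_random = rng.random((5000, 16))        # placeholder features
y_random = rng.integers(0, 2, 5000)      # observed operation action (0/1)

interpolation_model = LogisticRegression(max_iter=1000)
interpolation_model.fit(X_random, y_random)

# First training samples: user/object pairs with no feedback (never displayed).
X_unlabeled = rng.random((20000, 16))

# Interpolation prediction labels: either a probability ...
sigma_prob = interpolation_model.predict_proba(X_unlabeled)[:, 1]
# ... or a hard 0/1 label; the text allows both forms.
sigma_hard = (sigma_prob >= 0.5).astype(int)
```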
  • In some implementations, the method further includes: obtaining at least one third training sample, where the third training sample includes attribute information of a third user, information of a third recommended object, and a sample label of the third training sample, and the sample label of the third training sample is used to indicate whether the third user performs an operation action on the third recommended object. In this case, using the attribute information of the first user and the information of the first recommended object as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model for training, to obtain the trained recommendation model, includes: using the attribute information of the first user and the information of the first recommended object, as well as the attribute information of the third user and the information of the third recommended object, as inputs of the recommendation model, and using the interpolation prediction label of the first training sample and the sample label of the third training sample as target output values of the recommendation model, to train the recommendation model based on a target training model and obtain the trained recommendation model.
  • the attribute information of the third user may include one or more of the aforementioned attribute information of the user.
  • the information of the third recommended object may include one or more of the above-mentioned information of the recommended object.
  • the third training sample can be the same as the second training sample or different from the second training sample.
  • In the embodiment of the present application, the first training sample and the third training sample are both used to train the recommendation model, which takes into account both the interpolation prediction labels obtained by the interpolation model and the actual sample labels during training, and avoids making the accuracy of the recommendation model depend only on the accuracy of the interpolation prediction labels, thereby further improving the accuracy of the recommendation model.
  • The first training sample may be obtained when the first recommended object has not been displayed to the first user, and the third training sample may be obtained when the third recommended object has been displayed to the third user. Because the first recommended object has not been displayed to the first user, the first training sample carries no feedback on whether the first user performs an operation action on the first recommended object, that is, the first training sample has no actual sample label. In this way, events that have not occurred can be included in the modeling and used for training the recommendation model together with events that have occurred; that is, the first training sample without a sample label and the third training sample with a sample label are used together for training the recommendation model, which makes the sample distribution more reasonable and improves the accuracy of the recommendation model.
  • In some implementations, the target training model includes a first loss function and a second loss function. The first loss function is used to indicate the difference between the interpolation prediction label of the first training sample and the predicted label of the first training sample, and the second loss function is used to indicate the difference between the sample label of the third training sample and the predicted label of the third training sample. The model parameters obtained by optimizing the target training model are the model parameters of the trained recommendation model.
  • In some implementations, the target training model is:

$$W^{*}=\arg\min_{W}\left[\omega\sum_{l=1}^{L}\delta\left(y_{l},\hat{y}_{l}\right)+\sum_{l=L+1}^{|D|}\hat{\delta}\left(\sigma_{l},\hat{y}_{l}\right)+\lambda R(W)\right]$$

  • where W is the parameter of the recommendation model, R(W) is the regularization term, λ is the hyperparameter that determines the weight of the regularization term, the training samples x_1 to x_L in the training sample set D are the third training samples, the training samples x_{L+1} to x_{|D|} are the first training samples, |D| represents the number of training samples in the training sample set, L represents the number of third training samples in the training sample set, σ_l represents the interpolation prediction label σ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, ŷ_l represents the predicted label of the training sample x_l, δ(·,·) represents the second loss function, δ̂(·,·) represents the first loss function, and ω is a hyperparameter used to adjust the proportion of the first loss function and the second loss function. Training samples x_1 to x_L are L different third training samples, and training samples x_{L+1} to x_{|D|} are |D|−L different first training samples.
  • In the embodiment of the present application, the second training sample, that is, a training sample without bias, is used to train the interpolation model, and the first loss function and the second loss function are introduced into the target training model. Setting different hyperparameters can adjust the proportions of the first loss function and the second loss function in the target training model, which further improves the accuracy of the recommendation model. A sketch of this combined objective is given below.
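  • The following is a minimal, hypothetical sketch of the target training objective described above: a weighted sum of the loss on third training samples with real labels and the loss on first training samples with interpolation prediction labels, plus a regularization term. The log-loss, the linear scorer, and all names are illustrative assumptions rather than the patent's implementation.

```python
# Hypothetical sketch of the target training objective: omega * (loss on real
# labels) + (loss on interpolation prediction labels) + lambda * R(W).
import numpy as np

def log_loss(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def predict(W, X):
    return 1.0 / (1.0 + np.exp(-(X @ W)))   # simple linear scorer as a stand-in

def target_objective(W, X_third, y_third, X_first, sigma_first,
                     omega=2.0, lam=1e-4):
    y_hat_third = predict(W, X_third)                     # predicted labels, third samples
    y_hat_first = predict(W, X_first)                     # predicted labels, first samples
    second_loss = log_loss(y_third, y_hat_third).sum()    # real sample labels
    first_loss = log_loss(sigma_first, y_hat_first).sum() # interpolation labels
    reg = lam * np.sum(W ** 2)                            # R(W)
    return omega * second_loss + first_loss + reg         # omega > 1 weights real labels more
```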
  • The model parameters of the interpolation model are obtained by training based on the second training samples. When the number of second training samples is large, the second training samples are relatively representative, so the interpolation model can fit the unbiased data distribution more accurately, and the accuracy of the interpolation model is higher. In addition, the weight of the second loss function can be set higher than the weight of the first loss function, that is, the value of ω can be greater than 1.
  • In some implementations, the interpolation model is selected according to the number of second training samples. When the number of second training samples is large, the second training samples are relatively representative, and a more complex model or more training features can be used to train the interpolation model, so that the interpolation model fits the unbiased data distribution more accurately. The more complex model can be, for example, a logistic regression model, a domain-aware factorization machine, or a deep neural network. When the number of second training samples is small, the second training samples are relatively unrepresentative, and a simpler model or fewer training features can be used to train the interpolation model, so as to prevent the interpolation model from overfitting the unbiased data distribution. The simpler model can be, for example, an advertisement average click-through rate model. For example, when the number of second training samples is more than 100,000, the interpolation model can be a domain-aware factorization machine or a deep neural network; when the number of second training samples is between 10,000 and 100,000, the interpolation model can be a logistic regression model; and when the number of second training samples is less than 10,000, the interpolation model can be an advertisement average click-through rate model. Because the interpolation model is selected according to the number of second training samples, different thresholds can be set for different application scenarios, and the interpolation model can be adjusted flexibly. In this way, only a small number of second training samples are needed to reduce the impact of the bias problem and improve the accuracy of the recommendation model, and the decline in overall system revenue that large-scale random display of recommended objects (caused by collecting second training samples on a large scale) would bring is avoided. A sketch of this selection rule follows.
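  • The threshold-based selection described above could look like the following hypothetical sketch. The concrete model classes are stand-ins (an MLP for the complex model, a prior-predicting dummy classifier for the average click-through rate model); the 10,000 and 100,000 thresholds are the example values from the text.

```python
# Hypothetical sketch of choosing the interpolation model by the number of
# unbiased (second) training samples.
from sklearn.dummy import DummyClassifier          # stands in for an average-CTR model
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier   # stands in for an FFM / deep network

def select_interpolation_model(num_second_samples: int):
    if num_second_samples > 100_000:
        # enough unbiased data for a more complex model
        return MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=200)
    elif num_second_samples >= 10_000:
        return LogisticRegression(max_iter=1000)
    else:
        # very little unbiased data: just predict the average click-through rate
        return DummyClassifier(strategy="prior")
```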
  • According to a second aspect, a recommendation method is provided, which includes: obtaining attribute information of a target recommended user and information of a candidate recommended object; and inputting the attribute information of the target recommended user and the information of the candidate recommended object into a recommendation model to predict the probability that the target recommended user performs an operation action on the candidate recommended object. The model parameters of the recommendation model are obtained by using the attribute information of the first user and the information of the first recommended object of the first training sample as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model for training. The interpolation prediction label of the first training sample is obtained by processing the attribute information of the first user and the information of the first recommended object with an interpolation model, and is used to indicate a prediction of whether the first user performs an operation action on the first recommended object when the first recommended object is recommended to the first user. The model parameters of the interpolation model are obtained by training based on at least one second training sample, where the second training sample includes attribute information of a second user, information of a second recommended object, and a sample label of the second training sample; the sample label of the second training sample is used to indicate whether the second user performs an operation action on the second recommended object; and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
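  • The following hypothetical sketch illustrates the serving step: the target recommended user's attribute information and each candidate recommended object's information are fed into the trained recommendation model, and the candidates are ranked by the predicted probability of an operation action. The model object, the feature layout, and the top_k parameter are assumptions.

```python
# Hypothetical sketch of the recommendation step with a trained model that
# exposes predict_proba (e.g. a scikit-learn style classifier).
import numpy as np

def recommend(recommendation_model, user_features, candidate_features, top_k=5):
    # One input row per candidate: user attributes concatenated with the
    # candidate recommended object's information.
    X = np.hstack([
        np.tile(user_features, (candidate_features.shape[0], 1)),
        candidate_features,
    ])
    probs = recommendation_model.predict_proba(X)[:, 1]   # P(operation action)
    order = np.argsort(-probs)                             # descending probability
    return order[:top_k], probs[order[:top_k]]
```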
  • The attribute information of the target recommended user includes some personalized attributes of the user, for example, the gender of the target recommended user, the age of the target recommended user, the occupation of the target recommended user, the income of the target recommended user, the hobbies of the target recommended user, and the education of the target recommended user.
  • the candidate recommendation object information includes candidate recommendation object identification, for example, candidate recommendation object ID.
  • the information of the candidate recommendation object also includes some attributes of the candidate recommendation object, for example, the name of the candidate recommendation object, the type of the candidate recommendation object, and so on.
  • In the embodiment of the present application, the attribute information of the target recommended user and the information of the candidate recommended object are input into the recommendation model to predict the probability that the target recommended user performs an operation action on the candidate recommended object. The model parameters of the recommendation model are obtained by using the attribute information of the first user and the information of the first recommended object of the first training sample as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model for training. The interpolation model used to obtain the interpolation prediction label is trained based on training samples without bias, which avoids the impact of the bias problem on the training of the interpolation model and improves the accuracy of the interpolation model, so that the obtained interpolation prediction labels are more accurate; the more accurate interpolation prediction labels are then used to train the recommendation model, so the trained recommendation model predicts the probability that the target recommended user performs an operation action on the candidate recommended object with higher accuracy.
  • In some implementations, the model parameters of the recommendation model are obtained by using the attribute information of the first user and the information of the first recommended object of the first training sample, as well as the attribute information of the third user and the information of the third recommended object of the third training sample, as inputs of the recommendation model, and using the interpolation prediction label of the first training sample and the sample label of the third training sample as target output values of the recommendation model, for training based on a target training model, where the sample label of the third training sample is used to indicate whether the third user performs an operation action on the third recommended object. Training the recommendation model with both the first training sample and the third training sample takes into account both the interpolation prediction labels obtained by the interpolation model and the actual sample labels during training, and avoids making the accuracy of the recommendation model depend only on the accuracy of the interpolation prediction labels, which further improves the accuracy of the recommendation model. Using such a trained recommendation model to predict the probability that the target recommended user performs an operation action on the candidate recommended object therefore has a higher accuracy.
  • In some implementations, the first training sample may be obtained when the first recommended object has not been displayed to the first user, and the third training sample may be obtained when the third recommended object has been displayed to the third user. Because the first recommended object has not been displayed to the first user, the first training sample carries no feedback on whether the first user performs an operation action on the first recommended object, that is, the first training sample has no actual sample label. In this way, events that have not occurred can be included in the modeling and used for training the recommendation model together with events that have occurred; that is, the first training sample without a sample label and the third training sample with a sample label are used together for training the recommendation model, which makes the sample distribution more reasonable and improves the accuracy of the recommendation model. Using such a trained recommendation model to predict the probability that the target recommended user performs an operation action on the candidate recommended object therefore has a higher accuracy.
  • In some implementations, the target training model includes a first loss function and a second loss function. The first loss function is used to indicate the difference between the interpolation prediction label of the first training sample and the predicted label of the first training sample, and the second loss function is used to indicate the difference between the sample label of the third training sample and the predicted label of the third training sample. The model parameters obtained by optimizing the target training model are the model parameters of the trained recommendation model.
  • In some implementations, the target training model is:

$$W^{*}=\arg\min_{W}\left[\omega\sum_{l=1}^{L}\delta\left(y_{l},\hat{y}_{l}\right)+\sum_{l=L+1}^{|D|}\hat{\delta}\left(\sigma_{l},\hat{y}_{l}\right)+\lambda R(W)\right]$$

  • where W is the parameter of the recommendation model, R(W) is the regularization term, λ is the hyperparameter that determines the weight of the regularization term, the training samples x_1 to x_L in the training sample set D are the third training samples, the training samples x_{L+1} to x_{|D|} are the first training samples, |D| represents the number of training samples in the training sample set, L represents the number of third training samples in the training sample set, σ_l represents the interpolation prediction label σ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, ŷ_l represents the predicted label of the training sample x_l, δ(·,·) represents the second loss function, δ̂(·,·) represents the first loss function, and ω is a hyperparameter used to adjust the proportion of the first loss function and the second loss function. Training samples x_1 to x_L are L different third training samples, and training samples x_{L+1} to x_{|D|} are |D|−L different first training samples.
  • In the embodiment of the present application, the second training sample, that is, a training sample without bias, is used to train the interpolation model, and the first loss function and the second loss function are introduced into the target training model. Setting different hyperparameters can adjust the proportions of the first loss function and the second loss function in the target training model, which further improves the accuracy of the recommendation model.
  • The model parameters of the interpolation model are obtained by training based on the second training samples. When the number of second training samples is large, the second training samples are relatively representative, so the interpolation model can fit the unbiased data distribution more accurately, and the accuracy of the interpolation model is higher. The weight of the second loss function can be set higher than the weight of the first loss function, that is, the value of ω can be greater than 1. In this way, only a small number of second training samples are needed to reduce the impact of the bias problem and improve the accuracy of the recommendation model, and the decline in overall system revenue that large-scale random display of recommended objects (caused by collecting second training samples on a large scale) would bring is avoided. Using the recommendation model to predict the probability that the target recommended user performs an operation action on the candidate recommended object therefore has a higher accuracy.
  • In some implementations, the interpolation model is selected according to the number of second training samples. When the number of second training samples is large, the second training samples are relatively representative, and a more complex model or more training features can be used to train the interpolation model, so that the interpolation model fits the unbiased data distribution more accurately. The more complex model can be, for example, a logistic regression model, a domain-aware factorization machine, or a deep neural network. When the number of second training samples is small, the second training samples are relatively unrepresentative, and a simpler model or fewer training features can be used to train the interpolation model, so as to prevent the interpolation model from overfitting the unbiased data distribution. The simpler model can be, for example, an advertisement average click-through rate model. For example, when the number of second training samples is more than 100,000, the interpolation model can be a domain-aware factorization machine or a deep neural network; when the number of second training samples is between 10,000 and 100,000, the interpolation model can be a logistic regression model; and when the number of second training samples is less than 10,000, the interpolation model can be an advertisement average click-through rate model. Because the interpolation model is selected according to the number of second training samples, different thresholds can be set for different application scenarios, and the interpolation model can be adjusted flexibly. In this way, only a small number of second training samples are needed to reduce the impact of the bias problem and improve the accuracy of the recommendation model, and the decline in overall system revenue that large-scale random display of recommended objects (caused by collecting second training samples on a large scale) would bring is avoided.
  • a training device for a recommendation model includes various modules/units for executing the method in the first aspect and any one of the implementation manners in the first aspect.
  • a recommendation device which includes modules/units for executing the method in the second aspect and any one of the implementation manners of the second aspect.
  • a training device for a recommendation model which includes an input and output interface, a processor, and a memory.
  • the processor is used to control the input and output interface to send and receive information
  • the memory is used to store a computer program
  • The processor is used to call and run the computer program from the memory, so that the training device executes the method in the first aspect or any one of the implementation manners of the first aspect.
  • the above-mentioned training device may be a terminal device/server, or a chip in the terminal device/server.
  • the aforementioned memory may be located inside the processor, for example, may be a cache in the processor.
  • the above-mentioned memory may also be located outside the processor so as to be independent of the processor, for example, the internal memory (memory) of the training device.
  • a recommendation device which includes an input and output interface, a processor, and a memory.
  • the processor is used to control the input and output interface to send and receive information
  • the memory is used to store a computer program
  • The processor is used to call and run the computer program from the memory, so that the device executes the method in the second aspect or any one of the implementation manners of the second aspect.
  • the foregoing device may be a terminal device/server, or a chip in the terminal device/server.
  • the aforementioned memory may be located inside the processor, for example, may be a cache in the processor.
  • the above-mentioned memory may also be located outside the processor so as to be independent of the processor, for example, the internal memory (memory) of the device.
  • a computer program product comprising: computer program code, which when the computer program code runs on a computer, causes the computer to execute the methods in the above aspects.
  • The above-mentioned computer program code may be stored in whole or in part on a first storage medium, where the first storage medium may be packaged together with the processor or packaged separately from the processor; this is not specifically limited here.
  • a computer-readable medium stores a program code, and when the computer program code runs on a computer, the computer executes the methods in the above aspects.
  • Fig. 1 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Fig. 2 is an architecture diagram of a recommendation system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 5 is a schematic flowchart of a training method of a recommendation model provided by an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of a training method of a recommendation model provided by another embodiment of the present application.
  • Fig. 7 is a schematic diagram of a recommendation framework provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a recommendation method provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of recommended objects in the application market provided by an embodiment of the present application.
  • Fig. 10 is a schematic block diagram of a training device for a recommendation model provided in an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of a recommendation device provided by an embodiment of the present application.
  • Fig. 12 is a schematic block diagram of a training device for a recommendation model provided by an embodiment of the present application.
  • FIG. 13 is a schematic block diagram of a recommendation device provided by an embodiment of the present application.
  • Fig. 1 shows some application scenarios of the embodiments of the present application.
  • the recommendation method provided in the embodiments of the present application can be applied to all scenarios that require recommendation.
  • the recommendation method provided by the embodiment of the present application can be applied to application market recommendation, music application recommendation, video website recommendation, e-commerce recommendation, search engine ranking, and other scenarios that require recommendation.
  • the two commonly used application scenarios are briefly introduced below.
  • Application scenario 1: Application market recommendation
  • The recommendation system can be used to determine the applications to be displayed and their corresponding placements. For example, in a cost per click (CPC) system, advertisers pay only when an application is clicked by a user. When a user enters the application market, a recommendation request is triggered. Because the space for displaying applications is limited, when the recommendation system receives a recommendation request, it can sort all the applications to be displayed according to their expected revenue, and then select the most valuable one or more applications to display in the corresponding placements. In the CPC system, the expected revenue of each application is related to the estimated click-through rate (CTR) of the application; in this case, the CTR can be understood as the probability of each APP being clicked. To obtain the ranking of expected revenue, the estimated CTR needs to be obtained first.
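  • A hypothetical sketch of the ranking step in a CPC system is shown below. It scores each candidate application as bid × predicted CTR, which is the usual CPC convention; the text above only states that expected revenue is related to the estimated CTR, so the exact scoring rule and the field names are assumptions.

```python
# Hypothetical sketch of CPC ranking: score candidate apps by expected revenue
# per impression and keep the top placements.
def rank_for_display(candidates, num_slots):
    """candidates: list of dicts like {"app": "...", "bid": 0.5, "pctr": 0.03}"""
    scored = sorted(
        candidates,
        key=lambda c: c["bid"] * c["pctr"],   # expected revenue per impression
        reverse=True,
    )
    return scored[:num_slots]

# Example: choose 2 placements out of 3 candidate apps.
apps = [
    {"app": "app_a", "bid": 0.50, "pctr": 0.030},
    {"app": "app_b", "bid": 0.20, "pctr": 0.120},
    {"app": "app_c", "bid": 0.80, "pctr": 0.015},
]
print(rank_for_display(apps, num_slots=2))
```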
  • After the estimated CTR of all applications to be displayed is obtained, the expected revenue of each application is calculated and sorted according to the estimated CTR of each application, and the applications to be displayed and their corresponding placements are determined according to the sorting result.
  • Obtaining the estimated CTR of all applications to be displayed can be performed by the recommendation method in the embodiment of the present application. According to the obtained estimated CTRs, all applications to be displayed can be sorted, and the applications to be displayed and their corresponding placements can then be determined according to the sorting result.
  • Application scenario 2: Search term recommendation
  • When a user searches, the search terms usually come from two sources: search terms actively entered by the user and search terms recommended to the user by the system.
  • the behavior of the user actively entering the search term is a user behavior that the system cannot intervene.
  • A search term recommended by the system to the user means that, when the recommendation system receives a recommendation request, it can calculate the scores of all search terms to be displayed and sort them; for example, the score of a search term can be based on clicks on the search term. According to the ranking result, the search terms to be displayed and their corresponding display positions can be determined.
  • Calculating the scores of all search terms can be performed by the recommendation method in the embodiment of the present application. According to the obtained scores, all the search terms to be displayed can be sorted, and the search terms to be displayed and their corresponding placements can then be determined according to the sorting result.
  • a recommendation system refers to a system that uses machine learning algorithms to analyze based on the user's historical data, and predicts new recommendation requests based on the analysis results to obtain recommendation results.
  • FIG. 2 shows an architecture diagram of a recommendation system provided in an embodiment of the present application.
  • a recommendation request is triggered.
  • the recommendation system inputs the recommendation request and related information into the recommendation model to predict the user's selection rate of the products in the system. Further, the products are sorted according to the predicted selection rate or a function based on the selection rate.
  • the recommendation system can use the product to be displayed to the user and the location of the product as a recommendation result for the user according to the sorting result.
  • the user browses the displayed product and may have operational actions, such as browsing behaviors, downloading behaviors, etc.
  • the user's operation actions can be stored in the user behavior log, and the training data can be obtained by preprocessing the user behavior log.
  • the training data can be used to continuously update the parameters of the recommendation model to improve the prediction effect of the recommendation model.
  • a user opens an application market in a smart terminal (for example, a mobile phone) to trigger the recommendation system in the application market, that is, trigger a recommendation request.
  • the recommendation system can predict the user's probability of downloading recommended candidate applications based on the user's historical behavior log, for example, the user's historical download records, and the application market's own characteristics, such as time, location and other environmental characteristics.
  • The recommendation system can display the candidate applications in descending order of the predicted probability, thereby increasing the download probability of the candidate applications.
  • For example, the recommendation system can select the p candidate applications with the highest predicted probability for display, show the applications with a higher predicted probability among the p candidate applications in the front positions, and show the applications with a lower predicted selection rate among the p candidate applications in the back positions.
  • Exposure data refers to recorded user browsing behavior data.
  • a single-type model refers to a model in which only one type of data in the training sample is clear.
  • the context information may refer to the background information of the user and/or the recommended object in the recommendation request, such as city, occupation, price, category, and so on.
  • the aforementioned recommendation model may be a neural network model.
  • the following introduces related terms and concepts of neural networks that may be involved in the embodiments of the present application.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes x_s and an intercept 1 as inputs, and the output of the arithmetic unit can be as shown in formula (1):

$$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right) \qquad (1)$$

  • where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, b is the bias of the neural unit, and f is the activation function of the neural unit.
  • the activation function is used to perform non-linear transformation of the features in the neural network and convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
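  • As a small illustration, the following sketch computes the output of the single neural unit in formula (1) with a sigmoid activation. The example weights, bias, and inputs are made up.

```python
# Minimal sketch of one neural unit: f(sum_s W_s * x_s + b) with f = sigmoid.
import numpy as np

def neural_unit(x, W, b):
    return 1.0 / (1.0 + np.exp(-(np.dot(W, x) + b)))

x = np.array([0.5, -1.2, 3.0])   # inputs x_s
W = np.array([0.8, 0.1, -0.4])   # weights W_s
b = 0.2                          # bias
print(neural_unit(x, W, b))      # output signal of the unit
```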
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (deep neural network, DNN), also known as a multi-layer neural network, is divided according to the positions of its layers: the layers inside the DNN can be divided into three categories, namely the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in the middle are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • Although the DNN looks complicated, the work of each layer is not complicated. Simply put, each layer computes the following linear relationship expression:

$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$

  • where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients W and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows. Taking the coefficient W as an example, suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^{3}_{24}$, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W^{L}_{jk}$.
  • Taking the loss function as an example: the higher the output value (loss) of the loss function, the greater the difference between the predicted value and the target value, so training the deep neural network becomes a process of reducing this loss as much as possible. During training, the neural network can use the back propagation (BP) algorithm to modify the parameter values in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is passed forward until the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by the error loss, and aims to obtain the optimal parameters of the neural network model, for example, the weight matrix.
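  • The following toy sketch (not from the patent) shows the loop this describes: a forward pass, an error loss, and parameter updates driven by the propagated error, for a single linear neuron with a squared-error loss. All numbers are made up.

```python
# Illustrative gradient-descent loop: forward pass, loss, backward update.
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
W, b, lr = np.zeros(2), 0.0, 0.1

for step in range(100):
    y_hat = X @ W + b                    # forward pass
    loss = np.mean((y_hat - y) ** 2)     # error loss
    grad = 2 * (y_hat - y) / len(y)      # back-propagated error signal
    W -= lr * (X.T @ grad)               # update weights
    b -= lr * grad.sum()                 # update bias
print(loss)                              # loss shrinks as training proceeds
```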
  • FIG. 3 is a schematic diagram of the system architecture of an embodiment of the present application.
  • the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection system 160.
  • the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114.
  • the calculation module 111 may include the target model/rule 101, and the preprocessing module 113 and the preprocessing module 114 are optional.
  • the data collection device 160 is used to collect training data.
  • the recommendation model can be further trained through training data.
  • the training data may include training samples and sample labels of the training samples.
  • the training sample may include the attribute information of the user and the information of the recommended object.
  • the sample label indicates whether the user has an action on the recommended object. Whether the user has an operation on the recommended object can be understood as whether the user in the training sample selects the recommended object.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • The training device 120 processes the input user attribute information and recommended object information, and compares the output predicted label with the sample label, until the difference between the predicted label output by the training device 120 and the sample label is less than a certain threshold, thereby obtaining the trained recommendation model; that is, the trained recommendation model may be the target model/rule 101.
  • the above-mentioned target model/rule 101 can be used to predict whether the user selects a recommended object or predict the probability of the user selecting a recommended object.
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network, a logistic regression model, and the like.
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • In addition, the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.
  • the target model/rule 101 trained according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 3, which can be a terminal, such as a mobile phone terminal, a tablet computer, notebook computers, augmented reality (AR)/virtual reality (VR), vehicle-mounted terminals, etc., can also be servers or clouds.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • The user can input data to the I/O interface 112 through the client device 140. In this embodiment of the application, the input data may include training data input by the client device.
  • the client device 140 here may specifically be a terminal device.
  • The preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data received by the I/O interface 112. In the embodiment of the present application, the preprocessing module 113 and the preprocessing module 114 may be absent, or there may be only one preprocessing module. When the preprocessing module 113 and the preprocessing module 114 do not exist, the calculation module 111 can be used directly to process the input data.
  • the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing .
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the I/O interface 112 provides the processing results to the user.
  • For example, the target model/rule 101 can be used in the recommendation system to predict whether the target recommended user selects a candidate recommended object, or the probability of the target recommended user selecting a candidate recommended object; a recommendation result is obtained according to this prediction and presented to the client device 140 to be provided to the user. The above recommendation result may be a recommendation ranking of the candidate recommended objects obtained according to the probability of the target recommended user selecting each candidate recommended object, or the recommendation result may be a target recommendation object obtained according to the probability of the target recommended user selecting each candidate recommended object, where the target recommendation object may be the one or more candidate recommended objects with the highest probability. Alternatively, the calculation module 111 may transmit the higher-ranked products obtained after processing to the I/O interface, and the I/O interface then sends the higher-ranked products to the client device 140 for display.
  • It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training samples for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send the input data and the user's authorization is required, the user can set the corresponding authority in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data, and store it in the database 130 as shown in the figure.
  • Alternatively, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result of the I/O interface 112 in the database 130 as new sample data, as shown in the figure.
  • FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • For example, in FIG. 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • The recommendation model in the embodiment of the present application may also be a logistic regression model.
  • the logistic regression model is a machine learning method used to solve classification problems and can be used to estimate the possibility of a certain thing.
  • the recommended model may be a deep factorization machine (deep factorization machines, DeepFM) model, or the recommended model may be a wide and deep model.
  • FIG. 4 shows a system architecture 200 provided by an embodiment of the present application to which the recommendation model training method and the recommendation method of the embodiments of the present application can be applied.
  • the system architecture 200 may include a local device 220, a local device 230, an execution device 210 and a data storage system 250, where the local device 220 and the local device 230 are connected to the execution device 210 through a communication network.
  • The execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage devices, routers, and load balancers; the execution device 210 can be arranged at one physical site or distributed across multiple physical sites.
  • the execution device 210 can use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the training method and the recommendation method of the recommendation model in the embodiment of the present application.
  • the data storage system 250 may be deployed in the local device 220 or the local device 230.
  • the data storage system 250 may be used to store training samples.
  • execution device 210 may also be referred to as a cloud device, and in this case, the execution device 210 may be deployed in the cloud.
  • The execution device 210 may execute the following process: obtaining at least one first training sample, where the first training sample includes attribute information of a first user and information of a first recommended object; processing the attribute information of the first user and the information of the first recommended object by using an interpolation model to obtain an interpolation prediction label of the first training sample, where the interpolation prediction label is used to indicate a prediction of whether the first user performs an operation action on the first recommended object when the first recommended object is recommended to the first user; the model parameters of the interpolation model are obtained by training based on at least one second training sample, where the second training sample includes attribute information of a second user, information of a second recommended object, and a sample label of the second training sample, the sample label of the second training sample is used to indicate whether the second user performs an operation action on the second recommended object, and the second training sample is obtained when the second recommended object is randomly displayed to the second user; and using the attribute information of the first user and the information of the first recommended object of the first training sample as the input of the recommendation model, and using the interpolation prediction label of the first training sample as the target output value of the recommendation model, to train the recommendation model and obtain the trained recommendation model.
  • the execution device 210 can train to obtain a recommendation model by executing the above-mentioned process, and the recommendation model can eliminate the influence of training data bias on the recommendation accuracy rate, and more accurately predict the probability that the target recommendation user has an operation action on the candidate recommendation object.
  • the training method executed by the execution device 210 may be a training method executed in the cloud.
  • the user can operate respective user devices (for example, the local device 220 and the local device 230) to interact with the execution device 210.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device of each user can interact with the execution device 210 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the local device 220 and the local device 230 can obtain the relevant parameters of the recommendation model from the execution device 210.
  • the recommendation model is on the local device 220 and the local device 230, and the recommendation model is used to predict the target recommendation user’s recommendation to the candidate. The probability that the object has an action.
  • the recommendation model can be directly deployed on the execution device 210.
  • In this case, the execution device 210 obtains the data to be processed from the local device 220 and the local device 230, and obtains, according to the recommendation model, the predicted probability that the target recommended user performs an operation action on the candidate recommended object.
  • the data storage system 250 may be deployed in the local device 220 or the local device 230 for storing training samples of the local device.
  • the data storage system 250 may be independent of the local device 220 or the local device 230 and be separately deployed on a storage device.
  • the storage device may interact with the local device to obtain user behavior logs in the local device and store it in the storage device.
  • FIG. 5 shows a method 300 for training a recommendation model according to an embodiment of the present application.
  • the method 300 includes step 310 to step 330.
  • the training method 300 can be executed by the training device 120 in FIG. 3. Step 310 to step 330 are described in detail below.
  • the first training sample may be data obtained in the data storage system 250 as shown in FIG. 4.
  • the attribute information of the first user and the information of the first recommended object may be obtained through the context information of the first training sample.
  • the user's attribute information may include some personalized attributes of the user, such as the user's gender, the user's age, the user's occupation, the user's income, the user's hobbies, and the user's education.
  • the attribute information of the first user may include one or more of the aforementioned attribute information of the user.
  • the recommended object may be a recommended application in the application market of the terminal device in the foregoing application scenario 1, or the recommended object may be a search term recommended by the system in the foregoing application scenario 2.
  • the recommended object may be information that the recommendation system can recommend for the user, and the application does not make any restrictions on the specific implementation of the recommended object.
  • the first recommended object may be one of the aforementioned recommended objects.
  • the information of the recommended object may include a recommended object identifier, for example, a recommended object ID.
  • the information of the recommended object may also include some attributes of the recommended object, for example, the name of the recommended object, the type of the recommended object, and so on.
  • the recommended object may be the recommended application in the application market of the terminal device in the aforementioned application scenario 1, and the information of the recommended object may be the information of the recommended application.
  • the information of the recommended application may include the identification of the recommended application, for example, the id of the recommended application.
  • the information of the recommended application may also include some attributes of the recommended application, for example, the name of the recommended application, the developer of the recommended application, the type of the recommended application, the installation package size of the recommended application, the score of the recommended application, comments on the recommended application, and so on.
  • the information of the first recommended object may include one or more of the above-mentioned information of the recommended object.
  • the third training sample includes the attribute information of the third user, the information of the third recommended object, and the sample label of the third training sample, and the sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • step 311 is an optional step.
  • the third training sample may be data obtained in the data storage system 250 as shown in FIG. 4.
  • the attribute information of the third user and the information of the third recommended object may be obtained through the context information of the third training sample.
  • the first user and the third user may be the same user or different users.
  • the attribute information of the third user may include one or more items of the attribute information of the user described in step 310.
  • the third recommended object may be one of the recommended objects described in step 310.
  • the information of the third recommended object may include one or more of the above-mentioned information of the recommended object.
  • the attribute categories of the recommended objects included in the information of the first recommended object and the information of the third recommended object may be the same or different.
  • the information of the first recommended object may include the name of the first recommended object and the type of the first recommended object.
  • the information of the third recommended object may include the name of the third recommended object.
  • the label can be used to mark the training sample as a positive or negative sample.
  • the label can be 0 or 1; the label of a positive sample can be 1, and the label of a negative sample can be 0.
  • the label may also be a specific value, that is, the probability that the training sample is a positive sample or a negative sample is marked by the specific value.
  • the sample label can be obtained based on whether the user has an operation action on the recommended object.
  • the user's action on the recommended object may include the user's click behavior, the user's download behavior, the user's purchase behavior, the user's browsing behavior, and the user's negative review behavior.
  • the sample label is obtained based on whether the user has an operation action on the recommended object, and can specifically include the following situations.
  • Case 1: The user has an operation action on the recommended object, and the sample label can be 1; the user has no operation action on the recommended object, and the sample label can be 0.
  • the operation action may be a download behavior.
  • when the user in the training sample A1 has a download behavior for the recommended object in the training sample A1, the training sample A1 is a positive sample, and the sample label of the training sample A1 can be 1; when the user in the training sample A1 has no download behavior for the recommended object in the training sample A1, the training sample A1 is a negative sample, and the sample label of the training sample A1 can be 0.
  • the training sample A1 is an example of the third training sample.
  • Case 2: The user has an operation action on the recommended object, and the sample label can be 0; the user has no operation action on the recommended object, and the sample label can be 1.
  • the operation action may be a bad review behavior.
  • when the user in the training sample A1 has a negative review behavior for the recommended object in the training sample A1, the training sample A1 is a negative sample, and the sample label of the training sample A1 can be 0; when the user in the training sample A1 has no negative review behavior for the recommended object in the training sample A1, the training sample A1 is a positive sample, and the sample label of the training sample A1 can be 1.
  • the training sample A1 is an example of the third training sample.
  • Case 3: The user has the first type of operation action on the recommended object, and the sample label can be 1; the user has the second type of operation action on the recommended object, and the sample label can be 0.
  • the first type of operation action may include a purchase behavior and the like
  • the second type of operation action may include a browsing behavior and the like.
  • when the user in the training sample A1 has a browsing behavior on the recommended object in the training sample A1, the training sample A1 is a negative sample, and the sample label of the training sample A1 can be 0; when the user in the training sample A1 has a purchase behavior on the recommended object in the training sample A1, the training sample A1 is a positive sample, and the sample label of the training sample A1 can be 1.
  • the training sample A1 is an example of the third training sample.
  • the sample label corresponding to the operation action can be determined according to the specific application scenario.
  • the first type of operation actions may include browsing behaviors, etc.
  • the second type of operation actions may include negative reviews behaviors, and the like.
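  • as an illustration of the three cases above, the following minimal Python sketch maps an operation action to a sample label; the behavior names and the case-selection rule are hypothetical and given only for illustration, not part of this embodiment.
        # Hypothetical sketch: derive a sample label from an operation action (cases 1 to 3 above).
        def sample_label(action: str, case: int) -> int:
            if case == 1:                        # e.g. download behavior marks a positive sample
                return 1 if action == "download" else 0
            if case == 2:                        # e.g. negative review behavior marks a negative sample
                return 0 if action == "negative_review" else 1
            if case == 3:                        # first type (purchase) vs. second type (browse)
                return 1 if action == "purchase" else 0
            raise ValueError("unknown case")

        # usage: a training sample whose user downloaded the recommended object is a positive sample
        assert sample_label("download", case=1) == 1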
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user .
  • the feedback information can be obtained from the user behavior log.
  • the first training sample may be obtained when the first recommended object is not shown to the first user; that is, there is no feedback information on whether the first user has an operation action on the first recommended object, so the first training sample has no actual sample label.
  • Training samples can be obtained from a recommendation request and the recommendation object corresponding to the recommendation request.
  • the training sample includes the attribute information of the user in the recommendation request and the information of the recommendation object.
  • the recommendation system recommends applications to users.
  • the training sample A1 (training sample A1 is an example of the third training sample) may include four types of attribute data: the gender of the user in the training sample A1, the occupation of the user, the id of the recommended application in the training sample A1, and the type of the recommended application, that is, data of 4 fields; it can also be understood that the training sample A1 includes 4 training features.
  • the field represents the category of the attribute; for example, Chengdu, Chongqing, and Beijing all belong to the same field, namely the city field.
  • the 4 types of attributes are numbered from 0 to 3.
  • the sample label is 1.
  • the sample label may indicate whether the user has downloaded the recommended application, and a sample label of 1 may indicate that the user has downloaded WeChat.
  • the recommendation system recommends music to the user.
  • the recommended music can be music that requires payment.
  • the training sample A2 (training sample A2 is an example of the third training sample) may include five types of attribute data: the gender of the user in the training sample A2, the age of the user, the id of the recommended music in the training sample A2, the type of the recommended music, and the score of the recommended music, that is, 5 fields; the five types of attributes are numbered from 0 to 4.
  • This is an original training sample.
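  • a minimal sketch of how such a training sample can be organized as numbered fields is given below; the field names and values are illustrative assumptions, not data from this application.
        # Hypothetical sketch: a training sample represented as numbered attribute fields plus a label.
        training_sample_a1 = {
            0: ("user_gender", "male"),          # field 0: user attribute
            1: ("user_occupation", "engineer"),  # field 1: user attribute
            2: ("app_id", "app_123"),            # field 2: recommended-object attribute
            3: ("app_type", "social"),           # field 3: recommended-object attribute
        }
        label_a1 = 1  # e.g. the user downloaded the recommended application

        # Each (field, value) pair is typically one-hot or embedding encoded before being
        # fed to the recommendation model as a training feature.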
  • the first training sample and the third training sample are described by taking the recommended target as a recommended application in the application market as an example.
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user.
  • the first recommended object may be a recommended application that has not been placed (or is not shown to the first user).
  • the third recommended object may be a recommended application that has been placed (or displayed to a third user). For example, in response to a recommendation request, one or more recommended applications among the candidate recommended applications are displayed to the user corresponding to the recommendation request, so that feedback information on whether the user has an operation action on the recommended application can be obtained.
  • the recommended application A displayed to the user A corresponding to the recommendation request is the third recommended object
  • the training sample with feedback information about whether the user A has an operation action on the recommended application A is the third training sample. That is to say, the third training sample includes the attribute information of user A, the information of recommending application A, and the sample label of the third training sample.
  • for the recommended application B that is not shown to the user A, the feedback information of whether the user A has an operation action on the recommended application B cannot be obtained.
  • the recommended application B that is not displayed to the user A corresponding to the recommendation request is the first recommendation object
  • the training sample that does not have feedback information about whether the user A has an operation action on the recommended application B is the first training sample.
  • the first training sample includes the attribute information of user A and the information of recommended application B. It should be understood that the above description only takes the first training sample and the third training sample corresponding to the same recommendation request as an example for description, and both the first user and the third user are user A as an example.
  • the number of candidate recommended applications is m
  • the number of recommended applications displayed to the user corresponding to the recommendation request is n
  • the number of recommended applications not displayed to the user corresponding to the recommendation request is m-n.
  • the n recommended applications can correspond to n third training samples, that is, n third training samples can be constructed from the recommendation request and the n recommended applications.
  • the m-n recommended applications can correspond to m-n first training samples, that is, m-n first training samples can be constructed from the recommendation request and the m-n recommended applications.
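  • the construction of the n third training samples and the m-n first training samples from one recommendation request can be sketched as follows; the record structure and function name are assumptions made purely for illustration.
        # Hypothetical sketch: build third (displayed, labeled) and first (non-displayed,
        # unlabeled) training samples from one recommendation request.
        def build_samples(user_attrs, displayed, not_displayed, behavior_log):
            third_samples, first_samples = [], []
            for app in displayed:                 # n displayed candidate applications
                label = 1 if (user_attrs["user_id"], app["app_id"]) in behavior_log else 0
                third_samples.append({"user": user_attrs, "item": app, "label": label})
            for app in not_displayed:             # m-n non-displayed candidate applications
                first_samples.append({"user": user_attrs, "item": app, "label": None})
            return third_samples, first_samples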
  • the interpolation prediction label is used to indicate the prediction of whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user.
  • the model parameters of the interpolation model are obtained by training based on at least one second training sample.
  • the second training sample includes the attribute information of the second user, the information of the second recommended object, and the sample label of the second training sample.
  • the sample label of the second training sample is used to indicate whether the second user has an operation action on the second recommended object, and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • the interpolation model can be used to predict whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user.
  • the interpolation prediction label may be 0 or 1, that is, 0 or 1 is used to indicate whether the first user has an operation action on the first recommended object.
  • the interpolation prediction label may also be a probability value, that is, the probability value is used to indicate the probability that the first user has an operation action on the first recommended object.
  • the interpolation model may be an advertising average CTR model, a logistic regression (LR) model, a field-aware factorization machine (FFM), or a DNN, etc.
  • the second training sample may be data acquired in the data storage system 250 as shown in FIG. 4.
  • the attribute information of the second user and the information of the second recommended object may be obtained through the context information of the second training sample.
  • the first user and the second user may be the same user or different users.
  • the attribute information of the second user may include one or more of the attribute information of the user described in step 310.
  • the second recommended object may be one of the recommended objects described in step 310.
  • the information of the second recommended object may include one or more of the above-mentioned information of the recommended object.
  • the attribute categories of the recommended objects contained in the information of the first recommended object and the information of the second recommended object may be the same or different.
  • the information of the first recommended object may include the name of the first recommended object and the type of the first recommended object.
  • the information of the second recommended object may include the name of the second recommended object.
  • the sample label of the second training sample can be as described in step 311, which will not be repeated here.
  • the second training sample is obtained when the second recommended object is shown to the second user; that is to say, the second training sample contains feedback information on whether the second user has an operation action on the second recommended object, and the second training sample has an actual sample label.
  • the second training sample may be the same as the third training sample in step 311, or may be different from the third training sample.
  • the second training sample is a training sample without bias.
  • the third training sample may be a training sample without bias or a training sample with bias.
  • Biased training samples can be understood as training samples obtained when the recommended object is displayed to the user according to certain rules. For example, when a recommendation request is received, the candidate recommendation objects are sorted according to expected income, and the recommended objects displayed to the user are determined according to the ranking; that is to say, in this case, the probability of each recommended object being displayed to the user is different, and the recommended object with higher expected income is more likely to be shown to the user. The training samples obtained in this case are biased training samples.
  • the following describes the training samples without bias and the training samples with bias by taking the recommended target as the recommended application in the application market as an example.
  • if the recommended application is displayed through a random placement strategy, that is, the recommended applications among multiple candidate recommended applications are randomly displayed to the user corresponding to the recommendation request, each recommended application is displayed to the user corresponding to the recommendation request with the same probability, and the training samples obtained in this case are training samples without bias.
  • if the candidate recommended applications are sorted according to expected income and the recommended applications to be displayed to the user are determined according to the ranking, the training samples obtained in this case are biased training samples.
  • using the second training sample to train the interpolation model, that is, using the training samples without bias to train the interpolation model, can avoid the impact of the bias problem on the training of the interpolation model, improve the accuracy of the interpolation model, and make the obtained interpolation prediction labels more accurate.
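  • a minimal sketch of this step, assuming a logistic regression interpolation model and scikit-learn style feature matrices (the variable names, feature values, and library choice are illustrative assumptions), is:
        # Hypothetical sketch: train the interpolation model on unbiased second training samples
        # (collected under random display), then impute prediction labels for first training samples.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        X_second = np.array([[0, 1, 3], [1, 0, 2], [0, 0, 3], [1, 1, 1]])  # encoded training features
        y_second = np.array([1, 0, 0, 1])                                  # actual sample labels

        interpolation_model = LogisticRegression().fit(X_second, y_second)

        X_first = np.array([[0, 1, 2], [1, 0, 1]])                 # non-displayed (unlabeled) samples
        delta = interpolation_model.predict_proba(X_first)[:, 1]   # interpolation prediction labels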
  • the interpolation model can be selected according to the number of second training samples.
  • when the number of second training samples is large, the second training samples are relatively representative, and a more complex model or more training features can be used to train the interpolation model, so that the interpolation model can fit the unbiased data distribution more accurately.
  • a more complex model can be a logistic regression model, a field-aware factorization machine, or a deep neural network.
  • when the number of second training samples is small, the second training samples are relatively unrepresentative, and a simpler model or fewer training features can be used to train the interpolation model to avoid the interpolation model overfitting the unbiased data distribution.
  • a simpler model may be the average click-through rate model of advertisements.
  • for example, when the number of second training samples is more than 100,000, the interpolation model can be a field-aware factorization machine or a deep neural network; when the number of second training samples is between 10,000 and 100,000, the interpolation model can be a logistic regression model; when the number of second training samples is less than 10,000, the interpolation model can be the average click-through rate model of advertisements.
  • since the interpolation model can be selected according to the number of second training samples, different thresholds can be set for different application scenarios to select the interpolation model, and the interpolation model can be flexibly adjusted; only a small number of second training samples is needed to reduce the impact of the bias problem and improve the accuracy of the recommendation model, which avoids the decline in overall system revenue caused by the large-scale random display of recommended objects that large-scale collection of second training samples would require.
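  • a short sketch of such threshold-based selection, using the example thresholds above (the returned model names are placeholders, not a prescribed implementation), is:
        # Hypothetical sketch: select the interpolation model by the number of unbiased
        # second training samples, using the example thresholds above.
        def select_interpolation_model(num_second_samples: int) -> str:
            if num_second_samples > 100_000:
                return "FFM or DNN"           # complex model, data is representative
            if num_second_samples >= 10_000:
                return "logistic regression"  # medium-capacity model
            return "average CTR model"        # simplest model, avoids overfitting

        print(select_interpolation_model(50_000))  # -> "logistic regression"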
  • the training feature may be a feature obtained from the attribute information of the user and the information of the recommended object.
  • step 330 may be training according to the first training sample and the third training sample to obtain a recommendation model.
  • training based on the first training sample and the third training sample may be: using the attribute information of the first user and the information of the first recommended object, as well as the attribute information of the third user and the information of the third recommended object, as the input of the recommendation model, and using the interpolation prediction label of the first training sample and the sample label of the third training sample as the target output values of the recommendation model, to train based on the target training model and obtain the trained recommendation model.
  • the above-mentioned training process uses the attribute information of the first user and the information of the first recommended object as the input of the recommendation model, the interpolation prediction label of the first training sample is used as the target output value corresponding to the input, and the third The attribute information of the user and the information of the third recommended object are used as the input of the recommendation model, and the sample label of the third training sample is used as the target output value corresponding to the input, and training is performed based on the target training model.
  • the target training model includes a first loss function and a second loss function
  • the first loss function is used to indicate the difference between the interpolation prediction label of the first training sample and the prediction label of the first training sample
  • the second loss The function is used to indicate the difference between the sample label of the third training sample and the predicted label of the third training sample.
  • training the recommendation model based on the target training model may be through multiple iterations of the backpropagation algorithm to continuously reduce the first loss function and the second loss function to obtain the model parameters of the recommendation model.
  • the first loss function and the second loss function may have an additive relationship.
  • the relationship between the first loss function and the second loss function may also be a multiplication.
  • the target training model can be:
  • min_W  γ · Σ_{l=1}^{L} ℓ(y_l, ŷ_l) + Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) + λ · R(W)
  • where W is the parameter of the recommendation model, R(W) is the regular term, and λ is the hyperparameter that determines the weight of the regular term.
  • in the training sample set D, training sample x_1 to training sample x_L are the third training samples, and training sample x_{L+1} to training sample x_{|D|} are the first training samples; |D| represents the number of training samples in the training sample set, and L represents the number of third training samples in the training sample set.
  • δ_l represents the interpolation prediction label δ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, and ŷ_l represents the predicted label of the training sample x_l.
  • Σ_{l=1}^{L} ℓ(y_l, ŷ_l) represents the second loss function, Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) represents the first loss function, ℓ(·,·) denotes the per-sample loss (its specific form is not limited here), and γ is a hyperparameter used to adjust the proportion of the first loss function and the second loss function.
  • training sample x_1 to training sample x_L are L different third training samples, and training sample x_{L+1} to training sample x_{|D|} are |D|−L different first training samples.
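  • a minimal sketch of this objective, assuming a squared per-sample loss purely for illustration (the loss form is not fixed by the description above), is:
        # Hypothetical sketch: the target training model as a weighted sum of the second loss
        # (third training samples, actual labels y) and the first loss (first training samples,
        # interpolation prediction labels delta), plus a regular term on the model parameters W.
        import numpy as np

        def target_objective(y_hat_third, y_third, y_hat_first, delta_first, W, gamma=2.0, lam=0.01):
            second_loss = np.sum((y_third - y_hat_third) ** 2)     # labeled (displayed) samples
            first_loss = np.sum((delta_first - y_hat_first) ** 2)  # imputed (non-displayed) samples
            return gamma * second_loss + first_loss + lam * np.sum(W ** 2)

        # gamma > 1 gives the second loss function a higher weight than the first loss function,
        # in line with the description above.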
  • step 320 may also process the third training sample through the interpolation model to obtain the interpolation prediction label of the third training sample, and then use the third training sample and the first training sample as the input samples of the recommendation model to The sample label of the third training sample, the interpolation prediction label of the third training sample, and the interpolation prediction label of the first training sample are used as the target output value of the recommendation model to train the recommendation model based on the target training model.
  • in this case, the target training model can take a doubly robust form, in which the interpolation prediction label contributes on all training samples and a correction term on the third training samples compares the sample label with the interpolation prediction label; one possible form consistent with this description is Σ_{l=1}^{|D|} ℓ(δ_l, ŷ_l) + γ · Σ_{l=1}^{L} (ℓ(y_l, ŷ_l) − ℓ(δ_l, ŷ_l)) + λ · R(W).
  • the multiplication of the hyperparameter and the second loss function is only for illustration. That is to say, in the target training model, the hyperparameters can also be set before the first loss function, that is, the hyperparameters can be multiplied by the first loss function. Or, in the target training model, two hyper-parameters can be set, and the two hyper-parameters are respectively set before the first loss function and before the second loss function.
  • the method of training based on the target training model can be called the propensity-free doubly robust method.
  • the proportion of the first loss function and the second loss function in the target training model can be adjusted, and the accuracy of the recommended model can be further improved.
  • the weight of the second loss function may be higher than the weight of the first loss function, that is, the value of ⁇ may be greater than 1. In this way, training the recommendation model based on the target training model can improve the accuracy of the recommendation model.
  • the recommendation model may be a low-rank model.
  • the recommendation model may be a matrix factorization (MF) model, a factorization machine (FM), or FFM.
  • in the process of solving, the training sample can be decomposed into two parts, namely the attribute information of the user in the recommendation request and the information of the recommended object, which helps to reduce the time complexity of computing over the training samples.
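  • a minimal sketch of why a low-rank model allows this decomposition (matrix-factorization style scoring; the embedding sizes and variable names are illustrative assumptions) is:
        # Hypothetical sketch: in a low-rank model such as matrix factorization, the predicted
        # label is an inner product of a user-side vector and an item-side vector, so the user
        # part can be computed once per recommendation request and reused for every candidate.
        import numpy as np

        user_vec = np.random.rand(8)        # derived from the user attribute information
        item_vecs = np.random.rand(100, 8)  # derived from 100 candidate recommended objects

        scores = item_vecs @ user_vec       # 100 predicted labels with one matrix product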
  • the training features used for the interpolation model may be different from the training features used for the recommendation model.
  • the training feature can be determined from the attribute information of the user and the information of the recommended object.
  • the attribute information of the user may include the gender of the user and the occupation of the user
  • the information of the recommended object may include the type of the recommended object, the scoring of the recommended object, and the comment of the recommended object.
  • the training characteristics used for the interpolation model may include the type of the recommended object and the gender of the user.
  • the training features for the recommendation model may include the type of the recommended object, the scoring of the recommended object, the comment of the recommended object, the gender of the user, and the occupation of the user.
  • the second training sample is a training sample without bias
  • the second training sample is used to train the interpolation model, which can avoid the impact of the bias problem on the training of the interpolation model and improve the accuracy of the interpolation model, making the obtained interpolation prediction labels more accurate; using the more accurate interpolation prediction labels to train the recommendation model can improve the accuracy of the recommendation model.
  • the first training sample and the third training sample are both used to train the recommendation model, which takes into account the role of the interpolation prediction label obtained by the interpolation model and the actual sample label in the training process, prevents the accuracy of the recommendation model from relying solely on the accuracy of the interpolation prediction label, and further improves the accuracy of the recommendation model.
  • the facts that have not happened can be included in the modeling and used for the training of the recommendation model together with the facts that have happened; that is, the first training sample without a sample label and the third training sample with a sample label are both used for the training of the recommendation model, which can make the sample distribution more reasonable and improve the accuracy of the recommendation model.
  • the sample label corresponding to the first training sample cannot be obtained according to the user's operation action, so without the interpolation model the first training sample cannot be used to train the recommendation model.
  • Incorporating the first training sample into the modeling is to use the counterfactual learning method to train the recommendation model.
  • Counterfactual learning refers to the method of representing facts that have not occurred in the past and incorporating them into the modeling process.
  • the first training sample can be understood as a fact that has not happened in the past.
  • the recommended objects that have not been shown to the user are included in the training sample to make the sample distribution more reasonable, and then to train the recommendation model, reduce the impact of the bias problem, and improve the accuracy of the recommendation model .
  • FIG. 6 shows a method 400 for training a recommendation model provided by an embodiment of the present application.
  • the method 400 includes steps 410 to 440. Steps 410 to 440 will be described in detail below. It should be understood that the specific implementation of step 410 to step 440 may refer to the foregoing method 300. To avoid unnecessary repetition, the repetitive description is appropriately omitted when the method 400 is introduced below.
  • there may be multiple first training samples, multiple second training samples, and multiple third training samples.
  • the first training sample includes the attribute information of the first user and the information of the first recommended object.
  • the second training sample includes the attribute information of the second user, the information of the second recommended object, and the sample label of the second training sample.
  • the sample label of the second training sample is used to indicate whether the second user has an operation action on the second recommended object,
  • the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • the third training sample includes the attribute information of the third user, the information of the third recommended object, and the sample label of the third training sample. The sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user.
  • the above-mentioned multiple second training samples may be a part of multiple third training samples. That is to say, the multiple third training samples may include training samples without bias and training samples with bias.
  • the target training model can be:
  • min_W  γ · Σ_{l=1}^{L} ℓ(y_l, ŷ_l) + Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) + λ · R(W)
  • where W is the parameter of the recommendation model, R(W) is the regular term, and λ is the hyperparameter that determines the weight of the regular term.
  • in the training sample set D, training sample x_1 to training sample x_L are the third training samples, and training sample x_{L+1} to training sample x_{|D|} are the first training samples; |D| represents the number of training samples in the training sample set, and L represents the number of third training samples in the training sample set.
  • δ_l represents the interpolation prediction label δ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, and ŷ_l represents the predicted label of the training sample x_l.
  • Σ_{l=1}^{L} ℓ(y_l, ŷ_l) represents the second loss function, Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) represents the first loss function, ℓ(·,·) denotes the per-sample loss, and γ is a hyperparameter used to adjust the proportion of the first loss function and the second loss function.
  • the second training sample is used to train the interpolation model, that is, the training samples without bias are used to train the interpolation model, which can avoid the impact of the bias problem on the training of the interpolation model, improve the accuracy of the interpolation model, and make the interpolation prediction labels more accurate.
  • the sample label of the first training sample cannot be obtained.
  • the interpolation model is used to supplement the corresponding interpolation prediction label for the first training sample, and the recommended objects that are not shown to the user are included in the training sample, that is, the counterfactual learning method is used to incorporate the facts that did not occur into the modeling to recommend the model Perform training to make the sample distribution more reasonable.
  • the recommendation model is trained using the propensity-free doubly robust method; only a small number of second training samples is needed to reduce the impact of the bias problem and improve the accuracy of the recommendation model, which avoids the decrease in overall system revenue caused by the large-scale random display of recommended objects required to collect second training samples at scale.
  • the first training sample and the second training sample can also be obtained, where the first training sample is the sample whose label is predicted through the interpolation model.
  • the interpolation model used for supplementing prediction labels is generated by pre-training, and its training method is the same as the training method in the foregoing embodiment, and will not be repeated here.
  • the recommended model is obtained by training based on the first training sample and the third training sample.
  • FIG. 7 shows a schematic diagram of a recommendation framework 500 provided by an embodiment of the present application.
  • the recommendation framework 500 includes an interpolation module 501 and a recommendation module 502.
  • the interpolation module can be used to process the training samples without sample labels to obtain the interpolation prediction labels, and the training samples without sample labels are included in the modeling, so that the sample distribution is more reasonable, the bias problem is reduced, and a more accurate recommendation module 502 is obtained.
  • interpolation module 501 may correspond to the interpolation model in FIG. 5 or FIG. 6, and the recommendation module 502 may correspond to the recommendation model in FIG. 5 or FIG. 6.
  • the interpolation module 501 may be used to supplement the interpolation prediction label for training samples without sample labels.
  • the recommendation module 502 may be used to predict the probability that the user in the training sample has an operation action on the recommended object in the training sample.
  • the recommendation framework 500 can be divided into two stages, a training stage and a recommendation stage.
  • the training phase and the recommendation phase are described below.
  • Step A-1: Obtain at least one first training sample and at least one second training sample.
  • the first training sample includes the attribute information of the first user and the information of the first recommended object.
  • Step A-2: Process the attribute information of the first user and the information of the first recommended object through the interpolation module 501 to obtain the interpolation prediction label of the first training sample, where the interpolation prediction label is used to indicate the prediction of whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user.
  • the parameters of the interpolation module 501 are obtained by training based on the second training sample.
  • the second training sample includes the attribute information of the second user, the information of the second recommended object, and the sample label of the second training sample.
  • the sample label of the second training sample is used to indicate whether the second user has an operation action on the second recommended object, and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • Step A-3 Obtain at least one third training sample. This step is optional.
  • the third training sample includes the attribute information of the third user, the information of the third recommended object, and the sample label of the third training sample.
  • the sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • Step A-4: Use the attribute information of the first user of the first training sample and the information of the first recommended object as the input of the recommendation model, and use the interpolation prediction label of the first training sample as the target output value of the recommendation model for training, to obtain the recommendation module 502.
  • step A-4 may be training based on the target training model according to the first training sample and the third training sample, to obtain the recommendation module 502.
  • the target training model may be the target training model in step 330 or step 440, which will not be repeated here.
  • the above-mentioned interpolation module may be an advertising average CTR model, a logistic regression model, FFM or DNN, etc.
  • the aforementioned recommended module may be MF, FM, FFM, or the like.
  • the interpolation prediction label of the second type of training sample is obtained through the interpolation model, and the second type of training sample and the corresponding interpolation prediction label can be used as part of the training data to train the recommendation model. Incorporating the second type of training samples without sample labels into the modeling can make the sample distribution more reasonable and improve the accuracy of the recommended model.
  • the recommendation system constructs an input vector based on the user's attribute information and the information of the recommended object, and the recommendation module 502 predicts the probability that the user has an operation action on the recommended object.
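  • a minimal sketch of this recommendation stage (the feature encoder and the trained-model object are placeholders standing in for the recommendation module 502, assumed to come from the training stage described above) is:
        # Hypothetical sketch: build an input vector from user attributes and recommended-object
        # information, then let the trained recommendation model predict the operation probability.
        import numpy as np

        def predict_probability(model, user_attrs, item_info, encoder):
            x = encoder(user_attrs, item_info)                       # construct the input feature vector
            return float(model.predict_proba(x.reshape(1, -1))[0, 1])

        # encoder() and model are assumed to be produced in the training stage above.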
  • FIG. 8 is a schematic diagram of a recommendation method 600 provided by an embodiment of the present application.
  • the method 600 includes step 610 and step 620. Steps 610 to 620 will be described in detail below.
  • when the recommendation system receives a pending recommendation request, it can determine the attribute information of the target recommended user based on the pending recommendation request.
  • the attribute information of the target recommended user may include some personalized attributes of the user, such as the gender of the target recommended user, the age of the target recommended user, the occupation of the target recommended user, the income of the target recommended user, the hobbies of the target recommended user, and the education situation of the target recommended user.
  • the information of the candidate recommended object may include the candidate recommended object identifier, for example, the candidate recommended object ID.
  • the information of the candidate recommended object may also include some attributes of the candidate recommended object, for example, the name of the candidate recommended object, the type of the candidate recommended object, and so on.
  • the candidate recommendation object may be a recommendation object in the candidate recommendation object set.
  • the candidate recommendation objects in the candidate recommendation object set can be sorted according to the predicted probability that the target recommended user has an operation action on each candidate recommendation object, so as to obtain the recommendation result of the candidate recommendation objects.
  • the candidate recommended object with the highest probability is selected and displayed to the user.
  • the candidate recommendation object may be a candidate recommendation application.
  • FIG. 9 shows a "recommendation" page in the application market.
  • the lists may include high-quality applications and high-quality games.
  • the recommendation system of the application market predicts, based on the user's attribute information and the candidate recommended application information, the probability that the user will download (install) each candidate recommended application, ranks the candidate recommended applications in descending order of this probability, and places the applications most likely to be downloaded in the top positions.
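  • a short sketch of this ranking step (the candidate list and the probabilities are illustrative values only) is:
        # Hypothetical sketch: rank candidate applications by predicted download probability
        # and keep the top positions for display.
        candidates = [("App5", 0.91), ("App7", 0.64), ("App6", 0.83), ("App8", 0.40)]

        ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
        top_positions = [name for name, _ in ranked[:4]]
        print(top_positions)  # ['App5', 'App6', 'App7', 'App8']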
  • the recommendation result may be, for example, that in the boutique games list App5 is located at recommended position one, App6 at recommended position two, App7 at recommended position three, and App8 at recommended position four.
  • the application market shown in FIG. 9 can use user behavior logs as training data to train a recommendation model.
  • the recommendation model may be the recommendation module 502 in FIG. 7, and the training method of the recommendation model may adopt the training method shown in FIG. 5 or FIG. 6 and the training-phase method of FIG. 7, which will not be repeated here.
  • the model parameters of the recommendation model are obtained by training with the attribute information of the first user of the first training sample and the information of the first recommendation object as the input of the recommendation model and the interpolation prediction label of the first training sample as the target output value of the recommendation model.
  • the interpolation prediction label of the first training sample is obtained by processing the attribute information of the first user and the information of the first recommendation object through the interpolation model, and the interpolation prediction label is used to indicate the prediction of whether the first user has an operation action on the first recommended object when the first recommendation object is recommended to the first user; the model parameters of the interpolation model are obtained by training based on at least one second training sample, and the second training sample includes the attribute information of the second user, the information of the second recommended object, and the sample label of the second training sample.
  • the sample label of the second training sample is used to indicate whether the second user has an action on the second recommended object.
  • the second training sample is obtained when the second recommended object is randomly shown to the second user.
  • the model parameters of the recommendation model being obtained by using the attribute information of the first user of the first training sample and the information of the first recommendation object as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model for training may include: the model parameters of the recommendation model are obtained by using the attribute information of the first user and the information of the first recommendation object, as well as the attribute information of the third user of the third training sample and the information of the third recommendation object, as the input of the recommendation model, and using the interpolation prediction label of the first training sample and the sample label of the third training sample as the target output values of the recommendation model to train based on the target training model, wherein the sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user .
  • the target training model includes a first loss function and a second loss function
  • the first loss function is used to indicate the difference between the interpolation prediction label of the first training sample and the prediction label of the first training sample
  • the second loss The function is used to indicate the difference between the sample label of the third training sample and the predicted label of the third training sample.
  • the target training model is:
  • min_W  γ · Σ_{l=1}^{L} ℓ(y_l, ŷ_l) + Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) + λ · R(W)
  • where W is the parameter of the recommendation model, R(W) is the regular term, and λ is the hyperparameter that determines the weight of the regular term.
  • in the training sample set D, training sample x_1 to training sample x_L are the third training samples, and training sample x_{L+1} to training sample x_{|D|} are the first training samples; |D| represents the number of training samples in the training sample set, and L represents the number of third training samples in the training sample set.
  • δ_l represents the interpolation prediction label δ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, and ŷ_l represents the predicted label of the training sample x_l.
  • Σ_{l=1}^{L} ℓ(y_l, ŷ_l) represents the second loss function, Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) represents the first loss function, ℓ(·,·) denotes the per-sample loss, and γ is a hyperparameter used to adjust the proportion of the first loss function and the second loss function.
  • the interpolation model is selected according to the number of the second training samples.
  • the training device described below can execute the training method of the recommendation model of the foregoing embodiment of the present application, and the recommendation device can execute the recommendation method of the foregoing embodiment of the present application. In order to avoid unnecessary repetition, repetitive description will be appropriately omitted when introducing the devices of the embodiments of the present application.
  • Fig. 10 is a schematic block diagram of a training device for a recommendation model according to an embodiment of the present application.
  • the training device 700 of the recommendation model shown in FIG. 10 includes an acquisition unit 710 and a processing unit 720.
  • the acquiring unit 710 and the processing unit 720 may be used to perform the training method of the recommendation model of the embodiment of the present application. Specifically, the acquiring unit 710 may perform the foregoing step 310 or step 410, and the processing unit 720 may perform the foregoing step 320 to step 330 or step 420 to step 440.
  • the obtaining unit 710 is configured to obtain at least one first training sample, where the first training sample includes the attribute information of the first user and the information of the first recommended object.
  • the processing unit 720 is configured to process the attribute information of the first user and the information of the first recommended object through the interpolation model, and obtain the interpolation prediction label of the first training sample.
  • the interpolation prediction label is used to indicate the prediction of whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user; the model parameters of the interpolation model are obtained by training based on at least one second training sample.
  • the sample label of the second training sample is used to indicate whether the second user has an operation on the second recommended object.
  • the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • the processing unit 720 is further configured to use the attribute information of the first user of the first training sample and the information of the first recommended object as the input of the recommendation model, and use the interpolation prediction label of the first training sample as the target output value of the recommendation model. Perform training to get the recommended model after training.
  • the obtaining unit 710 is further configured to obtain at least one third training sample, where the third training sample includes the attribute information of the third user, the information of the third recommended object, and the sample label of the third training sample,
  • the sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • the processing unit 720 is further configured to use the attribute information of the first user and the information of the first recommended object, as well as the attribute information of the third user and the information of the third recommended object, as the input of the recommendation model, and use the interpolation prediction label of the first training sample and the sample label of the third training sample as the target output values of the recommendation model, to train based on the target training model and obtain the trained recommendation model.
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user.
  • the target training model includes a first loss function and a second loss function
  • the first loss function is used to indicate the difference between the interpolation prediction label of the first training sample and the prediction label of the first training sample, and the second loss function is used to indicate the difference between the sample label of the third training sample and the prediction label of the third training sample.
  • the target training model is:
  • min_W  γ · Σ_{l=1}^{L} ℓ(y_l, ŷ_l) + Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) + λ · R(W)
  • where W is the parameter of the recommendation model, R(W) is the regular term, and λ is the hyperparameter that determines the weight of the regular term.
  • in the training sample set D, training sample x_1 to training sample x_L are the third training samples, and training sample x_{L+1} to training sample x_{|D|} are the first training samples; |D| represents the number of training samples in the training sample set, and L represents the number of third training samples in the training sample set.
  • δ_l represents the interpolation prediction label δ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, and ŷ_l represents the predicted label of the training sample x_l.
  • Σ_{l=1}^{L} ℓ(y_l, ŷ_l) represents the second loss function, Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) represents the first loss function, ℓ(·,·) denotes the per-sample loss, and γ is a hyperparameter used to adjust the proportion of the first loss function and the second loss function.
  • the interpolation model is selected according to the number of the second training samples.
  • FIG. 11 is a schematic block diagram of a recommendation apparatus 800 provided by an embodiment of the present application.
  • the recommendation device 800 shown in FIG. 11 includes an obtaining unit 810 and a processing unit 820.
  • the acquiring unit 810 and the processing unit 820 may be used to execute the recommendation method of the embodiment of the present application. Specifically, the acquiring unit 810 may execute the foregoing step 610, and the processing unit 820 may execute the foregoing step 620.
  • the obtaining unit 810 is configured to obtain the attribute information of the target recommended user and the information of the candidate recommended object; the processing unit 820 is configured to input the attribute information of the target recommended user and the information of the candidate recommended object into the recommendation model to predict the target Probability that the recommended user has an operation action on the candidate recommended object.
  • the model parameters of the recommendation model are obtained by training with the attribute information of the first user of the first training sample and the information of the first recommendation object as the input of the recommendation model and the interpolation prediction label of the first training sample as the target output value of the recommendation model; the interpolation prediction label of the first training sample is obtained by processing the attribute information of the first user and the information of the first recommendation object through the interpolation model, and the interpolation prediction label is used to indicate the prediction of whether the first user has an operation action on the first recommended object when the first recommended object is recommended to the first user; the model parameters of the interpolation model are obtained by training based on at least one second training sample, the second training sample includes the attribute information of the second user, the information of the second recommended object, and the sample label of the second training sample, the sample label of the second training sample is used to indicate whether the second user has an operation action on the second recommended object, and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  • the model parameters of the recommendation model being obtained by using the attribute information of the first user of the first training sample and the information of the first recommendation object as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model for training may include: the model parameters of the recommendation model are obtained by using the attribute information of the first user and the information of the first recommendation object, as well as the attribute information of the third user of the third training sample and the information of the third recommended object, as the input of the recommendation model, and using the interpolation prediction label of the first training sample and the sample label of the third training sample as the target output values of the recommendation model to train based on the target training model, wherein the sample label of the third training sample is used to indicate whether the third user has an operation action on the third recommended object.
  • the first training sample may be obtained when the first recommended object is not shown to the first user, and the third training sample may be obtained when the third recommended object is shown to the third user.
  • the target training model includes a first loss function and a second loss function
  • the first loss function is used to indicate the difference between the sample label of the first type of training sample and the predicted label of the first type of training sample, and the second loss function is used to indicate the difference between the interpolation prediction label of the second type of training sample and the predicted label of the second type of training sample.
  • the target training model is:
  • min_W  γ · Σ_{l=1}^{L} ℓ(y_l, ŷ_l) + Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) + λ · R(W)
  • where W is the parameter of the recommendation model, R(W) is the regular term, and λ is the hyperparameter that determines the weight of the regular term.
  • in the training sample set D, training sample x_1 to training sample x_L are the third training samples, and training sample x_{L+1} to training sample x_{|D|} are the first training samples; |D| represents the number of training samples in the training sample set, and L represents the number of third training samples in the training sample set.
  • δ_l represents the interpolation prediction label δ(x_l) of the training sample x_l, y_l represents the sample label of the training sample x_l, and ŷ_l represents the predicted label of the training sample x_l.
  • Σ_{l=1}^{L} ℓ(y_l, ŷ_l) represents the second loss function, Σ_{l=L+1}^{|D|} ℓ(δ_l, ŷ_l) represents the first loss function, ℓ(·,·) denotes the per-sample loss, and γ is a hyperparameter used to adjust the proportion of the first loss function and the second loss function.
  • the interpolation model is selected according to the number of the second training samples.
  • the training device 700 and the device 800 described above are embodied in the form of functional units.
  • the term "unit" herein can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit, or a combination of the two that realize the above-mentioned functions.
  • the hardware circuit may include an application specific integrated circuit (ASIC), an electronic circuit, a processor for executing one or more software or firmware programs (such as a shared processor, a dedicated processor, or a group processor) and memory, combined logic circuits, and/or other suitable components that support the described functions.
  • the units of the examples described in the embodiments of the present application can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • FIG. 12 is a schematic diagram of the hardware structure of a training device for a recommendation model provided by an embodiment of the present application.
  • the training device 900 shown in FIG. 12 includes a memory 901, a processor 902, a communication interface 903, and a bus 904.
  • the memory 901, the processor 902, and the communication interface 903 implement communication connections between each other through the bus 904.
  • the memory 901 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 901 may store a program; when the program stored in the memory 901 is executed by the processor 902, the processor 902 is configured to execute each step of the recommendation model training method of the embodiment of the present application, for example, the steps shown in FIG. 5 or FIG. 6.
  • the training device shown in the embodiment of the present application may be a server, for example, it may be a server in the cloud, or may also be a chip configured in a server in the cloud.
  • the processor 902 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the recommendation model training method in the method embodiments of the present application.
  • the processor 902 may also be an integrated circuit chip with signal processing capability.
  • each step of the training method of the recommendation model of the present application can be completed by an integrated logic circuit of hardware in the processor 902 or instructions in the form of software.
  • the aforementioned processor 902 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 901; the processor 902 reads the information in the memory 901 and, in combination with its hardware, completes the functions required to be performed by the units included in the training device shown in FIG. 10 in the implementation of this application, or executes the training method of the recommendation model shown in FIG. 5 or FIG. 6 of the method embodiments of this application.
  • the communication interface 903 uses a transceiver device such as but not limited to a transceiver to implement communication between the training device 900 and other devices or communication networks.
  • the bus 904 may include a path for transferring information between various components of the training device 900 (for example, the memory 901, the processor 902, and the communication interface 903).
  • FIG. 13 is a schematic diagram of the hardware structure of a recommendation device provided by an embodiment of the present application.
  • the recommending apparatus 1000 shown in FIG. 13 includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004. Among them, the memory 1001, the processor 1002, and the communication interface 1003 implement communication connections between each other through the bus 1004.
  • the memory 1001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1001 may store a program; when the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 is configured to execute each step of the recommendation method of the embodiment of the present application, for example, the steps shown in FIG. 8.
  • the device shown in the embodiment of the present application may be a smart terminal, or may also be a chip configured in the smart terminal.
  • the processor 1002 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the method for predicting the selection probability in the method embodiments of the present application.
  • the processor 1002 may also be an integrated circuit chip with signal processing capability.
  • each step of the method for predicting the selection probability of the present application can be completed by the integrated logic circuit of hardware in the processor 1002 or instructions in the form of software.
  • the aforementioned processor 1002 may also be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1001; the processor 1002 reads the information in the memory 1001 and, in combination with its hardware, completes the functions required to be performed by the units included in the device shown in FIG. 11 in the implementation of this application, or executes the recommendation method shown in FIG. 8 of the method embodiments of this application.
  • the communication interface 1003 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 1000 and other devices or a communication network.
  • the bus 1004 may include a path for transferring information between various components of the device 1000 (for example, the memory 1001, the processor 1002, and the communication interface 1003).
  • although the training device 900 and the device 1000 described above only show a memory, a processor, and a communication interface, in a specific implementation process those skilled in the art should understand that the training device 900 and the device 1000 may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the above-mentioned training device 900 and device 1000 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the above-mentioned training device 900 and device 1000 may also include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 12 or FIG. 13.
  • the memory may include a read-only memory and a random access memory, and provide instructions and data to the processor.
  • Part of the processor may also include non-volatile random access memory.
  • the processor can also store device type information.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a Universal Serial Bus flash disk (USB flash disk, UFD; a UFD may also be referred to as a U disk or USB flash drive), a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.


Abstract

A training method for a recommendation model, a recommendation method, an apparatus, and a computer-readable medium, applied in the field of artificial intelligence (AI). The training method includes: obtaining at least one first training sample; processing the attribute information of a first user and the information of a first recommended object by means of an interpolation model to obtain an interpolation prediction label of the first training sample, where the model parameters of the interpolation model are obtained by training based on at least one second training sample, and the second training sample is obtained when a second recommended object is randomly displayed to a second user; and training the recommendation model by using the attribute information of the first user and the information of the first recommended object as the input of the recommendation model and using the interpolation prediction label of the first training sample as the target output value of the recommendation model, to obtain a trained recommendation model. The method can mitigate the influence of training data bias on the training of the recommendation model and improve the accuracy of the recommendation model.

Description

推荐模型的训练方法、推荐方法、装置及计算机可读介质 技术领域
本申请实施例涉及人工智能领域,尤其涉及一种推荐模型的训练方法、推荐方法、装置及计算机可读介质。
背景技术
商品的选择率预测是指预测用户在特定环境下对某个商品的选择概率。例如,应用商店、在线广告等应用的推荐系统中,选择率预测起到关键作用。在一次推荐中,能够被展示的商品的数量远小于总商品的数量,推荐系统通常基于预测的选择率从候选商品中选择商品进行展示。
上述选择机制导致用于训练推荐模型的训练数据是有偏置的,该偏置主要包括位置偏置和选择偏置。位置偏置是推荐和搜索场景中的普遍问题。位置偏置指的是由于商品展示的位置不同导致采集到的训练数据有偏置。例如,在应用市场的一个榜单中,同一个应用程序(application,APP)可以展示在第一位,也可以展示在最后一位。通过随机投放策略可以验证,该APP展示在第一位的下载率远高于展示在最后一位的下载率。选择偏置指的是由于商品被展示的概率不同导致采集到的训练数据有偏置。理想的训练数据是在将商品按照相同的展示概率展示给用户的情况下得到的。现实情况中,展示给用户的商品是根据之前的推荐模型预测的选择率决定的,商品得到展示的机会并不相同。
例如,在应用市场的一个榜单中,一个位置靠前的APP会增大用户下载的倾向,推荐模型计算得到的处于靠前位置的APP的选择率可能高于其他APP,导致该APP排在其他APP之前,加剧了偏置问题的影响,造成马太效应,导致长尾问题的加剧。
利用有偏置的训练数据对推荐模型进行训练,会降低训练模型的准确率,影响用户体验和收入。
发明内容
本申请提供一种推荐模型的训练方法、推荐方法、装置及计算机可读介质,以提高推荐模型的准确率。
第一方面,提供了一种推荐模型的训练方法,该训练方法包括:获取至少一个第一训练样本,第一训练样本包括第一用户的属性信息和第一推荐对象的信息;通过插补模型对第一用户的属性信息和第一推荐对象的信息进行处理,获取第一训练样本的插补预测标签,插补预测标签用于表示向第一用户推荐第一推荐对象时,第一用户是否对第一推荐对象有操作动作的预测;其中,插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,至少一个第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的;以 第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到训练后的推荐模型。
其中,第一推荐对象和第二推荐对象可以为终端设备的应用市场中的推荐应用程序;或者,第一推荐对象和第二推荐对象可以为搜索场景中系统推荐的搜索词。在本申请的实施例中,第一推荐对象和第二推荐对象可以是推荐系统为用户推荐的信息,对于第一推荐对象和第二推荐对象的具体实现方式本申请不作任何限定。
用户的属性信息包括用户个性化的一些属性,例如,用户的性别、用户的年龄、用户的职业、用户的收入、用户的爱好、用户的教育情况等。第一用户的属性信息可以包括上述用户的属性信息中的一项或多项。第二用户的属性信息可以包括上述用户的属性信息中的一项或多项。
推荐对象的信息包括推荐对象标识,例如推荐对象ID。推荐对象的信息还包括推荐对象的一些属性,例如,推荐对象的名称、推荐对象的类型等。第一推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。第二推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。
用户对推荐对象的操作动作可以包括用户的点击行为、用户的下载行为、用户的购买行为、用户的浏览行为和用户的差评行为等。
插补模型可以用于预测当向第一用户推荐第一推荐对象时,第一用户是否对第一推荐对象有操作动作。插补预测标签可以表示该预测的结果。
具体地,该插补预测标签可以为0或1,也就是用0或1表示第一用户是否对第一推荐对象有操作动作。该插补预测标签也可以为概率值,也就是用概率值表示第一用户对第一推荐对象有操作动作的概率。本申请对插补预测标签的形式不作任何限定。
可选地,插补模型可以为广告平均点击通过率模型、逻辑回归模型、域感知因子分解机或深度神经网络等。
可选地,推荐模型可以为矩阵分解模型、因子分解机或域感知因子分解机等。
根据本申请实施例的方案,第二训练样本是在当所述第二推荐对象为随机展示给所述第二用户的情况下获得的,该训练样本没有偏置,利用第二训练样本对插补模型进行训练,可以避免偏置问题对插补模型的训练带来的影响,提高插补模型的准确率,使得到的插补预测标签更加准确,进而利用更准确的插补预测标签对推荐模型进行训练,能够提高推荐模型的准确性。
结合第一方面,在第一方面的某些实现方式中,所述方法还包括:获取至少一个第三训练样本,第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及第三训练样本的样本标签,第三训练样本的样本标签用于表示第三用户是否对第三推荐对象有操作动作,以及以所述第一用户的属性信息和所述第一推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到训练后的推荐模型,包括:以第一用户的属性信息和第一推荐对象的信息以及第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签和第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型进行训练,得到训练后的推荐模型。
第三用户的属性信息可以包括上述用户的属性信息中的一项或多项。第三推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。
第三训练样本可以和第二训练样本相同,也可以和第二训练样本不同。
根据本申请实施例的方案,利用第一训练样本和第三训练样本一起对推荐模型进行训练,兼顾了插补模型得到的插补预测标签和实际的样本标签在训练过程中的作用,避免了推荐模型的准确性仅依赖于插补预测标签的准确率,进一步提高推荐模型的准确性。
结合第一方面,在第一方面的某些实现方式中,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,也就是说第一训练样本中不具备第一用户对第一推荐对象是否有操作动作的反馈信息,第一训练样本没有实际的样本标签。
根据本申请实施例的方案,在第一推荐对象没有被展示给第一用户的情况下,通过为第一训练样本增加插补预测标签,能够将没有发生过的事实纳入建模中,与发生过的事实一起用于推荐模型的训练,也就是将没有样本标签的第一训练样本与有样本标签的第三训练样本一起用于推荐模型的训练,可以使样本分布更加合理,提高推荐模型的准确性。
结合第一方面,在第一方面的某些实现方式中,目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一训练样本的插补预测标签与所述第一训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第三训练样本的样本标签与所述第三训练样本的预测标签之间的差异。
目标训练模型得到的模型参数即为训练后的推荐模型的模型参数。
结合第一方面,在第一方面的某些实现方式中,目标训练模型为:
Figure PCTCN2019114897-appb-000001
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000002
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000003
为所述第一训练样本,
Figure PCTCN2019114897-appb-000004
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000005
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000006
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000007
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
应理解,上述训练样本x 1至训练样本x L为L个不同的第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000008
Figure PCTCN2019114897-appb-000009
个不同的第一训练样本。
根据本申请实施例的方案,采用第二训练样本训练插补模型,也就是采用没有偏置的训练样本训练插补模型,同时在目标训练模型中引入第一损失函数和第二损失函数,通过设置不同的超参数能够调整第一损失函数和第二损失函数在目标训练模型中所占的比重,进一步提高推荐模型的准确率。例如,插补模型的模型参数是根据第二训练样本进行训练得到的,当第二训练样本的数量较多时,该第二训练样本相对具有代表性,使插补模型能够更准确地拟合无偏数据分布,得到的插补模型的准确率较高,在该情况下,第二损失函 数的权重可以高于第一损失函数的权重,也就是ω的值可以大于1。
结合第一方面,在第一方面的某些实现方式中,所述插补模型是根据所述第二训练样本的数量选择的。
示例性地,当第二训练样本的数量较多时,该第二训练样本相对具有代表性,可以采用较复杂的模型或是采用更多训练特征对插补模型进行训练,进而使插补模型能够更准确地拟合无偏数据分布。较复杂的模型可以为逻辑回归模型、域感知因子分解机或深度神经网络等。当第二训练样本的数量较少时,该第二训练样本相对不具代表性,可以采用较简略的模型或是采用更少训练特征对插补模型进行训练,避免插补模型过拟合无偏数据分布。例如,较简略的模型可以为广告平均点击通过率模型。
例如,在应用市场的应用场景下,当第二训练样本的数量为10万以上时,插补模型可以为域感知因子分解机或深度神经网络等;当第二训练样本的数量为1万至10万之间时,插补模型可以为逻辑回归模型;当第二训练样本的数量为1万以下时,插补模型可以为广告平均点击通过率模型。
根据本申请实施例的方案,在训练过程中,插补模型可以根据第二训练样本的数量进行选择,针对不同的应用场景可以设定不同的阈值来选择插补模型,插补模型可以灵活调整,只需少量的第二训练样本就能减轻偏置问题带来的影响,提升推荐模型的准确率,避免由于大规模采集第二训练样本而大规模随机展示推荐对象,导致系统整体收入下降。
第二方面,提供了一种推荐方法,包括:获取目标推荐用户的属性信息和候选推荐对象的信息;将所述目标推荐用户的属性信息和所述候选推荐对象的信息输入至推荐模型,预测所述目标推荐用户对所述候选推荐对象有操作动作的概率;其中,所述推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的;所述第一训练样本的插补预测标签是通过插补模型对所述第一用户的属性信息和所述第一推荐对象的信息进行处理得到的,所述插补预测标签用于表示向所述第一用户推荐所述第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测,所述插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,所述第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,所述第二训练样本的样本标签用于表示所述第二用户是否对所述第二推荐对象有操作动作,所述第二训练样本是在当所述第二推荐对象为随机展示给所述第二用户的情况下获得的。
目标推荐用户的属性信息包括用户个性化的一些属性,例如,目标推荐用户的性别、目标推荐用户的年龄、目标推荐用户的职业、目标推荐用户的收入、目标推荐用户的爱好、目标推荐用户的教育情况等。
候选推荐对象的信息包括候选推荐对象标识,例如候选推荐对象ID。
候选推荐对象的信息还包括候选推荐对象的一些属性,例如,候选推荐对象的名称、候选推荐对象的类型等。
根据本申请实施例的方案,将目标推荐用户的属性信息和候选推荐对象的信息输入至推荐模型,预测目标推荐用户对候选推荐对象有操作动作的概率;推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到 的,用于得到插补预测标签的插补模型是根据没有偏置的训练样本进行训练得到的,可以避免偏置问题对插补模型的训练带来的影响,提高插补模型的准确率,使得到的插补预测标签更加准确,进而利用更准确的插补预测标签对推荐模型进行训练,利用训练好的推荐模型预测目标推荐用户对目标推荐用户有操作动作的概率的准确率更高。
结合第二方面,在第二方面的某些实现方式中,所述推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的,包括所述推荐模型的模型参数是以所述第一用户的属性信息和所述第一推荐对象的信息以及第三训练样本的第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签和所述第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型得到的,其中,所述第三训练样本的样本标签用于表示所述第三用户是否对所述第三推荐对象有操作动作。
根据本申请实施例的方案,利用第一训练样本和第三训练样本一起对推荐模型进行训练,兼顾了插补模型得到的插补预测标签和实际的样本标签在训练过程中的作用,避免了推荐模型的准确性仅依赖于插补预测标签的准确率,进一步提高推荐模型的准确性,利用训练好的推荐模型预测目标推荐用户对目标推荐用户有操作动作的概率的准确率更高。
结合第二方面,在第二方面的某些实现方式中,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,也就是说第一训练样本中不具备第一用户对第一推荐对象是否有操作动作的反馈信息,第一训练样本没有实际的样本标签。
根据本申请实施例的方案,在第一推荐对象没有被展示给第一用户的情况下,通过为第一训练样本增加插补预测标签,能够将没有发生过的事实纳入建模中,与发生过的事实一起用于推荐模型的训练,也就是将没有样本标签的第一训练样本与有样本标签的第三训练样本一起用于推荐模型的训练,可以使样本分布更加合理,提高推荐模型的准确性,利用训练好的推荐模型预测目标推荐用户对目标推荐用户有操作动作的概率的准确率更高。
结合第二方面,在第二方面的某些实现方式中,目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一训练样本的插补预测标签与所述第一训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第三训练样本的样本标签与所述第三训练样本的预测标签之间的差异。
目标训练模型得到的模型参数即为训练后的推荐模型的模型参数。
结合第二方面,在第二方面的某些实现方式中,目标训练模型为:
Figure PCTCN2019114897-appb-000010
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000011
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000012
为所述第一训练样本,
Figure PCTCN2019114897-appb-000013
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插 补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000014
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000015
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000016
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
应理解,上述训练样本x 1至训练样本x L为L个不同的第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000017
Figure PCTCN2019114897-appb-000018
个不同的第一训练样本。
根据本申请实施例的方案,采用第二训练样本训练插补模型,也就是采用没有偏置的训练样本训练插补模型,同时在目标训练模型中引入第一损失函数和第二损失函数,通过设置不同的超参数能够调整第一损失函数和第二损失函数在目标训练模型中所占的比重,进一步提高推荐模型的准确率。例如,插补模型的模型参数是根据第二训练样本进行训练得到的,当第二训练样本的数量较多时,该第二训练样本相对具有代表性,使插补模型能够更准确地拟合无偏数据分布,得到的插补模型的准确率较高,在该情况下,第二损失函数的权重可以高于第一损失函数的权重,也就是ω的值可以大于1。只需少量的第二训练样本就能减轻偏置问题带来的影响,提升推荐模型的准确率,避免由于大规模采集第二训练样本而大规模随机展示推荐对象,导致系统整体收入下降。利用该推荐模型预测目标推荐用户对候选推荐对象有操作动作的概率的准确率更高。
结合第二方面,在第二方面的某些实现方式中,插补模型是根据所述第二训练样本的数量选择的。
示例性地,当第二训练样本的数量较多时,该第二训练样本相对具有代表性,可以采用较复杂的模型或是采用更多训练特征对插补模型进行训练,进而使插补模型能够更准确地拟合无偏数据分布。较复杂的模型可以为逻辑回归模型、域感知因子分解机或深度神经网络等。当第二训练样本的数量较少时,该第二训练样本相对不具代表性,可以采用较简略的模型或是采用更少训练特征对插补模型进行训练,避免插补模型过拟合无偏数据分布。例如,较简略的模型可以为广告平均点击通过率模型。
例如,在应用市场的应用场景下,当第二训练样本的数量为10万以上时,插补模型可以为域感知因子分解机或深度神经网络等;当第二训练样本的数量为1万至10万之间时,插补模型可以为逻辑回归模型;当第二训练样本的数量为1万以下时,插补模型可以为广告平均点击通过率模型。
根据本申请实施例的方案,在训练过程中,插补模型可以根据第二训练样本的数量进行选择,针对不同的应用场景可以设定不同的阈值来选择插补模型,插补模型可以灵活调整,只需少量的第二训练样本就能减轻偏置问题带来的影响,提升推荐模型的准确率,避免由于大规模采集第二训练样本而大规模随机展示推荐对象,导致系统整体收入下降。
第三方面,提供了一种推荐模型的训练装置,该装置包括用于执行第一方面以及第一方面中任意一种实现方式中的方法的各个模块/单元。
第四方面,提供了一种推荐装置,该装置包括用于执行第二方面以及第二方面中任意一种实现方式中的方法的各个模块/单元。
第五方面,提供一种推荐模型的训练装置,包括输入输出接口、处理器和存储器。该处理器用于控制输入输出接口收发信息,该存储器用于存储计算机程序,该处理器用于从存储器中调用并运行该计算机程序,使得该训练装置执行上述第一方面以及第一方面中的任意一种实现方式中的方法。
可选地,上述训练装置可以是终端设备/服务器,也可以是终端设备/服务器内的芯片。
可选地,上述存储器可以位于处理器内部,例如,可以是处理器中的高速缓冲存储器(cache)。上述存储器还可以位于处理器外部,从而独立于处理器,例如,训练装置的内部存储器(memory)。
第六方面,提供一种推荐装置,包括输入输出接口、处理器和存储器。该处理器用于控制输入输出接口收发信息,该存储器用于存储计算机程序,该处理器用于从存储器中调用并运行该计算机程序,使得装置执行上述第二方面以及第二方面中的任意一种实现方式中的方法。
可选地,上述装置可以是终端设备/服务器,也可以是终端设备/服务器内的芯片。
可选地,上述存储器可以位于处理器内部,例如,可以是处理器中的高速缓冲存储器(cache)。上述存储器还可以位于处理器外部,从而独立于处理器,例如,装置的内部存储器(memory)。
第七方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码在计算机上运行时,使得计算机执行上述各方面中的方法。
需要说明的是,上述计算机程序代码可以全部或者部分存储在第一存储介质上,其中,第一存储介质可以与处理器封装在一起的,也可以与处理器单独封装,本申请实施例对此不作具体限定。
第八方面,提供了一种计算机可读介质,所述计算机可读介质存储有程序代码,当所述计算机程序代码在计算机上运行时,使得计算机执行上述各方面中的方法。
附图说明
图1是本申请实施例提供的应用场景的示意图。
图2是本申请实施例提供的一种推荐系统的架构图。
图3是本申请实施例提供的一种系统架构的结构示意图。
图4是本申请实施例提供的一种系统架构的示意图。
图5是本申请一个实施例提供的推荐模型的训练方法的示意性流程图。
图6是本申请另一个实施例提供的推荐模型的训练方法的示意性流程图。
图7是本申请实施例提供的推荐框架的示意图。
图8是本申请实施例提供的推荐方法的示意性流程图。
图9是本申请实施例提供的应用市场中推荐对象的示意图。
图10是本申请实施例提供的推荐模型的训练装置的示意性框图。
图11是本申请实施例提供的推荐装置的示意性框图。
图12是本申请实施例提供的推荐模型的训练装置的示意性框图。
图13是本申请实施例提供的推荐装置的示意性框图。
具体实施方式
下面将结合附图,对本申请中的技术方案进行描述。
图1示出了本申请实施例的部分应用场景。本申请实施例提供的推荐方法能够应用在所有需要推荐的场景中。例如,如图1所示,本申请实施例提供的推荐方法能够应用在应 用市场推荐、音乐应用程序推荐、视频网站推荐、电商推荐、搜索引擎排序等需要进行推荐的场景。下面分别对两种常用的应用场景进行简单的介绍。
应用场景一:应用市场推荐
在应用市场中可以展示部分应用程序。推荐系统可以用于决定被展示的应用程序以及该应用程序相应的展示位置。例如,在点击付费(cost per click,CPC)的系统中只有当应用程序被用户点击时,广告商才需要付费。当用户进入应用市场,会触发一个推荐请求(request)。由于用于应用程序展示的位置有限,当推荐系统收到一个推荐请求时,可以对所有待展示的应用程序都按照期望收入进行排序,然后选择最有价值一个或多个应用程序展示在相应的展示位置。在CPC系统中,每个应用程序的期望收入与该应用程序的预估点击通过率(click-through rate,CTR)有关。在该情况下,CTR可以理解为每个APP被点击的概率。为了得到期望收入的排序,需要得到预估CTR。
具体地,得到所有待展示的应用程序的预估CTR,根据每个应用程序的预估CTR计算每个应用程序的期望收入并进行排序,根据排序结果确定被展示的应用程序以及该应用程序相应的展示位置。
其中,得到所有待展示应用程序的预估CTR可以由本申请实施例中的推荐方法来执行,根据得到的预估CTR能够对所有待展示的应用程序进行排序,进而可以根据该排序结果确定被展示的应用程序以及相应的展示位置。
应用场景二:搜索词推荐
在用户进行搜索时,搜索词通常包括两个来源:用户主动输入的搜索词和系统推荐给用户的搜索词。用户主动输入搜索词的行为是系统无法干预的用户行为。系统推荐给用户的搜索词指的是,当推荐系统收到推荐请求时,可以计算所有待展示的搜索词的分数,并对该分数进行排序,例如,搜索词的分数可以为搜索词的点击率,根据排序结果可以确定被展示的搜索词以及该搜索词相应的展示位置。
其中,计算所有搜索词的分数可以由本申请实施例中的推荐方法来执行,根据得到的分数能够对所有待展示的搜索词进行排序,进而可以根据该排序结果确定被展示的搜索词以及相应的展示位置。
为了便于理解本申请实施例,下面先对本申请实施例涉及的相关术语的相关概念进行介绍。
(1)推荐系统
推荐系统是指根据用户的历史数据,利用机器学习算法进行分析,根据分析结果对新的推荐请求进行预测,得到推荐结果的系统。
例如,图2示出了本申请实施例中提供的一种推荐系统的架构图。当用户进入系统,会触发一个推荐请求,推荐系统将该推荐请求以及相关信息输入推荐模型中,预测用户对系统内的商品的选择率。进一步,根据预测的选择率或基于该选择率的某个函数对商品进行排序。推荐系统可以根据排序结果将要展示给用户的商品以及商品展示的位置作为对用户的推荐结果。用户浏览被展示的商品并可能发生操作动作,例如浏览行为、下载行为等。用户的操作动作可以存入用户行为日志,对用户行为日志进行预处理可以得到训练数据。利用该训练数据可以不断更新推荐模型的参数,以提高推荐模型的预测效果。
例如,用户打开智能终端(例如,手机)中的应用市场可以触发应用市场中的推荐系 统,也就是触发一条推荐请求。推荐系统可以根据用户的历史行为日志,例如,用户的历史下载记录,以及应用市场的自身特征,比如时间、地点等环境特征信息,预测用户下载推荐的各个候选应用程序的概率。推荐系统可以按照预测的概率大小降序展示候选应用程序,提高候选应用程序的下载概率。
例如,当应用市场的展示位置为p个时,p为正整数,推荐系统可以选择预测的概率最高的p个候选应用程序进行展示,并将p个候选应用程序中预测的概率较高的应用程序展示在靠前的位置,将p个候选应用程序中预测的用户选择率较低的应用程序展示在靠后的位置。
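A minimal sketch of this slot-filling step is given below, assuming the candidate identifiers and their predicted selection probabilities have already been produced by the recommendation model; the function and variable names are illustrative only.

```python
def fill_display_slots(candidates, predicted_probs, p):
    """Sort candidates by predicted selection probability (descending) and
    return the p items to display, with the best candidate in the first slot.

    candidates      -- candidate identifiers (e.g. candidate app ids)
    predicted_probs -- probabilities predicted by the recommendation model,
                       aligned with `candidates`
    p               -- number of available display positions
    """
    ranked = sorted(zip(candidates, predicted_probs),
                    key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:p]]

# Example: four candidate apps, two display positions.
print(fill_display_slots(["app_a", "app_b", "app_c", "app_d"],
                         [0.12, 0.45, 0.31, 0.08], p=2))   # -> ['app_b', 'app_c']
```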
曝光数据是指记录的用户浏览行为数据。
单类模型是指训练样本中只有一类数据是明确的模型。
上下文信息可以指推荐请求中的用户和/或推荐对象的背景信息,如城市、职业、价格、类别等。
上述推荐模型可以是神经网络模型,下面对本申请实施例中可能涉及的神经网络的相关术语和概念进行介绍。
(2)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以x s和截距1为输入的运算单元,该运算单元的输出可以如公式(1)所示:
$h_{W,b}(x) = f(W^{T}x + b) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$    (1)
其中,s=1、2、……n,n为大于1的自然数,W s为x s的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),该激活函数用于对神经网络中的特征进行非线性变换,将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入。激活函数可以是sigmoid函数。神经网络是将许多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(3)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
$\vec{y} = \alpha(W\vec{x} + \vec{b})$
其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，W是权重矩阵（也称系数），α()是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多，系数W和偏移向量$\vec{b}$的数量也比较多。这些参数在DNN中的定义如下所述：以系数W为例，假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W^{3}_{24}$，上标3代表系数W所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。综上，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W^{L}_{jk}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(4)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(5)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的数值,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
下面结合图3对本申请实施例的系统架构进行详细的介绍。
图3是本申请实施例的系统架构的示意图。如图3所示,系统架构100包括执行设备110、训练设备120、数据库130、客户设备140、数据存储系统150、以及数据采集系统160。
另外,执行设备110包括计算模块111、I/O接口112、预处理模块113和预处理模块114。其中,计算模块111中可以包括目标模型/规则101,预处理模块113和预处理模块114是可选的。
数据采集设备160用于采集训练数据。针对本申请实施例的推荐模型的训练方法来说,可以通过训练数据对推荐模型进行进一步训练。
例如,在本申请实施例中,训练数据可以包括训练样本以及训练样本的样本标签。训练样本可以包括用户的属性信息和推荐对象的信息。样本标签表示用户对推荐对象是否有操作动作。用户对推荐对象是否有操作动作可以理解为训练样本中的用户是否选择推荐对象。
在采集到训练数据之后,数据采集设备160将训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。
下面对训练设备120基于训练数据得到目标模型/规则101进行描述,训练设备120 对输入的用户的属性信息和推荐对象的信息进行处理,将输出的预测标签与样本标签进行对比,直到训练设备120输出的预测标签与样本标签的差异小于一定的阈值,从而得到训练好的推荐模型,即训练后的推荐模型可以是目标模型/规则101的训练。
上述目标模型/规则101能够用于预测用户是否选择推荐对象或预测用户选择推荐对象的概率。本申请实施例中的目标模型/规则101具体可以为神经网络、逻辑回归模型等。
需要说明的是,在实际应用中,数据库130中维护的训练数据不一定都来自于数据采集设备160的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备120也不一定完全基于数据库130维护的训练数据进行目标模型/规则101的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。
根据训练设备120训练得到的目标模型/规则101可以应用于不同的系统或设备中,如应用于图3所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图3中,执行设备110配置有输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:客户设备输入的训练数据。这里的客户设备140具体可以是终端设备。
预处理模块113和预处理模块114用于根据I/O接口112接收到的输入数据进行预处理,在本申请实施例中,可以没有预处理模块113和预处理模块114或者只有的一个预处理模块。当不存在预处理模块113和预处理模块114时,可以直接采用计算模块111对输入数据进行处理。
在执行设备110对输入数据进行预处理,或者在执行设备110的计算模块111执行计算等相关的处理过程中,执行设备110可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。
最后,I/O接口112将处理结果提供给用户,如,目标模型/规则101可以用于推荐系统中预测目标推荐用户是否候选推荐对象或选择候选推荐对象的概率,根据目标推荐用户是否候选推荐对象或选择候选推荐对象的概率得到推荐结果,呈现给客户设备140,从而提供给用户。
例如,在本申请实施例中,上述推荐结果可以为根据目标推荐用户选择候选推荐对象的概率得到的候选推荐对象的推荐排序,或者,上述推荐结果可以为根据目标推荐用户选择候选推荐对象的概率得到的目标推荐对象,目标推荐对象可以为概率最高的一个或多个候选推荐对象。
应理解,当上述系统架构100中不存在预处理模块113和预处理模块114时,计算模块111还可以将处理得到的排序较高的商品传输到I/O接口,然后再由I/O接口将排序较高的商品送入到客户设备140中显示。
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练样本生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。
在图3中所示情况下,在一种情况下,用户可以手动给定输入数据,该手动给定可以 通过I/O接口112提供的界面进行操作。
另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。
值得注意的是,图3仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图3中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。
示例性地,本申请实施例中的推荐模型还可以是逻辑回归(logistic regression)模型,逻辑回归模型是一种用于解决分类问题的机器学习方法,可以用于估计某种事物的可能性。
例如,推荐模型可以是深度因子分解机(deep factorization machines,DeepFM)模型,或者,推荐模型可以是深宽(Wide and Deep)模型。
下面介绍本申请实施例提供的一种芯片硬件结构。
图4示出了本申请实施例提供了一种应用本申请实施例的推荐模型的训练方法以及推荐方法的系统架构200。该系统架构200可以包括本地设备220、本地设备230以及执行设备210和数据存储系统250,其中,本地设备220和本地设备230通过通信网络与执行设备210连接。
执行设备210由一个或多个服务器实现,可选的,与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备;执行设备210可以布置在一个物理站点上,或者分布在多个物理站点上。执行设备210可以使用数据存储系统250中的数据,或者调用数据存储系统250中的程序代码实现本申请实施例的推荐模型的训练方法以及推荐方法。
示例性地,数据存储系统250可以部署于本地设备220或者本地设备230中,例如,数据存储系统250可以用于存储训练样本。
需要说明的是,上述执行设备210也可以称为云端设备,此时执行设备210可以部署在云端。
具体地,执行设备210可以执行以下过程:获取至少第一训练样本,所述第一训练样本包括第一用户的属性信息和第一推荐对象的信息;通过插补模型对所述第一用户的属性信息和所述第一推荐对象的信息进行处理,获取所述第一训练样本的插补预测标签,所述插补预测标签用于表示向所述第一用户推荐所述第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测;其中,所述插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,所述第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,所述第二训练样本的样本标签用于表示所述第二用户是否对所述第二推荐对象有操作动作,所述第二训练样本是在当所述第二推荐对象为随机 展示给所述第二用户的情况下获得的;以所述第一训练样本的所述第一用户的属性信息和所述第一推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到推荐模型。
执行设备210能够通过执行上述过程训练得到推荐模型,通过该推荐模型可以消除训练数据偏置对推荐准确率的影响,更准确地预测目标推荐用户对候选推荐对象有操作动作的概率。
在一种可能的实现方式中,上述执行设备210执行的训练方法可以是在云端执行的训练方法。
用户可以操作各自的用户设备(例如本地设备220和本地设备230)与执行设备210进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备210进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。
在一种实现方式中,本地设备220、本地设备230可以从执行设备210获取到推荐模型的相关参数,推荐模型在本地设备220、本地设备230上,利用该推荐模型预测目标推荐用户对候选推荐对象有操作动作的概率。
在另一种实现中,执行设备210上可以直接部署推荐模型,执行设备310通过从本地设备220和本地设备230获取待处理数据,并根据推荐模型得到预测目标推荐用户对候选推荐对象有操作动作的概率。
示例性地,数据存储系统250可以是部署在本地设备220或者本地设备230中,用于存储本地设备的训练样本。
示例性地,数据存储系统250可以独立于本地设备220或本地设备230,单独部署在存储设备上,存储设备可以与本地设备进行交互,获取本地设备中用户行为日志,并存入存储设备中。
图5示出了本申请一个实施例的推荐模型的训练方法300,方法300包括步骤310至步骤330。该训练方法300可以由图3中的训练设备120执行。下面对步骤310至步骤330进行详细介绍。
310,获取至少一个第一训练样本,第一训练样本包括第一用户的属性信息和第一推荐对象的信息。
其中,第一训练样本可以是如图4所示的数据存储系统250中获取的数据。
示例性地,第一用户的属性信息和第一推荐对象的信息可以通过第一训练样本的上下文信息获得。
用户的属性信息可以包括用户个性化的一些属性,例如,用户的性别、用户的年龄、用户的职业、用户的收入、用户的爱好、用户的教育情况等。
第一用户的属性信息可以包括上述用户的属性信息中的一项或多项。
推荐对象可以为前述应用场景一中的终端设备的应用市场中的推荐应用程序;或者,推荐对象可以为前述应用场景二中的系统推荐的搜索词。在本申请的实施例中,推荐对象可以是推荐系统能够为用户推荐的信息,对于推荐对象的具体实现方式本申请不作任何限 定。
第一推荐对象可以为上述推荐对象中的一种。
推荐对象的信息可以包括推荐对象标识,例如推荐对象ID。推荐对象的信息还可以包括推荐对象的一些属性,例如,推荐对象的名称、推荐对象的类型等。
示例性地,推荐对象可以为前述应用场景一中的终端设备的应用市场中的推荐应用程序,推荐对象的信息可以为推荐应用程序的信息。推荐应用程序的信息可以包括推荐应用程序的标识,例如,推荐应用程序的id。推荐应用程序的信息的还可以包括推荐应用程序的一些属性,例如,推荐应用程序的名称、推荐应用程序的开发者、推荐应用程序的类型、推荐应用程序的安装包大小、推荐应用程序的打分、推荐应用程序的评论等。
第一推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。
311,获取至少一个第三训练样本,第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及所述第三训练样本的样本标签,第三训练样本的样本标签用于表示第三用户是否对第三推荐对象有操作动作。
需要说明的是步骤311为可选步骤。
其中,第三训练样本可以是如图4所示的数据存储系统250中获取的数据。
示例性地,第三用户的属性信息和第三推荐对象的信息可以通过第三训练样本的上下文信息获得。
第一用户和第三用户可以为相同的用户,也可以为不同的用户。
第三用户的属性信息可以包括步骤310中所述的用户的属性信息中的一项或多项。
第三推荐对象可以为步骤310中所述的推荐对象中的一种。
第三推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。
需要说明的是,第一推荐对象的信息和第三推荐对象的信息中所包含的推荐对象的属性类别可以相同,也可以不同。例如,第一推荐对象的信息可以包括第一推荐对象的名称和第一推荐对象的类型。第三推荐对象的信息可以包括第三推荐对象的名称。
标签可以用于标记训练样本为正样本还是负样本。例如,标签可以为0或1,正样本的标签可以为1,负样本的标签可以为0。再例如,标签也可以为具体数值,也就是通过具体数值标记训练样本为正样本或负样本的概率。
样本标签可以基于用户是否对推荐对象有操作动作获得。
用户对推荐对象有操作动作可以包括用户的点击行为、用户的下载行为、用户的购买行为、用户的浏览行为和用户的差评行为等。
样本标签基于用户是否对推荐对象有操作动作获得,具体可以包括以下几种情况。
情况1:用户对推荐对象有操作动作,则样本标签可以为1,用户对推荐对象没有操作动作,样本标签可以为0。
示例性地,在应用市场中,该操作动作可以为下载行为。具体地,当训练样本A1中的用户对训练样本A1中的推荐对象有下载行为,则训练样本A1为正样本,训练样本A1的样本标签可以为1;当训练样本A1中的用户对训练样本A1中的推荐对象没有下载行为,则训练样本A1为负样本,训练样本A1的样本标签可以为0。其中,训练样本A1为第三训练样本的一例。
情况2:用户对推荐对象有操作动作,则样本标签可以为0,用户对推荐对象没有操 作动作,样本标签可以为1。
示例性地,在应用市场中,该操作动作可以为差评行为。具体地,当训练样本A1中的用户对训练样本A1中的推荐对象有差评行为,则训练样本A1为负样本,训练样本A1的样本标签可以为0;当训练样本A1中的用户对训练样本A1中的推荐对象没有差评行为,则训练样本A1为正样本,训练样本A1的样本标签可以为1。其中,训练样本A1为第三训练样本的一例。
情况3:用户对推荐对象有第一类操作动作,则样本标签可以为1,用户对推荐对象有第二类操作动作,样本标签可以为0。
示例性地,在付费音乐推荐的应用场景中,该第一类操作动作可以包括购买行为等,该第二类操作动作可以包括浏览行为等。具体地,当训练样本A1中的用户对训练样本A1中的推荐对象有浏览行为,则训练样本A1为负样本,训练样本A1的样本标签可以为0;当训练样本A1中的用户对训练样本A1中的推荐对象有购买行为,则训练样本A1为正样本,训练样本A1的样本标签可以为1。其中,训练样本A1为第三训练样本的一例。应理解,本申请实施例中仅以购买行为和浏览行为为例对确定样本标签的过程进行说明,不应视为对本申请实施例的限制。在实际应用中,可以根据具体的应用场景确定操作动作对应的样本标签。例如,在一些场景中,该第一类操作动作可以包括浏览行为等,该第二类操作动作可以包括差评行为等。
可选地,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
推荐对象被展示给用户后,可以得到用户对推荐对象是否有操作动作的反馈信息。例如,可以从用户行为日志中可以得到该反馈信息。
第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,也就是说第一训练样本中不具备第一用户对第一推荐对象是否有操作动作的反馈信息,第一训练样本没有实际的样本标签。
下面给出一个第三训练样本的示例。从一个推荐请求和该推荐请求对应的推荐对象中可以得到训练样本。该训练样本包括该推荐请求中的用户的属性信息和该推荐对象的信息。
例如,在应用市场的应用场景中,推荐系统向用户推荐应用程序。训练样本A1(训练样本A1为第三训练样本的一例)可以包括训练样本A1中的用户的性别、用户的职业、训练样本A1中的推荐应用程序id和推荐应用程序的类型这4类属性数据,也就是4个域(field)的数据,还可以理解为该训练样本A1中包括4个训练特征。域表示属性的类别。比如成都、重庆、北京都属于同一个field,该field也就是城市。4类属性分别编号为0~3。
以训练样本A1为正样本为例,该训练样本A1可以表示为x l=[1,0:男,1:教师,2:微信,3:社交],最前面的1表示该训练样本A1的样本标签为1。样本标签可以表示用户对该推荐应用程序是否有下载行为,样本标签为1可以表示该用户下载了微信。这是一条原始的训练样本,在预处理过程中还可以对训练特征进行数字编号,例如,将“男”编号为0,“教师”编号为1,“微信”编号为2,“社交”编号为3,则原始训练样本可转化为x l=[1,0:0,1:1,2:2,3:3]。
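As a sketch of how such a raw sample can be turned into the numbered form x_l=[1,0:0,1:1,2:2,3:3], the helper below assumes a small vocabulary that maps each raw feature value to an integer index; the mapping and names are illustrative and not part of the original text.

```python
def encode_sample(label, field_values, vocab):
    """Convert a raw training sample into the 'label, field:feature-index' form.

    label        -- 1 for a positive sample, 0 for a negative sample
    field_values -- raw feature values ordered by field id 0..n-1
    vocab        -- mapping from raw feature value to integer feature index
    """
    encoded = [str(label)]
    for field_id, value in enumerate(field_values):
        encoded.append(f"{field_id}:{vocab[value]}")
    return "[" + ",".join(encoded) + "]"

vocab = {"男": 0, "教师": 1, "微信": 2, "社交": 3}   # assumed numbering from the example
print(encode_sample(1, ["男", "教师", "微信", "社交"], vocab))
# -> [1,0:0,1:1,2:2,3:3]
```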
再例如,在音乐推荐的应用场景中,推荐系统向用户推荐音乐。推荐音乐可以为需要 付费的音乐。训练样本A2(训练样本A2为第三训练样本的一例)可以包括训练样本A2中的用户的性别、用户的年龄、训练样本A2中的推荐音乐id、推荐音乐的类型和推荐音乐的评分这5类属性数据,也就是5个域。5类属性分别编号为0~4。
以训练样本A2为正样本为例,该训练样本A2可以表示为x l=[1,0:男,1:20岁,2:音乐1,3:摇滚,4:4分],最前面的1表示该训练样本A2的样本标签为1,样本标签可以表示用户对该推荐音乐是否有购买行为,样本标签为1可以表示该用户购买了音乐1。这是一条原始的训练样本,在预处理过程中还可以对训练特征进行数字编号,例如,将“男”编号为0,“20岁”编号为1,“音乐1”编号为2,“摇滚”编号为3,“4分”编号为4,则原始训练样本可转化为x l=[1,0:0,1:1,2:2,3:3,4:4]。
下面以推荐对象为应用市场中的推荐应用程序为例对第一训练样本和第三训练样本进行说明。
第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。在该情况下,第一推荐对象可以为没有被投放(或者说是没有被展示给第一用户)的推荐应用程序。第三推荐对象可以为已经被投放(或者说是已经被展示给第三用户)的推荐应用程序。例如,针对一个推荐请求,候选推荐应用程序中的一个或多个推荐应用程序被展示给该推荐请求对应的用户,进而可以得到用户对推荐应用程序是否有操作动作的反馈信息。被展示给该推荐请求对应的用户A的推荐应用程序A即为第三推荐对象,具备用户A对推荐应用程序A是否有操作动作的反馈信息的训练样本即为第三训练样本。也就是说,该第三训练样本包括用户A的属性信息和推荐应用程序A的信息以及第三训练样本的样本标签。候选推荐应用程序中没有被展示给用户A的推荐应用程序B无法得到用户A对推荐应用程序B是否有操作动作的反馈信息。没有被展示给该推荐请求对应的用户A的推荐应用程序B即为第一推荐对象,不具备用户A对推荐应用程序B是否有操作动作的反馈信息的训练样本即为第一训练样本。该第一训练样本包括用户A的属性信息和推荐应用程序B的信息。应理解,以上仅以第一训练样本和第三训练样本对应相同的推荐请求为例进行说明,第一用户和第三用户均为用户A仅为示例。对于一个推荐请求,候选推荐应用程序的数量为m个,被展示给推荐请求对应的用户的推荐应用程序的数量为n个,没有被展示给推荐请求对应的用户的推荐应用程序的数量为m-n个。相应地,该n个推荐应用程序可以对应n条第三训练样本,也就是由该推荐请求和该n个推荐应用程序可以构建n条第三训练样本。该m-n个推荐应用程序可以对应m-n条第一训练样本,也就是由该推荐请求和该m-n个推荐应用程序可以构建m-n条第一训练样本。
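The split of one recommendation request's m candidates into third training samples (shown, with feedback labels) and first training samples (not shown, therefore without labels) can be sketched as follows; the dictionary field names ('info', 'shown', 'label') are assumed for illustration.

```python
def build_samples_for_request(user_info, candidates):
    """Build the two sample sets for a single recommendation request.

    candidates -- list of dicts, one per candidate object, each containing
                  'info' (the object's information), 'shown' (bool), and,
                  if shown, 'label' (1 if the user had an operation action
                  on the object, 0 otherwise).
    """
    third_samples, first_samples = [], []
    for obj in candidates:
        if obj["shown"]:
            third_samples.append({"user": user_info,
                                  "object": obj["info"],
                                  "label": obj["label"]})
        else:
            # No feedback exists for objects that were never shown; their
            # labels are later filled in by the interpolation model.
            first_samples.append({"user": user_info,
                                  "object": obj["info"]})
    return third_samples, first_samples
```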
320,通过插补模型对第一用户的属性信息和第一推荐对象的信息进行处理,获取第一训练样本的插补预测标签,插补预测标签用于表示向第一用户推荐第一推荐对象时,第一用户是否对第一推荐对象有操作动作的预测。
其中,插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的。
插补模型可以用于预测当向第一用户推荐第一推荐对象时,第一用户是否对第一推荐 对象有操作动作。该插补预测标签可以为0或1,也就是用0或1表示第一用户是否对第一推荐对象有操作动作。该插补预测标签也可以为概率值,也就是用概率值表示第一用户对第一推荐对象有操作动作的概率。
例如,插补模型可以为广告平均CTR模型、逻辑回归(logistic regression,LR)模型、域感知因子分解机(field-aware factorization machines,FFM)或DNN等。
其中,第二训练样本可以是如图4所示的数据存储系统250中获取的数据。
示例性地,第二用户的属性信息和第二推荐对象的信息可以通过第二训练样本的上下文信息获得。
第一用户和第二用户可以为相同的用户,也可以为不同的用户。
第二用户的属性信息可以包括步骤310中所述的用户的属性信息中的一项或多项。
第二推荐对象可以为步骤310中所述的推荐对象中的一种。
第二推荐对象的信息可以包括上述推荐对象的信息中的一项或多项。
需要说明的是,第一推荐对象的信息和第二推荐对象的信息中所包含的推荐对象的属性类别可以相同,也可以不同。例如,第一推荐对象的信息可以包括第一推荐对象的名称和第一推荐对象的类型。第二推荐对象的信息可以包括第二推荐对象的名称。
关于第二样本的样本标签的描述可以如步骤311中所述,此处不再赘述。
如上所述,第二训练样本可以为当第二推荐对象被展示给第二用户的情况下获得的,也就是说第二训练样本中具备第二用户对第二推荐对象是否有操作动作的反馈信息,第二训练样本有实际的样本标签。
该第二训练样本可以与上述步骤311中的第三训练样本相同,也可以与第三训练样本不同。
第二训练样本为没有偏置的训练样本。该第三训练样本可以为没有偏置的训练样本,也可以为有偏置的训练样本。
有偏置的训练样本可以理解为在当推荐对象为按照一定规则展示给用户的情况下获得的。例如,在收到推荐请求时,将候选的推荐对象按照期望收入进行排序,按照排序确定被展示给用户的推荐对象,也就是说在该情况下,各个推荐对象被展示给用户的概率是不同的,期望收入较高的推荐对象被展示给用户的概率较高,在该情况下得到的训练样本即为有偏置的训练样本。
下面以推荐对象为应用市场中的推荐应用程序为例对没有偏置的训练样本和有偏置的训练样本进行说明。
针对一条推荐请求,通过随机投放策略展示推荐应用程序,也就是将多个候选推荐应用程序中的推荐应用程序随机展示给该推荐请求对应的用户,各个推荐应用程序被展示给该推荐请求对应的用户的概率相同,在该情况下得到的训练样本即为没有偏置的训练样本。针对一条推荐请求,将候选推荐应用程序中的推荐程序按照期望收入进行排序,按照排序决定被展示给用户的推荐应用程序,在该情况下得到的训练样本为有偏置的训练样本。
采用第二训练样本训练插补模型,也就是采用没有偏置的训练样本训练插补模型可以避免偏置问题对插补模型的训练带来的影响,提高插补模型的准确率,使得到的插补预测标签更加准确。
可选地,插补模型可以根据第二训练样本的数量选择。
示例性地,当第二训练样本的数量较多时,该第二训练样本相对具有代表性,可以采用较复杂的模型或是采用更多训练特征对插补模型进行训练,进而使插补模型能够更准确地拟合无偏数据分布。较复杂的模型可以为逻辑回归模型、域感知因子分解机或深度神经网络等。当第二训练样本的数量较少时,该第二训练样本相对不具代表性,可以采用较简略的模型或是采用更少训练特征对插补模型进行训练,避免插补模型过拟合无偏数据分布。例如,较简略的模型可以为广告平均点击通过率模型。
例如,在应用市场的应用场景下,当第二训练样本的数量为10万以上时,插补模型可以为域感知因子分解机或深度神经网络等;当第二训练样本的数量为1万至10万之间时,插补模型可以为逻辑回归模型;当第二训练样本的数量为一万以下时,插补模型可以为广告平均点击通过率模型。
根据本申请实施例的方案,在训练过程中,插补模型可以根据第二训练样本的数量进行选择,针对不同的应用场景可以设定不同的阈值来选择插补模型,插补模型可以灵活调整,只需少量的第二训练样本就能减轻偏置问题带来的影响,提升推荐模型的准确率,避免由于大规模采集第二训练样本而大规模随机展示推荐对象,导致系统整体收入下降。
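The threshold-based choice of interpolation model described above can be written as a small helper; the thresholds (100,000 and 10,000) follow the app-market example in the text, and the returned strings simply name the model family to be trained.

```python
def select_interpolation_model(num_unbiased_samples: int) -> str:
    """Choose an interpolation model family based on how many unbiased
    second training samples are available (app-market thresholds)."""
    if num_unbiased_samples >= 100_000:
        # Enough unbiased data for a more expressive model to fit the
        # unbiased data distribution accurately.
        return "field-aware factorization machine or deep neural network"
    if num_unbiased_samples >= 10_000:
        return "logistic regression"
    # Very few unbiased samples: a simpler model avoids overfitting
    # the unbiased data distribution.
    return "average click-through-rate model"

print(select_interpolation_model(50_000))   # -> logistic regression
```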
训练特征可以为从用户的属性信息和推荐对象的信息中得到的特征。
330,以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到推荐模型。
在方法300包括步骤311的情况下,步骤330可以为根据所述第一训练样本和所述第三训练样本进行训练,得到推荐模型。
具体地,根据所述第一训练样本和所述第三训练样本进行训练可以为,以第一用户的属性信息和第一推荐对象的信息以及第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签和第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型进行训练,得到训练后的推荐模型。
应理解,上述训练过程是以第一用户的属性信息和第一推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签作为该输入对应的目标输出值,且以第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以第三训练样本的样本标签作为该输入对应的目标输出值,基于目标训练模型进行训练。
可选地,目标训练模型包括第一损失函数和第二损失函数,第一损失函数用于指示第一训练样本的插补预测标签与第一训练样本的预测标签之间的差异,第二损失函数用于指示第三训练样本的样本标签与第三训练样本的预测标签之间的差异。
示例性地,基于目标训练模型对推荐模型进行训练可以为通过反向传播算法多次迭代,不断减小第一损失函数和第二损失函数,得到推荐模型的模型参数。
具体地,第一损失函数和第二损失函数之间可以为相加的关系。第一损失函数与第二损失函数之间也可以为相乘的关系。
可选地,目标训练模型可以为:
Figure PCTCN2019114897-appb-000029
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000030
中的训练样本x 1至训练样本x L为第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000031
为第一训练样本,
Figure PCTCN2019114897-appb-000032
表示训练样本集中的训练样本的数量,L表示所述训练样本集中的第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000033
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000034
表示第二损失函数,
Figure PCTCN2019114897-appb-000035
所述第一损失函数,ω为超参数,用于所述第一损失函数和第二损失函数的比重。
应理解,上述训练样本x 1至训练样本x L为L个不同的第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000036
Figure PCTCN2019114897-appb-000037
个不同的第一训练样本。
示例性地,步骤320还可以通过插补模型对第三训练样本进行处理,得到第三训练样本的插补预测标签,然后以第三训练样本和第一训练样本作为推荐模型的输入样本,以第三训练样本的样本标签、第三训练样本的插补预测标签和第一训练样本的插补预测标签作为推荐模型的目标输出值基于上述目标训练模型对推荐模型进行训练。
该目标训练模型可以为:
Figure PCTCN2019114897-appb-000038
应理解,上述两个目标训练模型为相同的目标训练模型,仅在实现方式上有区别。
需要说明的是,在上述两个目标训练模型中,超参数与第二损失函数相乘仅为示意。也就是说目标训练模型中,超参数也可以设置于第一损失函数之前,即超参数可以与第一损失函数相乘。或者,在目标训练模型中,可以设置两个超参数,两个超参数分别设置于第一损失函数之前和第二损失函数之前,基于该目标训练模型进行训练的方法可以称为无倾向双鲁棒(propensity-free doubly robust)法。
通过设置不同的超参数能够调整第一损失函数和第二损失函数在目标训练模型中所占的比重,进一步提高推荐模型的准确率。当插补模型的准确率较高时,第二损失函数的权重可以高于第一损失函数的权重,也就是ω的值可以大于1。例如,用于训练插补模型的第二训练样本的数量较多时,该第二训练样本相对具有代表性,使插补模型能够更准确地拟合第二训练样本分布,得到的插补模型的准确率较高,在该情况下,第二损失函数的权重可以高于第一损失函数的权重,也就是ω的值可以大于1。这样,基于该目标训练模型对推荐模型进行训练,能够提高推荐模型的准确率。
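To make the weighting between the two loss terms concrete, the following is a minimal PyTorch-style sketch of one optimization step. It assumes squared-error forms for both losses, an L2 regularizer for R(W), and that ω multiplies the loss on the labelled (third) training samples as described above; the text leaves the exact loss functions and the precise form of the objective generic, so this is an illustrative reconstruction rather than a definitive implementation.

```python
import torch

def target_training_step(rec_model, optimizer,
                         labeled_x, labeled_y,       # third training samples (observed labels y_l)
                         imputed_x, imputed_sigma,   # first training samples (imputed labels sigma_l)
                         omega=1.0, lam=1e-4):
    """One step of the bias-mitigated objective (illustrative sketch):
       omega * loss(observed) + loss(imputed) + lam * R(W)."""
    optimizer.zero_grad()

    y_hat_labeled = rec_model(labeled_x)    # predicted labels for third samples
    y_hat_imputed = rec_model(imputed_x)    # predicted labels for first samples

    # Squared error is used here for both losses; the text leaves them generic.
    second_loss = torch.sum((labeled_y - y_hat_labeled) ** 2)      # observed labels vs predictions
    first_loss = torch.sum((imputed_sigma - y_hat_imputed) ** 2)   # imputed labels vs predictions

    # R(W): L2 norm over the recommendation model parameters.
    reg = sum(torch.sum(p ** 2) for p in rec_model.parameters())

    loss = omega * second_loss + first_loss + lam * reg
    loss.backward()
    optimizer.step()
    return loss.item()
```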
具体地,推荐模型可以为低秩模型。例如,推荐模型可以为矩阵分解(matrix factorization,MF)模型、因子分解机(factorization machine,FM)或FFM等。
这样在求解的过程中可以把训练样本分解为推荐请求中的用户的属性信息和推荐对象的信息两部分,有助于降低计算训练样本的时间复杂度。
需要说明的是,用于插补模型的训练特征可以和用于推荐模型的训练特征不同。训练特征可以从用户的属性信息和推荐对象的信息中确定。例如,用户的属性信息可以包括用户的性别和用户的职业;推荐对象的信息可以包括推荐对象的类型、推荐对象的打分和推荐对象的评论。用于插补模型的训练特征可以包括推荐对象的类型和用户性别。用于推荐模型的训练特征可以包括推荐对象的类型、推荐对象的打分、推荐对象的评论、用户性别和用户的职业。
根据本申请实施例的方案,第二训练样本为没有偏置的训练样本,利用第二训练样本对插补模型进行训练,可以避免偏置问题对插补模型的训练带来的影响,提高插补模型的准确率,使得到的插补预测标签更加准确,进而利用更准确的插补预测标签对推荐模型进行训练,能够提高推荐模型的准确性。
此外,利用第一训练样本和第三训练样本一起对推荐模型进行训练,兼顾了插补模型得到的插补预测标签和实际的样本标签在训练过程中的作用,避免了推荐模型的准确性仅依赖于插补预测标签的准确率,进一步提高推荐模型的准确性。
在第一推荐对象没有被展示给第一用户的情况下,通过为第一训练样本增加插补预测标签,能够将没有发生过的事实纳入建模中,与发生过的事实一起用于推荐模型的训练,也就是将没有样本标签的第一训练样本与有样本标签的第三训练样本一起用于推荐模型的训练,可以使样本分布更加合理,提高推荐模型的准确性。
在第一推荐对象为没有展示给用户的推荐对象的情况下,无法根据用户的操作动作得到第一训练样本对应的样本标签,也就无法利用第一训练样本对推荐模型进行训练。将第一训练样本纳入建模中也就是利用反事实的学习方法对推荐模型进行训练,反事实学习是指将过去未发生的事实进行表征,纳入建模过程中的方法,在本申请实施例中,第一训练样本可以理解为过去未发生的事实。利用反事实学习的方法,将未被展示给用户的推荐对象纳入训练样本中,使样本分布更加合理,进而对推荐模型进行训练,减轻偏置问题带来的影响,提高了推荐模型的准确度。
图6示出了本申请实施例提供的一种推荐模型的训练方法400。方法400包括步骤410至步骤440。下面对步骤410至步骤440进行详细介绍。应理解,步骤410至步骤440的具体实现方式可以参照前述方法300,为了避免不必要的重复,下面在介绍方法400时适当省略重复的描述。
410,获取第一训练样本、第二训练样本和第三训练样本。
其中,第一训练样本、第二训练样本和第三训练样本可以为多个。
第一训练样本包括第一用户的属性信息和第一推荐对象的信息。第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的。第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及第三训练样本的样本标签,第三训练样本的样本标签用于表示第三用户是否对第三推荐对象有操作动作。
第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
上述多个第二训练样本可以为多个第三训练样本中的一部分。也就是说多个第三训练样本中可以包括没有偏置的训练样本和有偏置的训练样本。
420,根据第二训练样本对插补模型进行训练,得到插补模型。
430,通过插补模型对第一训练样本进行处理,得到第一训练样本的插补预测标签。
440,以第一训练样本中的第一用户的属性信息和第一推荐对象的信息以及第三训练样本中的第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签和第三训练样本的样本标签作为推荐模型的目标输出值基于目标训练 模型对推荐模型进行训练,得到训练后的推荐模型。
其中,目标训练模型可以为:
Figure PCTCN2019114897-appb-000039
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000040
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000041
为所述第一训练样本,
Figure PCTCN2019114897-appb-000042
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000043
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000044
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000045
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
根据本申请实施例的方案,采用第二训练样本训练插补模型,也就是采用没有偏置的训练样本训练插补模型,可以避免偏置问题对插补模型的训练带来的影响,提高插补模型的准确率,使得到的插补预测标签更加准确。此外,在第一推荐对象为没有展示给用户的推荐对象的情况下,无法得到第一训练样本的样本标签。通过插补模型为第一训练样本补充对应的插补预测标签,将没有展示给用户的推荐对象纳入训练样本中,也就是利用反事实的学习方法将没有发生的事实纳入建模中对推荐模型进行训练,使样本分布更加合理。利用无倾向双鲁棒法对推荐模型进行训练,只需少量的第二训练样本就能减轻偏置问题带来的影响,提升推荐模型的准确率,避免由于大规模采集第二训练样本而大规模随机展示推荐对象,导致系统整体收入下降。
通过负对数损失(negative logarithmic loss,NLL)和观测者操作特性(receiver operating characteristic,ROC)曲线下的面积(area under the ROC curve,AUC)两个指标对现有方法训练得到的推荐模型以及本申请提出的推荐模型进行测试,本申请实施例中的推荐模型的准确率相对于现有的二分类建模的推荐模型有10%以上的提升。
除了图6所描述的训练方案,本申请的训练方式在实现时,还可以是,获取第一训练样本和第二训练样本,其中第一训练样本为通过插补模型进行标签预测的样本,插补预测模型为预先训练生成的,其训练方式和上述实施例中的训练方式相同,再此不再赘述,基于第一训练样本和第三训练样本训练获取推荐模型。
图7示出了本申请实施例提供的一种推荐框架500示意图。推荐框架500中包括插补模块501和推荐模块502。其中,可以利用插补模块对不具备样本标签的训练样本进行处理,得到插补预测标签,将不具备样本标签的训练样本纳入到建模中,使样本分布更加合理,消除有偏置问题对推荐结果准确率的影响,得到更准确的推荐模块502。
需要说明的是,插补模块501可以对应于图5或图6中的插补模型,推荐模块502可以对应于图5或图6中的推荐模型。
插补模块501可以用于为没有样本标签的训练样本补充插补预测标签。
推荐模块502可以用于预测训练样本中的用户对训练样本中的推荐对象有操作动作的概率。
示例性地,推荐框架500可以分为两个阶段,训练阶段和推荐阶段。下面分别对训练阶段和推荐阶段进行说明。
训练阶段:
步骤A-1:获取至少一个第一训练数据和至少一个第二训练样本。第一训练样本包括第一用户的属性信息和第一推荐对象的信息。
步骤A-2:通过插补模块601对第一用户的属性信息和第一推荐对象的信息进行处理,获取第一训练样本的插补预测标签,插补预测标签用于表示向第一用户推荐第一推荐对象时,第一用户是否对第一推荐对象有操作动作的预测。
插补模块601的参数是基于第二训练样本进行训练得到的,第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的。
步骤A-3:获取至少一个第三训练样本。该步骤为可选步骤。
第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及所述第三训练样本的样本标签,第三训练样本的样本标签用于表示第三用户是否对第三推荐对象有操作动作。
步骤A-4:以所述第一训练样本的所述第一用户的属性信息和所述第一推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到推荐模块502。
在包含步骤A-3的情况下,步骤A-4可以为,根据所述第一训练样本和所述第三训练样本基于目标训练模型进行训练,得到推荐模块502。
目标推荐模型可以为上述步骤330或步骤440中目标推荐模型,此处不再赘述。
示例性地,上述插补模块可以为广告平均CTR模型、逻辑回归模型、FFM或DNN等。
示例性地,上述推荐模块可以为MF、FM或FFM等。
根据本申请实施例的方案,通过插补模型得到第二类训练样本的插补预测标签,进而可以将第二类训练样本和对应的插补预测标签作为一部分训练数据的对推荐模型进行训练。将没有样本标签的第二类训练样本纳入建模中,可以使样本分布更加合理,提高推荐模型的准确性。
推荐阶段:
在推荐阶段,只需要部署推荐模块502,推荐系统构建基于用户的属性信息和推荐对象的信息的输入向量,通过推荐模块502预测用户对推荐对象有操作动作的概率。
图8是本申请实施例提供的推荐方法600的示意图。方法600包括步骤610和步骤620。下面对步骤610至步骤620进行详细介绍。
610,获取目标推荐用户的属性信息和候选推荐对象的信息。
例如,推荐系统收到一条待处理的推荐请求时,基于该待处理的推荐请求可以确定目标推荐用户的属性信息。
示例性地,目标推荐用户的属性信息可以包括用户个性化的一些属性,例如,目标推荐用户的性别、目标推荐用户的年龄、目标推荐用户的职业、目标推荐用户的收入、目标推荐用户的爱好、目标推荐用户的教育情况等。
示例性地,候选推荐对象的信息可以包括候选推荐对象标识,例如候选推荐对象ID。 候选推荐对象的信息还可以包括候选推荐对象的一些属性,例如,候选推荐对象的名称、候选推荐对象的类型等。
620,将目标推荐用户的属性信息和候选推荐对象的信息输入至推荐模型,预测目标推荐用户对候选推荐对象有操作动作的概率。
示例性地,候选推荐对象可以为候选推荐对象集合中的推荐对象。可以根据预测目标推荐用户对候选推荐对象有操作动作的概率对候选推荐集合中的候选推荐对象进行排序,从而得到候选推荐对象的推荐结果。例如,选择概率最高的候选推荐对象展示给用户。比如,候选推荐对象可以是候选推荐应用程序。
如图9所示,图9示出了应用市场中的“推荐”页,该页面上可以有多个榜单,比如,榜单可以包括精品应用和精品游戏。以精品游戏为例,应用市场的推荐系统根据用户的属性信息和候选推荐应用程序的信息预测用户对候选推荐应用程序有下载(安装)行为的概率,并以此概率将候选推荐应用程序降序排列,将最可能被下载的应用程序排在最靠前的位置。
示例性地,在精品应用中推荐结果可以是App5位于精品游戏中的推荐位置一、App6位于精品游戏中的推荐位置二、App7位于精品游戏中的推荐位置三、App8位于精品游戏中的推荐位置四。当用户看到应用市场的推荐结果之后,可以根据自身的兴趣爱好对上述推荐结果进行操作动作,用户的操作动作执行后会被存入用户行为日志中。
例如,图9所示的应用市场可以通过用户行为日志作为训练数据训练推荐模型。
应理解,上述举例说明是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的上述举例说明,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
推荐模型可以是图7中的推荐模块501,推荐模型的训练方法可以采用图5或图6所示的训练方法以及图7的训练阶段的方法,此处不再赘述。
推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练得到的。第一训练样本的插补预测标签是通过插补模型对第一用户的属性信息和第一推荐对象的信息进行处理得到的,插补预测标签用于表示向第一用户推荐第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测,插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的。
可选地,推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的,包括所述推荐模型的模型参数是以所述第一用户的属性信息和所述第一推荐对象的信息以及第三训练样本的第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签和所述第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型得到的,其中,所述第 三训练样本的样本标签用于表示所述第三用户是否对所述第三推荐对象有操作动作。
可选地,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
可选地,目标训练模型包括第一损失函数和第二损失函数,第一损失函数用于指示第一训练样本的插补预测标签与第一训练样本的预测标签之间的差异,第二损失函数用于指示第三训练样本的样本标签与第三训练样本的预测标签之间的差异。
可选地,目标训练模型为:
Figure PCTCN2019114897-appb-000046
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000047
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000048
为所述第一训练样本,
Figure PCTCN2019114897-appb-000049
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000050
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000051
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000052
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
可选地,插补模型是根据所述第二训练样本的数量选择的。
下面结合附图对本申请实施例的训练装置和推荐装置进行详细的描述,应理解,下面描述的推荐装置能够执行前述本申请实施例的推荐模型的训练方法,推荐装置可以执行前述本申请实施例的推荐方法,为了避免不必要的重复,下面在介绍本申请实施例的推荐装置时适当省略重复的描述。
图10是本申请实施例的推荐模型的训练装置的示意性框图。图10所示的推荐模型的训练装置700包括获取单元710和处理单元720。
获取单元710和处理单元720可以用于执行本申请实施例的推荐模型的训练方法,具体地,获取单元710可以执行上述步骤310或步骤410,处理单元720可以执行上述步骤320至步骤330或步骤420至步骤440。
获取单元710用于获取至少一个第一训练样本,所述第一训练样本包括第一用户的属性信息和第一推荐对象的信息。处理单元720用于通过插补模型对第一用户的属性信息和第一推荐对象的信息进行处理,获取第一训练样本的插补预测标签,插补预测标签用于表示向第一用户推荐第一推荐对象时,第一用户是否对第一推荐对象有操作动作的预测;其中,插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,至少一个第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,第二训练样本的样本标签用于表示第二用户是否对第二推荐对象有操作动作,第二训练样本是在当第二推荐对象为随机展示给第二用户的情况下获得的。处理单元720还用于以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以第一训练样本的插补预测标签作为推荐模型的目标输出值进行训练,得到训练后的推荐模型。
可选地,作为一个实施例,获取单元710还用于获取至少一个第三训练样本,第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及第三训练样本的样本标签,第三训练样本的样本标签用于表示第三用户是否对第三推荐对象有操作动作。处理单元720 还用于以第一用户的属性信息和第一推荐对象的信息以及第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以第一训练样本的插补预测标签和第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型进行训练,得到训练后的推荐模型。
可选地,作为一个实施例,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
可选地,作为一个实施例,目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一训练样本的插补预测标签与所述第一训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第三训练样本的样本标签与所述第三训练样本的预测标签之间的差异。
可选地,作为一个实施例,目标训练模型为:
Figure PCTCN2019114897-appb-000053
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000054
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000055
为所述第一训练样本,
Figure PCTCN2019114897-appb-000056
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000057
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000058
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000059
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
可选地,作为一个实施例,插补模型是根据所述第二训练样本的数量选择的。
图11是本申请实施例提供的推荐装置800的示意性框图。图11所示的推荐装置800包括获取单元810和处理单元820。
获取单元810和处理单元820可以用于执行本申请实施例的推荐方法,具体地,获取单元810可以执行上述步骤610,处理单元820可以执行上述步骤620。
获取单元810用于获取目标推荐用户的属性信息和候选推荐对象的信息;处理单元820用于将所述目标推荐用户的属性信息和所述候选推荐对象的信息输入至推荐模型,预测所述目标推荐用户对所述候选推荐对象有操作动作的概率。推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的;所述第一训练样本的插补预测标签是通过插补模型对所述第一用户的属性信息和所述第一推荐对象的信息进行处理得到的,所述插补预测标签用于表示向所述第一用户推荐所述第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测,所述插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,所述第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,所述第二训练样本的样本标签用于表示所述第二用户是否对所述第二推荐对象有操作动作,所述第二训练样本是在当所述第二推荐对象为随机展示给所述第二用户的情况下获得的。
可选地,作为一个实施例,推荐模型的模型参数是通过以第一训练样本的第一用户的 属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的,包括:推荐模型的模型参数是以所述第一用户的属性信息和所述第一推荐对象的信息以及第三训练样本的第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签和所述第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型得到的,其中,所述第三训练样本的样本标签用于表示所述第三用户是否对所述第三推荐对象有操作动作。
可选地,作为一个实施例,第一训练样本可以为当第一推荐对象没有被展示给第一用户的情况下获得的,第三训练样本可以为当第三推荐对象被展示给第三用户的情况下获得的。
可选地,作为一个实施例,所述目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一类训练样本的样本标签与所述第一类训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第二类训练样本的插补预测标签与所述第二类训练样本的预测标签之间的差异。
可选地,作为一个实施例,目标训练模型为:
Figure PCTCN2019114897-appb-000060
其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
Figure PCTCN2019114897-appb-000061
中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
Figure PCTCN2019114897-appb-000062
为所述第一训练样本,
Figure PCTCN2019114897-appb-000063
表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插补预测标签σ(x l),y l表示训练样本x l的样本标签,
Figure PCTCN2019114897-appb-000064
表示训练样本x l的预测标签,
Figure PCTCN2019114897-appb-000065
表示所述第二损失函数,
Figure PCTCN2019114897-appb-000066
表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
可选地,作为一个实施例,所述插补模型是根据所述第二训练样本的数量选择的。
需要说明的是,上述训练装置700以及装置800以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
因此,在本申请的实施例中描述的各示例的单元,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
图12是本申请实施例提供的一种推荐模型的训练装置的硬件结构示意图。图12所示的训练装置900(该装置900具体可以是一种计算机设备)包括存储器901、处理器902、通信接口903以及总线904。其中,存储器901、处理器902、通信接口903通过总线1004实现彼此之间的通信连接。
存储器901可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器901可以存储程序,当存储器901中存储的程序被处理器902执行时,处理器902用于执行本申请实施例的推荐模型的训练方法的各个步骤,例如,执行图5或图6所示的各个步骤。
应理解,本申请实施例所示的训练装置可以是服务器,例如,可以是云端的服务器,或者,也可以是配置于云端的服务器中的芯片。
处理器902可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的推荐模型的训练方法。
处理器902还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的推荐模型的训练方法的各个步骤可以通过处理器902中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器902还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器901,处理器902读取存储器901中的信息,结合其硬件完成本申请实施中图9所示的训练装置中包括的单元所需执行的功能,或者,执行本申请方法实施例的图5或图6所示的推荐模型的训练方法。
通信接口903使用例如但不限于收发器一类的收发装置,来实现训练装置900与其他设备或通信网络之间的通信。
总线904可包括在训练装置900各个部件(例如,存储器901、处理器902、通信接口903)之间传送信息的通路。
图13是本申请实施例提供的推荐装置的硬件结构示意图。图13所示的推荐装置1000(该装置1000具体可以是一种计算机设备)包括存储器1001、处理器1002、通信接口1003以及总线004。其中,存储器1001、处理器1002、通信接口1003通过总线1004实现彼此之间的通信连接。
存储器1001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器1001可以存储程序,当存储器1001中存储的程序被处理器1002执行时,处理器1002用于执行本申请实施例的推荐方法的各个步骤,例如,执行图8所示的各个步骤。
应理解,本申请实施例所示的装置可以是智能终端,或者,也可以是配置于智能终端中的芯片。
处理器1002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics  processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的预测选择概率的方法。
处理器1002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的预测选择概率的方法的各个步骤可以通过处理器1002中的硬件的集成逻辑电路或者软件形式的指令完成。
上述处理器1002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1001,处理器1002读取存储器1001中的信息,结合其硬件完成本申请实施中图10所示的装置中包括的单元所需执行的功能,或者,执行本申请方法实施例的图8所示的推荐方法。
通信接口1003使用例如但不限于收发器一类的收发装置,来实现装置1000与其他设备或通信网络之间的通信。
总线1004可包括在装置1000各个部件(例如,存储器1001、处理器1002、通信接口1003)之间传送信息的通路。
应注意,尽管上述训练装置900和装置1000仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,训练装置900和装置1000还可以包括实现正常运行所必须的其他器件。同时,根据具体需要本领域的技术人员应当理解,上述训练装置900和装置1000还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,上述训练装置900和装置1000也可仅仅包括实现本申请实施例所必需的器件,而不必包括图12或图13中所示的全部器件。
还应理解,本申请实施例中,该存储器可以包括只读存储器和随机存取存储器,并向处理器提供指令和数据。处理器的一部分还可以包括非易失性随机存取存储器。例如,处理器还可以存储设备类型的信息。
应理解,本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:通用串行总线闪存盘(USB flash disk,UFD),UFD也可以简称为U盘或者优盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (24)

  1. 一种推荐模型的训练方法,其特征在于,包括:
    获取至少一个第一训练样本,所述第一训练样本包括第一用户的属性信息和第一推荐对象的信息;
    通过插补模型对所述第一用户的属性信息和所述第一推荐对象的信息进行处理,获取所述第一训练样本的插补预测标签,所述插补预测标签用于表示向所述第一用户推荐所述第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测;
    其中,所述插补模型的模型参数是基于至少一个第二训练样本进行训练得到的,所述第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及所述第二训练样本的样本标签,所述第二训练样本的样本标签用于表示所述第二用户是否对所述第二推荐对象有操作动作,所述第二训练样本是在当所述第二推荐对象为随机展示给所述第二用户的情况下获得的;
    以所述第一用户的属性信息和所述第一推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练,得到训练后的推荐模型。
  2. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    获取至少一个第三训练样本,所述第三训练样本包括第三用户的属性信息和第三推荐对象的信息以及所述第三训练样本的样本标签,所述第三训练样本的样本标签用于表示所述第三用户是否对所述第三推荐对象有操作动作,以及
    所述以所述第一用户的属性信息和所述第一推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练,得到训练后的推荐模型,包括:
    以所述第一用户的属性信息和所述第一推荐对象的信息以及所述第三用户的属性信息和所述第三推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签和所述第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型进行训练,得到训练后的推荐模型。
  3. 如权利要求2所述的方法,其特征在于,所述目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一训练样本的插补预测标签与所述第一训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第三训练样本的样本标签与所述第三训练样本的预测标签之间的差异。
  4. 如权利要求3所述的方法,其特征在于,所述目标训练模型为:
    Figure PCTCN2019114897-appb-100001
    其中,W为所述推荐模型的参数,R(W)为正则项,λ表示决定正则项权重的超参,训练样本集
    Figure PCTCN2019114897-appb-100002
    中的训练样本x 1至训练样本x L为所述第三训练样本,训练样本x L+1至训练样本
    Figure PCTCN2019114897-appb-100003
    为所述第一训练样本,
    Figure PCTCN2019114897-appb-100004
    表示所述训练样本集中的训练样本的数量,L表示所述训练样本集中的所述第三训练样本的数量,σ l表示训练样本x l的插 补预测标签σ(x l),y l表示训练样本x l的样本标签,
    Figure PCTCN2019114897-appb-100005
    表示训练样本x l的预测标签,
    Figure PCTCN2019114897-appb-100006
    表示所述第二损失函数,
    Figure PCTCN2019114897-appb-100007
    表示所述第一损失函数,ω为超参数,用于调节所述第一损失函数和所述第二损失函数的比重。
  5. 如权利要求1至4任一项所述的方法,其特征在于,所述插补模型是根据所述第二训练样本的数量选择的。
  6. 一种推荐方法,其特征在于,包括:
    获取目标推荐用户的属性信息和候选推荐对象的信息;
    将所述目标推荐用户的属性信息和所述候选推荐对象的信息输入至推荐模型,预测所述目标推荐用户对所述候选推荐对象有操作动作的概率;
    其中,所述推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的;所述第一训练样本的插补预测标签是通过插补模型对所述第一用户的属性信息和所述第一推荐对象的信息进行处理得到的,所述插补预测标签用于表示向所述第一用户推荐所述第一推荐对象时,所述第一用户是否对所述第一推荐对象有操作动作的预测,所述插补模型的模型参数是基于第二训练样本进行训练得到的,所述第二训练样本包括第二用户的属性信息和第二推荐对象的信息以及第二训练样本的样本标签,所述第二训练样本的样本标签用于表示所述第二用户是否对所述第二推荐对象有操作动作,所述第二训练样本是在当所述第二推荐对象为随机展示给所述第二用户的情况下获得的。
  7. 如权利要求6所述的方法,其特征在于,所述推荐模型的模型参数是通过以第一训练样本的第一用户的属性信息和第一推荐对象的信息作为所述推荐模型的输入,以所述第一训练样本的插补预测标签作为所述推荐模型的目标输出值进行训练得到的,包括
    所述推荐模型的模型参数是以所述第一用户的属性信息和所述第一推荐对象的信息以及第三训练样本的第三用户的属性信息和第三推荐对象的信息作为推荐模型的输入,以所述第一训练样本的插补预测标签和所述第三训练样本的样本标签作为所述推荐模型的目标输出值基于目标训练模型得到的,其中,所述第三训练样本的样本标签用于表示所述第三用户是否对所述第三推荐对象有操作动作。
  8. 如权利要求7所述的方法,其特征在于,所述目标训练模型包括第一损失函数和第二损失函数,所述第一损失函数用于指示所述第一训练样本的插补预测标签与所述第一训练样本的预测标签之间的差异,所述第二损失函数用于指示所述第三训练样本的样本标签与所述第三训练样本的预测标签之间的差异。
  9. The method according to claim 8, wherein the target training model is:

    $$\min_{W}\ \frac{1}{|\mathcal{D}|}\left[\sum_{l=1}^{L}\delta\left(y_{l},\hat{y}_{l}\right)+\omega\sum_{l=L+1}^{|\mathcal{D}|}\delta\left(\sigma_{l},\hat{y}_{l}\right)\right]+\lambda R(W)$$

    wherein $W$ denotes the parameters of the recommendation model; $R(W)$ denotes a regularization term; $\lambda$ denotes a hyperparameter that determines the weight of the regularization term; in the training sample set $\mathcal{D}=\{x_{1},\ldots,x_{L},x_{L+1},\ldots,x_{|\mathcal{D}|}\}$, training samples $x_{1}$ to $x_{L}$ are the third training samples and training samples $x_{L+1}$ to $x_{|\mathcal{D}|}$ are the first training samples; $|\mathcal{D}|$ denotes the quantity of training samples in the training sample set; $L$ denotes the quantity of the third training samples in the training sample set; $\sigma_{l}$ denotes the imputed prediction label $\sigma(x_{l})$ of a training sample $x_{l}$; $y_{l}$ denotes the sample label of the training sample $x_{l}$; $\hat{y}_{l}$ denotes the prediction label of the training sample $x_{l}$; $\delta(y_{l},\hat{y}_{l})$ denotes the second loss function; $\delta(\sigma_{l},\hat{y}_{l})$ denotes the first loss function; and $\omega$ is a hyperparameter used to adjust the relative weights of the first loss function and the second loss function.
  10. The method according to any one of claims 6 to 9, wherein the imputation model is selected based on a quantity of the second training samples.
  11. A training apparatus for a recommendation model, comprising:
    an obtaining unit, configured to obtain at least one first training sample, wherein the first training sample comprises attribute information of a first user and information about a first recommended object; and
    a processing unit, configured to process the attribute information of the first user and the information about the first recommended object by using an imputation model, to obtain an imputed prediction label of the first training sample, wherein the imputed prediction label is used to represent a prediction of whether the first user performs an operation action on the first recommended object when the first recommended object is recommended to the first user;
    wherein model parameters of the imputation model are obtained through training based on at least one second training sample, the second training sample comprises attribute information of a second user, information about a second recommended object, and a sample label of the second training sample, the sample label of the second training sample is used to represent whether the second user performs an operation action on the second recommended object, and the second training sample is obtained when the second recommended object is randomly displayed to the second user; and
    the processing unit is further configured to perform training by using the attribute information of the first user and the information about the first recommended object in the first training sample as an input of a recommendation model and using the imputed prediction label of the first training sample as a target output value of the recommendation model, to obtain a trained recommendation model.
  12. The training apparatus according to claim 11, wherein the obtaining unit is further configured to:
    obtain at least one third training sample, wherein the third training sample comprises attribute information of a third user, information about a third recommended object, and a sample label of the third training sample, and the sample label of the third training sample is used to represent whether the third user performs an operation action on the third recommended object; and the processing unit is configured to:
    perform training based on a target training model by using the attribute information of the first user and the information about the first recommended object together with the attribute information of the third user and the information about the third recommended object as inputs of the recommendation model, and using the imputed prediction label of the first training sample and the sample label of the third training sample as target output values of the recommendation model, to obtain the trained recommendation model.
  13. The training apparatus according to claim 12, wherein the target training model comprises a first loss function and a second loss function, the first loss function is used to indicate a difference between the imputed prediction label of the first training sample and a prediction label of the first training sample, and the second loss function is used to indicate a difference between the sample label of the third training sample and a prediction label of the third training sample.
  14. The training apparatus according to claim 13, wherein the target training model is:

    $$\min_{W}\ \frac{1}{|\mathcal{D}|}\left[\sum_{l=1}^{L}\delta\left(y_{l},\hat{y}_{l}\right)+\omega\sum_{l=L+1}^{|\mathcal{D}|}\delta\left(\sigma_{l},\hat{y}_{l}\right)\right]+\lambda R(W)$$

    wherein $W$ denotes the parameters of the recommendation model; $R(W)$ denotes a regularization term; $\lambda$ denotes a hyperparameter that determines the weight of the regularization term; in the training sample set $\mathcal{D}=\{x_{1},\ldots,x_{L},x_{L+1},\ldots,x_{|\mathcal{D}|}\}$, training samples $x_{1}$ to $x_{L}$ are the third training samples and training samples $x_{L+1}$ to $x_{|\mathcal{D}|}$ are the first training samples; $|\mathcal{D}|$ denotes the quantity of training samples in the training sample set; $L$ denotes the quantity of the third training samples in the training sample set; $\sigma_{l}$ denotes the imputed prediction label $\sigma(x_{l})$ of a training sample $x_{l}$; $y_{l}$ denotes the sample label of the training sample $x_{l}$; $\hat{y}_{l}$ denotes the prediction label of the training sample $x_{l}$; $\delta(y_{l},\hat{y}_{l})$ denotes the second loss function; $\delta(\sigma_{l},\hat{y}_{l})$ denotes the first loss function; and $\omega$ is a hyperparameter used to adjust the relative weights of the first loss function and the second loss function.
  15. The training apparatus according to any one of claims 11 to 14, wherein the imputation model is selected based on a quantity of the second training samples.
  16. A recommendation apparatus, comprising:
    an obtaining unit, configured to obtain attribute information of a target recommendation user and information about a candidate recommended object; and
    a processing unit, configured to input the attribute information of the target recommendation user and the information about the candidate recommended object into a recommendation model, to predict a probability that the target recommendation user performs an operation action on the candidate recommended object;
    wherein model parameters of the recommendation model are obtained through training by using attribute information of a first user and information about a first recommended object in a first training sample as an input of the recommendation model and using an imputed prediction label of the first training sample as a target output value of the recommendation model; the imputed prediction label of the first training sample is obtained by processing the attribute information of the first user and the information about the first recommended object by using an imputation model; the imputed prediction label is used to represent a prediction of whether the first user performs an operation action on the first recommended object when the first recommended object is recommended to the first user; model parameters of the imputation model are obtained through training based on a second training sample; the second training sample comprises attribute information of a second user, information about a second recommended object, and a sample label of the second training sample; the sample label of the second training sample is used to represent whether the second user performs an operation action on the second recommended object; and the second training sample is obtained when the second recommended object is randomly displayed to the second user.
  17. The recommendation apparatus according to claim 16, wherein that the model parameters of the recommendation model are obtained through training by using the attribute information of the first user and the information about the first recommended object in the first training sample as an input of the recommendation model and using the imputed prediction label of the first training sample as a target output value of the recommendation model comprises:
    the model parameters of the recommendation model are obtained based on a target training model by using the attribute information of the first user and the information about the first recommended object together with attribute information of a third user and information about a third recommended object in a third training sample as inputs of the recommendation model, and using the imputed prediction label of the first training sample and a sample label of the third training sample as target output values of the recommendation model, wherein the sample label of the third training sample is used to represent whether the third user performs an operation action on the third recommended object.
  18. The recommendation apparatus according to claim 17, wherein the target training model comprises a first loss function and a second loss function, the first loss function is used to indicate a difference between the imputed prediction label of the first training sample and a prediction label of the first training sample, and the second loss function is used to indicate a difference between the sample label of the third training sample and a prediction label of the third training sample.
  19. The recommendation apparatus according to claim 18, wherein the target training model is:

    $$\min_{W}\ \frac{1}{|\mathcal{D}|}\left[\sum_{l=1}^{L}\delta\left(y_{l},\hat{y}_{l}\right)+\omega\sum_{l=L+1}^{|\mathcal{D}|}\delta\left(\sigma_{l},\hat{y}_{l}\right)\right]+\lambda R(W)$$

    wherein $W$ denotes the parameters of the recommendation model; $R(W)$ denotes a regularization term; $\lambda$ denotes a hyperparameter that determines the weight of the regularization term; in the training sample set $\mathcal{D}=\{x_{1},\ldots,x_{L},x_{L+1},\ldots,x_{|\mathcal{D}|}\}$, training samples $x_{1}$ to $x_{L}$ are the third training samples and training samples $x_{L+1}$ to $x_{|\mathcal{D}|}$ are the first training samples; $|\mathcal{D}|$ denotes the quantity of training samples in the training sample set; $L$ denotes the quantity of the third training samples in the training sample set; $\sigma_{l}$ denotes the imputed prediction label $\sigma(x_{l})$ of a training sample $x_{l}$; $y_{l}$ denotes the sample label of the training sample $x_{l}$; $\hat{y}_{l}$ denotes the prediction label of the training sample $x_{l}$; $\delta(y_{l},\hat{y}_{l})$ denotes the second loss function; $\delta(\sigma_{l},\hat{y}_{l})$ denotes the first loss function; and $\omega$ is a hyperparameter used to adjust the relative weights of the first loss function and the second loss function.
  20. The recommendation apparatus according to any one of claims 16 to 19, wherein the imputation model is selected based on a quantity of the second training samples.
  21. A training apparatus for a recommendation model, comprising at least one processor and a memory, wherein the at least one processor is coupled to the memory and is configured to read and execute instructions in the memory, to perform the training method according to any one of claims 1 to 5.
  22. A recommendation apparatus, comprising at least one processor and a memory, wherein the at least one processor is coupled to the memory and is configured to read and execute instructions in the memory, to perform the recommendation method according to any one of claims 6 to 10.
  23. A computer-readable medium, wherein the computer-readable medium stores program code, and when the computer program code is run on a computer, the computer is enabled to perform the training method according to any one of claims 1 to 5.
  24. A computer-readable medium, wherein the computer-readable medium stores program code, and when the computer program code is run on a computer, the computer is enabled to perform the recommendation method according to any one of claims 6 to 10.
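For orientation, the method recited in claims 1 to 10 can be read as a two-stage procedure: an imputation model is first fitted on samples collected under random exposure, its predictions then serve as imputed labels for candidate samples that were never displayed, and the recommendation model is finally trained on observed and imputed labels under a weighted joint objective. The sketch below is only an illustration under stated assumptions: the synthetic data, the feature dimension `d`, the logistic-regression form, and the names `fit_weighted_logreg`, `w_imputation`, `w_rec`, and `omega` are hypothetical choices, not prescribed by the claims, which leave the model family, loss function, and data layout open.

```python
# Minimal, illustrative sketch of the claimed two-stage training flow.
# All data and model choices here are assumptions for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # assumed dimensionality of the joined user-attribute / item-info features


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_weighted_logreg(X, targets, weights, lam=1e-3, lr=0.1, epochs=500):
    """Logistic regression fitted by gradient descent, minimising
    (1/n) * sum_i weights[i] * cross_entropy(targets[i], sigmoid(x_i . w)) + lam * ||w||^2."""
    n, dim = X.shape
    w = np.zeros(dim)
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (weights * (p - targets)) / n + 2.0 * lam * w
        w -= lr * grad
    return w


# Step 1: the "second" training samples -- objects displayed to users at random,
# with observed labels -- train the imputation model.
X2 = rng.normal(size=(300, d))
y2 = (rng.random(300) < sigmoid(X2 @ rng.normal(size=d))).astype(float)
w_imputation = fit_weighted_logreg(X2, y2, np.ones(len(y2)))

# Step 2: the "first" training samples -- candidate (user, object) pairs that were
# never displayed and therefore carry no label -- receive imputed prediction labels.
X1 = rng.normal(size=(500, d))
sigma1 = sigmoid(X1 @ w_imputation)  # imputed prediction labels sigma(x_l)

# Step 3: the "third" training samples -- ordinary exposure logs with observed labels.
X3 = rng.normal(size=(400, d))
y3 = (rng.random(400) < sigmoid(X3 @ rng.normal(size=d))).astype(float)

# Step 4: train the recommendation model on the union; omega re-weights the
# imputed-label loss term against the observed-label loss term.
omega = 0.3
X_all = np.vstack([X3, X1])
t_all = np.concatenate([y3, sigma1])
wts = np.concatenate([np.ones(len(y3)), omega * np.ones(len(sigma1))])
w_rec = fit_weighted_logreg(X_all, t_all, wts)

# Inference as in the recommendation-method claims: score one candidate object
# for a target user and output the predicted probability of an operation action.
x_candidate = rng.normal(size=d)
print("predicted selection probability:", float(sigmoid(x_candidate @ w_rec)))
```

In this sketch, `omega` plays the role of the hyperparameter ω in claims 4, 9, 14, and 19, down-weighting the loss on imputed labels relative to the loss on observed labels, and `lam` corresponds to the regularization weight λ applied to R(W).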
PCT/CN2019/114897 2019-10-31 2019-10-31 Recommendation model training method, recommendation method, apparatus, and computer-readable medium WO2021081962A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2019/114897 WO2021081962A1 (zh) 2019-10-31 2019-10-31 Recommendation model training method, recommendation method, apparatus, and computer-readable medium
CN201980093319.4A CN113508378A (zh) 2019-10-31 2019-10-31 Recommendation model training method, recommendation method, apparatus, and computer-readable medium
EP19949553.2A EP3862893A4 (en) 2019-10-31 2019-10-31 RECOMMENDATION MODEL LEARNING PROCESS, RECOMMENDATION PROCESS, DEVICE, AND COMPUTER READABLE MEDIA
US17/242,588 US20210248651A1 (en) 2019-10-31 2021-04-28 Recommendation model training method, recommendation method, apparatus, and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114897 WO2021081962A1 (zh) 2019-10-31 2019-10-31 Recommendation model training method, recommendation method, apparatus, and computer-readable medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/242,588 Continuation US20210248651A1 (en) 2019-10-31 2021-04-28 Recommendation model training method, recommendation method, apparatus, and computer-readable medium

Publications (1)

Publication Number Publication Date
WO2021081962A1 (zh)

Family

ID=75714441

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114897 WO2021081962A1 (zh) 2019-10-31 2019-10-31 Recommendation model training method, recommendation method, apparatus, and computer-readable medium

Country Status (4)

Country Link
US (1) US20210248651A1 (zh)
EP (1) EP3862893A4 (zh)
CN (1) CN113508378A (zh)
WO (1) WO2021081962A1 (zh)


Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291266B (zh) * 2020-02-13 2023-03-21 深圳市雅阅科技有限公司 Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
CN113591986B (zh) * 2021-07-30 2024-06-04 阿里巴巴创新公司 Method for generating object weights for a recommendation model and personalized recommendation method
CN113516522B (zh) * 2021-09-14 2022-03-04 腾讯科技(深圳)有限公司 Media resource recommendation method, and multi-objective fusion model training method and apparatus
CN114283350B (zh) * 2021-09-17 2024-06-07 腾讯科技(深圳)有限公司 Visual model training and video processing method, apparatus, device, and storage medium
CN113988291B (zh) * 2021-10-26 2024-06-04 支付宝(杭州)信息技术有限公司 Training method and apparatus for a user representation network
CN116055074A (zh) * 2021-10-27 2023-05-02 北京字节跳动网络技术有限公司 Method and apparatus for managing recommendation policies
CN113742600B (zh) * 2021-11-05 2022-03-25 北京达佳互联信息技术有限公司 Resource recommendation method and apparatus, computer device, and medium
CN114036389B (zh) * 2021-11-17 2023-05-05 北京百度网讯科技有限公司 Object recommendation method, and recommendation model training method and apparatus
CN114491283B (zh) * 2022-04-02 2022-07-22 浙江口碑网络技术有限公司 Object recommendation method and apparatus, and electronic device
CN114707041B (zh) * 2022-04-11 2023-12-01 中国电信股份有限公司 Message recommendation method and apparatus, computer-readable medium, and electronic device
CN117251487A (zh) * 2022-06-08 2023-12-19 华为技术有限公司 Item recommendation method and related device
CN117436540A (zh) * 2022-07-15 2024-01-23 华为技术有限公司 Model training method and related apparatus
KR102545575B1 (ko) * 2022-07-21 2023-06-21 (주)시큐레이어 AI model auto-recommendation subscription service method and server via a platform applying dual service flows according to customer-group characteristics
CN115687431A (zh) * 2022-09-02 2023-02-03 国家食品安全风险评估中心 Meta-path-based food safety policy recommendation method, apparatus, and device


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10825554B2 (en) * 2016-05-23 2020-11-03 Baidu Usa Llc Methods of feature extraction and modeling for categorizing healthcare behavior based on mobile search logs
CN108108821B (zh) * 2017-12-29 2022-04-22 Oppo广东移动通信有限公司 Model training method and apparatus
CN108062573A (zh) * 2017-12-29 2018-05-22 广东欧珀移动通信有限公司 Model training method and apparatus
US11604844B2 (en) * 2018-11-05 2023-03-14 Samsung Electronics Co., Ltd. System and method for cross-domain recommendations
CN109902708B (zh) * 2018-12-29 2022-05-10 华为技术有限公司 Recommendation model training method and related apparatus
CN110046952B (zh) * 2019-01-30 2021-12-10 创新先进技术有限公司 Recommendation model training method and apparatus, and recommendation method and apparatus
CN110363346A (zh) * 2019-07-12 2019-10-22 腾讯科技(北京)有限公司 Click-through rate prediction method, prediction model training method, apparatus, and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653683A (zh) * 2015-12-30 2016-06-08 东软集团股份有限公司 Personalized recommendation method and apparatus
CN110097412A (zh) * 2018-01-31 2019-08-06 阿里巴巴集团控股有限公司 Item recommendation method, apparatus, device, and storage medium
CN108510326A (zh) * 2018-03-29 2018-09-07 北京小米移动软件有限公司 Initial value determining method and apparatus
US20190311114A1 (en) * 2018-04-09 2019-10-10 Zhongan Information Technology Service Co., Ltd. Man-machine identification method and device for captcha
CN109345302A (zh) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method and apparatus, storage medium, and computer device
CN109460513A (zh) * 2018-10-31 2019-03-12 北京字节跳动网络技术有限公司 Method and apparatus for generating a click-through rate prediction model
CN110335064A (zh) * 2019-06-05 2019-10-15 平安科技(深圳)有限公司 Product pushing method and apparatus, computer device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3862893A4 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112311A (zh) * 2021-05-12 2021-07-13 北京百度网讯科技有限公司 Method for training a causal inference model, information prompting method, and apparatus
CN113112311B (zh) * 2021-05-12 2023-07-25 北京百度网讯科技有限公司 Method for training a causal inference model, information prompting method, and apparatus
CN113411644A (zh) * 2021-05-28 2021-09-17 北京达佳互联信息技术有限公司 Sample data processing method and apparatus, server, and storage medium
CN113411644B (zh) * 2021-05-28 2022-10-04 北京达佳互联信息技术有限公司 Sample data processing method and apparatus, server, and storage medium
CN113241193A (zh) * 2021-06-01 2021-08-10 平安科技(深圳)有限公司 Drug recommendation model training method, recommendation method, apparatus, device, and medium
CN113592593A (zh) * 2021-07-29 2021-11-02 平安科技(深圳)有限公司 Training and application method for a sequential recommendation model, apparatus, device, and storage medium
CN113592593B (zh) * 2021-07-29 2023-05-30 平安科技(深圳)有限公司 Training and application method for a sequential recommendation model, apparatus, device, and storage medium
WO2023011382A1 (zh) * 2021-07-31 2023-02-09 华为技术有限公司 Recommendation method, recommendation model training method, and related product
CN113505230A (zh) * 2021-09-10 2021-10-15 明品云(北京)数据科技有限公司 Contracted service recommendation method and system
CN114792173A (zh) * 2022-06-20 2022-07-26 支付宝(杭州)信息技术有限公司 Prediction model training method and apparatus

Also Published As

Publication number Publication date
EP3862893A4 (en) 2021-12-01
EP3862893A1 (en) 2021-08-11
US20210248651A1 (en) 2021-08-12
CN113508378A (zh) 2021-10-15

Similar Documents

Publication Publication Date Title
WO2021081962A1 (zh) Recommendation model training method, recommendation method, apparatus, and computer-readable medium
US20220198289A1 (en) Recommendation model training method, selection probability prediction method, and apparatus
US20230088171A1 (en) Method and apparatus for training search recommendation model, and method and apparatus for sorting search results
US11995112B2 (en) System and method for information recommendation
WO2020135535A1 (zh) Recommendation model training method and related apparatus
EP4181026A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
US8392343B2 (en) Estimating probabilities of events in sponsored search using adaptive models
CN106251174A (zh) Information recommendation method and apparatus
EP4322031A1 (en) Recommendation method, recommendation model training method, and related product
WO2014193700A1 (en) Social media pricing engine
CN111178949A (zh) Service resource matching reference data determining method, apparatus, device, and storage medium
CN111695024A (zh) Object evaluation value prediction method and system, and recommendation method and system
WO2024041483A1 (zh) Recommendation method and related apparatus
WO2023185925A1 (zh) Data processing method and related apparatus
WO2023050143A1 (zh) Recommendation model training method and apparatus
CN116910357A (zh) Data processing method and related apparatus
CN113393303A (zh) Item recommendation method, apparatus, device, and storage medium
WO2024131762A1 (zh) Recommendation method and related device
Fu et al. Customer churn prediction for a webcast platform via a voting-based ensemble learning model with Nelder-Mead optimizer
US11989243B2 (en) Ranking similar users based on values and personal journeys
US20240070743A1 (en) Systems and methods for recommended sorting of search results for online searching
JP7044821B2 (ja) Information processing system and information processing method
Hu et al. A fast linear computational framework for user action prediction in tencent MyApp
CN116957102A (zh) Method and computing device for model training
CN117808528A (zh) Data processing method and related apparatus

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 19949553.2

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19949553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE