CN115687794A - Student model training method, device, equipment and medium for recommending articles - Google Patents

Student model training method, device, equipment and medium for recommending articles

Info

Publication number: CN115687794A
Application number: CN202211703705.0A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 何向南, 陈钢, 陈佳伟, 冯福利
Assignee: University of Science and Technology of China USTC (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Prior art keywords: recommended, value, sample, student model, items
Application filed by University of Science and Technology of China USTC

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a student model training method, a device, equipment and a medium for recommending articles, which can be applied to the technical field of knowledge distillation. The method comprises the following steps: grouping the recommended articles according to the popularity of the recommended articles to obtain a plurality of groups of recommended article sets, wherein the difference between the popularity of the recommended articles in each group of recommended article sets is less than or equal to a first preset value; outputting an interest value of each recommended item through a teacher model, wherein the interest value represents the probability of the recommended item being operated; sampling recommended articles in each recommended article set to generate at least one sample pair; determining a positive-negative relationship of the sample items in the sample pair based on the interest value; inputting the positive and negative relations into a distillation loss function of the student model, and outputting a distillation loss value; the student model is trained based on distillation loss values.

Description

Student model training method, device, equipment and medium for recommending articles
Technical Field
The application relates to the technical field of knowledge distillation, in particular to a student model training method, device, equipment and medium for recommending articles.
Background
With the continuous development of internet technology, ever more data is generated on the internet, and providing accurate, personalized recommendation services to users has become increasingly important, giving rise to recommendation systems. In one implementation, the recommendation system first trains a teacher model on the training set and then, while learning user preferences, learns a student model under the supervision of the teacher model, so that the student model understands user preferences.
In the course of implementing the concept of the present application, the inventors found at least the following problem in the related art: the student model develops a learning bias during training, so that its recommendation accuracy when recommending items is low.
Disclosure of Invention
In view of this, the present application provides a student model training method, apparatus, device and medium for recommending items.
One aspect of an embodiment of the present application provides a student model training method for recommending an item, including: grouping a plurality of recommended articles according to the popularity of the recommended articles to obtain a plurality of groups of recommended article sets, wherein the difference between the popularity of the recommended articles in each group of the recommended article sets is less than or equal to a first preset value; outputting an interest value of each recommended item through a teacher model, wherein the interest value represents a probability that the recommended item is operated; sampling the recommended articles in each recommended article set to generate at least one sample pair; determining the positive-negative relation of the sample articles in the sample pair based on the interest value; inputting the positive and negative relation into a distillation loss function of the student model, and outputting a distillation loss value; training the student model based on the distillation loss value.
According to an embodiment of the present application, the grouping the recommended items according to their popularity to obtain a plurality of recommended item sets includes: sorting the popularity of the recommended articles according to a first preset sorting rule to generate a first sequence; grouping the recommended articles based on a preset group number and the first sequence to obtain a plurality of groups of recommended article sets, wherein a difference value between the popularity sums of the recommended articles in each group of recommended article sets is smaller than or equal to a second preset value.
According to an embodiment of the present application, the sampling the recommended items in each recommended item set to generate at least one sample pair includes: sorting the interest values of the recommended articles in the recommended article set according to a second preset sorting rule to generate a second sequence; inputting the ranking position of the recommended article in the second sequence into a ranking perception probability distribution function, and outputting the sampling probability of the recommended article; determining a first sample and a second sample in the recommended set of items based on the sampling probability, and generating a pair of samples.
According to an embodiment of the present application, the rank-aware probability distribution function is as follows:
p(i) = exp( -rank(i) / λ ) / Σ_{i'} exp( -rank(i') / λ )

wherein i denotes one of the recommended items; p(i) denotes the sampling probability of the recommended item i; rank(i) denotes the rank position of the recommended item i in the second sequence; λ denotes a hyper-parameter; and the sum over i' runs over the recommended items in the same recommended item set.
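As an illustrative sketch of rank-aware pair sampling, the following Python assumes the sampling probability decays exponentially with rank position and normalizes over the group; the helper names and the resampling loop are assumptions, not the patent's exact procedure:

```python
import math
import random

def rank_aware_probs(num_items, lam):
    """Sampling probability for rank positions 1..num_items, proportional to
    exp(-rank / lam); lam is the temperature hyper-parameter (assumed form)."""
    weights = [math.exp(-(r + 1) / lam) for r in range(num_items)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_pair(ranked_items, lam, rng=random):
    """Draw two distinct items from one popularity group, biased toward
    items the teacher ranked highly (index 0 = highest interest)."""
    probs = rank_aware_probs(len(ranked_items), lam)
    first, second = rng.choices(range(len(ranked_items)), weights=probs, k=2)
    while second == first:  # resample until the pair contains two distinct items
        second = rng.choices(range(len(ranked_items)), weights=probs, k=1)[0]
    return ranked_items[first], ranked_items[second]
```

With a small lam, sampling concentrates on the top of the teacher's ranking; a larger lam approaches uniform sampling within the group.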
According to an embodiment of the present application, the determining a positive-negative relationship of the sample item in the sample pair based on the interest value includes: and comparing the interest value of the first sample with the interest value of the second sample according to a preset comparison rule to determine the positive-negative relation of the sample article in the sample pair.
According to an embodiment of the present application, the positive-negative relationship is a relationship between a positive sample and a negative sample determined based on the interest value for the first sample and the second sample, and the distillation loss function is as follows:
L_KD = -(1/|U|) Σ_{u∈U} Σ_{g∈G} Σ_{(i+, i-)∈S_u^g} log σ( e_u^T e_{i+} - e_u^T e_{i-} )

wherein L_KD denotes the distillation loss value of a recommended item set; U denotes the user set corresponding to the recommended item set; |U| denotes the size of the set, i.e., the number of users; u denotes one user in the set; G denotes the set of groups, and g denotes one of the groups; S_u^g denotes the set of all sample pairs of the user u in the group g; i+ and i- denote the positive and negative samples assigned in a sample pair, respectively; log denotes the logarithmic function; σ denotes the activation function; e_u and e_i denote the representation vectors of the user u and the recommended item i, respectively; and e^T denotes the transpose of e.
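As an illustrative sketch of a pairwise (BPR-style) distillation loss over teacher-labeled sample pairs, the following Python assumes dot-product scores between student embeddings and a sigmoid activation; all function and variable names are illustrative, not from the patent:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distill_loss(user_vecs, item_vecs, pairs_per_user):
    """BPR-style distillation loss: for every teacher-labeled pair (pos, neg),
    push the student to score pos above neg; averaged over users.

    user_vecs:      {user: embedding list} student user embeddings
    item_vecs:      {item: embedding list} student item embeddings
    pairs_per_user: {user: [(pos_item, neg_item), ...]} pairs whose roles
                    were decided by the teacher's interest values
    """
    total = 0.0
    for user, pairs in pairs_per_user.items():
        for pos, neg in pairs:
            margin = dot(user_vecs[user], item_vecs[pos]) - dot(user_vecs[user], item_vecs[neg])
            total -= math.log(sigmoid(margin))  # -log sigma(s_pos - s_neg)
    return total / max(len(pairs_per_user), 1)
```

The loss shrinks as the student scores the teacher's positive item above the negative one by a larger margin, which is how the group-wise ranking knowledge is transferred.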
Another aspect of the embodiments of the present application provides an item recommendation method, including: acquiring a data set, wherein the data set comprises a plurality of user data and article data corresponding to each user; inputting the data set into a student model obtained by training through the student model training method, and outputting a recommended value of each article; and recommending the target item to the user based on the recommendation value.
Another aspect of an embodiment of the present application provides a student model training apparatus for recommending an item, including: the article grouping module is used for grouping a plurality of recommended articles according to the popularity of the recommended articles to obtain a plurality of groups of recommended article sets, wherein the difference between the popularity of the recommended articles in each group of recommended article sets is smaller than or equal to a first preset value; an interest output module, configured to output an interest value of each recommended item through a teacher model, where the interest value represents a probability that the recommended item is operated; the sampling module is used for sampling the recommended articles in each recommended article set to generate at least one sample pair; a relationship determination module for determining a positive-negative relationship of the sample items in the sample pair based on the interest value; the loss output module is used for inputting the positive-negative relation into a distillation loss function of the student model and outputting a distillation loss value; and the model training module is used for training the student model based on the distillation loss value.
Another aspect of an embodiment of the present application provides an item recommendation apparatus, including: the system comprises an acquisition module, a storage module and a display module, wherein the acquisition module is used for acquiring a data set, and the data set comprises a plurality of user data and article data corresponding to each user; a recommendation output module for inputting the data set into a student model obtained by training with the student model training method and outputting a recommendation value of each item; and the recommending module is used for recommending the target object to the user based on the recommending value.
Another aspect of an embodiment of the present application provides an electronic device, including: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of the embodiments of the present application provides a computer-readable storage medium storing computer-executable instructions, which when executed, implement the method as described above.
Another aspect of embodiments of the present application provides a computer program product comprising computer executable instructions for implementing a method as described above when executed.
According to the embodiment of the application, the recommended items are grouped by popularity to obtain a plurality of groups of recommended item sets, and the difference between the popularities of the recommended items in each group is less than or equal to a first preset value, so that the recommended items in each group have similar popularity. Dividing recommended items of similar popularity into the same group makes sampling more precise when the recommended items are sampled and learned during student model training, effectively reducing the interference of popularity with model learning. The interest value of each recommended item is output by the teacher model, and the positive-negative relationship of the sample items in each sample pair is then determined; the positive-negative relationship shows the user's preference more intuitively, so the student model can learn the user's preference more accurately and fairly under the supervision of the teacher model, effectively improving the recommendation accuracy of the student model when it is used for item recommendation.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary system architecture of a student model training method for recommending items according to an embodiment of the application;
FIG. 2 shows a flow chart of a student model training method for recommending items according to an embodiment of the present application;
FIG. 3 illustrates a training block diagram of a student model training method for recommending items according to an embodiment of the present application;
FIG. 4 shows a flow chart of an item recommendation method according to an embodiment of the application;
FIG. 5 shows a block diagram of a student model training apparatus for recommending items according to an embodiment of the present application;
FIG. 6 shows a block diagram of an item recommendation device according to an embodiment of the present application;
FIG. 7 shows a block diagram of an electronic device suitable for a student model training method for recommending items according to an embodiment of the application.
Detailed Description
Hereinafter, embodiments of the present application will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present application. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the application. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In those instances where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
With the spread of online personalized services, recommendation systems are becoming increasingly important. In a real-time recommendation system, a large model with many parameters has very high capacity and has therefore been shown to achieve better accuracy. However, this success comes at significant computation and memory cost, which can cause unacceptable latency in the inference phase. A small model has a simple structure and fewer parameters, so it requires less computation and memory and is easier to deploy in scenarios with limited computing resources (such as mobile terminals). For the same reason, however, the learning ability of small models is often poor and can struggle to meet practical requirements. Knowledge distillation enables small models to achieve better accuracy while keeping inference latency low.
Knowledge distillation is applied to recommendation systems to reduce model size while maintaining model performance. Knowledge distillation first trains a large, structurally complex teacher model on the training set, and then learns a small student model under the supervision of the teacher model. Because knowledge distillation encodes the knowledge learned by the teacher model, the student model can benefit more than it would from learning directly from the training data, and achieve better performance. Although existing distillation methods work well, they exhibit a serious bias: the distillation results are heavily skewed toward items with high popularity.
In the process of implementing the present application, it was found that this bias can arise in two stages: one during the training phase of the teacher model, and the other during the distillation phase. A straightforward solution is to intervene in the training process of the teacher model to generate unbiased distillation data, but studies have shown that this does not achieve truly unbiased distillation.
In view of the above, the inventors found that it is possible to reduce the bias during the training of the student model in the distillation stage by first dividing the recommended items into a plurality of groups according to their popularity, wherein the recommended items in the same group have similar popularity. The recommended items in the same group are then ranked using the teacher model, and knowledge in each group is used to supervise learning by the student model.
Specifically, the embodiment of the application provides a student model training method, a device, equipment and a medium for recommending articles. A student model training method for recommending items, comprising: grouping the recommended articles according to the popularity of the recommended articles to obtain a plurality of groups of recommended article sets, wherein the difference between the popularity of the recommended articles in each group of recommended article sets is less than or equal to a first preset value; outputting an interest value of each recommended item through a teacher model, wherein the interest value represents the probability of the recommended item being operated; sampling recommended articles in each recommended article set to generate at least one sample pair; determining a positive-negative relationship of the sample items in the sample pair based on the interest value; inputting the positive and negative relations into a distillation loss function of the student model, and outputting a distillation loss value; the student model is trained based on distillation loss values.
In the technical scheme of the present application, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the relevant data (including but not limited to user information) comply with the requirements of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 shows an exemplary system architecture 100 to which a student model training method for recommending items may be applied according to an embodiment of the present application. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present application may be applied to help those skilled in the art understand the technical content of the present application, and does not mean that the embodiments of the present application may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The users may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or transmit user data and item data corresponding to each user, and the like. The terminal devices 101, 102, 103 may have various messaging client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and/or social platform software, to name a few examples.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The backend management server may analyze and process the received data such as the user request, and feed back a processing result (for example, a web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, a student model training method for recommending items provided in the embodiments of the present application may be generally executed by the server 105. Accordingly, a student model training device for recommending articles provided by the embodiment of the present application can be generally disposed in the server 105. The student model training method for recommending articles provided by the embodiment of the application can also be executed by a server or a server cluster which is different from the server 105 and can be communicated with the terminal devices 101, 102 and 103 and/or the server 105. Correspondingly, the student model training device for recommending articles provided by the embodiment of the application can also be arranged in a server or a server cluster which is different from the server 105 and can be communicated with the terminal devices 101, 102 and 103 and/or the server 105. Alternatively, the student model training method for recommending articles provided by the embodiment of the present application may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Correspondingly, the student model training device for recommending articles provided by the embodiment of the application can also be arranged in the terminal device 101, 102 or 103, or in other terminal devices different from the terminal device 101, 102 or 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be regarded as representing the execution sequence of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
FIG. 2 shows a flowchart of a student model training method for recommending items according to an embodiment of the application.
As shown in FIG. 2, the method includes operations S201-S206.
In operation S201, a plurality of recommended items are grouped according to the popularity of the plurality of recommended items, so as to obtain a plurality of groups of recommended item sets, where a difference between the popularity of the recommended items in each group of recommended item sets is less than or equal to a first preset value.
In operation S202, an interest value of each recommended item is output through the teacher model, where the interest value characterizes a probability that the recommended item is operated.
In operation S203, recommended items in each set of recommended items are sampled, and at least one sample pair is generated.
In operation S204, a positive-negative relationship of the sample item in the sample pair is determined based on the interest value.
In operation S205, the positive-negative relationship is input into the distillation loss function of the student model, and a distillation loss value is output.
In operation S206, the student model is trained based on the distillation loss value.
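Operations S201-S206 can be sketched end to end as follows; `teacher_score`, `student.distill_loss`, and `student.step` are hypothetical interfaces, and the pair selection is deliberately simplified (top vs. bottom of each group) rather than the rank-aware sampling described later:

```python
import math

def train_student(items, popularity, teacher_score, student, num_groups, epochs=1):
    """End-to-end sketch of operations S201-S206 (interfaces are hypothetical)."""
    # S201: sort by popularity and split into contiguous groups, so each
    # group holds items of similar popularity
    ranked = sorted(items, key=lambda i: popularity[i], reverse=True)
    size = math.ceil(len(ranked) / num_groups)
    groups = [ranked[k:k + size] for k in range(0, len(ranked), size)]
    for _ in range(epochs):
        for group in groups:
            if len(group) < 2:
                continue
            # S202/S203: rank the group by teacher interest and form a sample
            # pair (top vs. bottom here; the patent samples pairs rank-aware)
            by_interest = sorted(group, key=teacher_score, reverse=True)
            pos, neg = by_interest[0], by_interest[-1]  # S204: teacher decides roles
            # S205/S206: compute the distillation loss and update the student
            loss = student.distill_loss(pos, neg)
            student.step(loss)
```

The key point the sketch preserves is that pairs are always formed inside one popularity group, so the student never compares items of very different popularity.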
According to an embodiment of the application, a recommended item is an item that has been sent or displayed to the user. For example, recommended items include items that the user has clicked and browsed, as well as items that the user has browsed but not clicked. The popularity of a recommended item may be obtained directly from a database or from a cloud server. Popularity can be determined from the number of user operations: the more times a recommended item is operated on, the higher its popularity. Popularity may also be determined from the number of search results returned when the recommended item is searched: the more search results, the higher the popularity. Popularity may be expressed as a percentage or limited to a fixed range; for example, it may range from 1 to 10.
According to an embodiment of the present application, the number of recommended items in each recommended item set may be the same or different, which is not limited herein. The difference between the popularities of the recommended items in each group of recommended item sets is less than or equal to a first preset value, ensuring that recommended items with similar popularity are assigned to the same group. The first preset value is set based on the training precision; the smaller the first preset value, the higher the training precision. For example, the first preset value may be set to 2. Dividing recommended items of similar popularity into the same group makes sampling finer-grained and effectively reduces the interference of popularity with model learning.
According to the embodiment of the application, each student model is supervised by a teacher model during learning. A large amount of user data, together with the item data corresponding to each user, is input into the teacher model, which outputs an interest value for each item. The user data includes the user's ID or IP address, and the item data may be, for example, movies or music. The interest value may represent the probability that the item is purchased, or the probability that the item is clicked, and may range from 0 to 1. The interest values output by the teacher model are then learned by the student model, so that the student model can learn the user's preferences intuitively.
According to an embodiment of the application, the recommended items in each recommended item set are sampled to generate at least one sample pair. A sample pair may contain two sample items or a plurality of sample items, which is not limited herein. Because sample pairs are drawn within a group, the popularities of the items in a pair are similar, so the student model learns the preference relationship between sample items in a more fine-grained way.
According to the embodiment of the application, the positive-negative relationship of the sample items in a sample pair is determined based on the interest values, so that the student model learns the positive-negative relationship more accurately and fairly. The sample item with the higher interest value in the pair is determined to be the positive sample, and the sample item with the lower interest value is determined to be the negative sample; the positive-negative relationship refers to the relationship between the positive sample and the negative sample. The positive and negative samples may be assigned values: for example, if the positive sample is assigned 1 and the negative sample is assigned -1, the positive-negative relationship is the relationship between 1 and -1. The positive-negative relationship intuitively reflects the user's preference between the sample items in the pair.
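A minimal sketch of this assignment rule, using the +1/-1 coding mentioned above (function and variable names are illustrative):

```python
def label_pair(first, second, interest):
    """Assign +1 to the higher-interest sample and -1 to the other, per the
    teacher model's interest values (the +1/-1 coding follows the example
    in the text; ties go to the first sample by convention here)."""
    if interest[first] >= interest[second]:
        return {first: 1, second: -1}
    return {first: -1, second: 1}
```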
According to the embodiment of the application, the recommended items are grouped according to their popularity to obtain a plurality of groups of recommended item sets, with the difference between the popularities of the recommended items in each group less than or equal to the first preset value, so that the recommended items in each group have similar popularity. Dividing recommended items of similar popularity into the same group makes sampling more precise when the recommended items are sampled and learned during student model training, effectively reducing the interference of popularity with model learning. The interest value of each recommended item is output by the teacher model, and the positive-negative relationship of the sample items in each sample pair is then determined; the positive-negative relationship shows the user's preference more intuitively, so the student model can learn the user's preference more accurately and fairly under the supervision of the teacher model, effectively improving the recommendation accuracy when the student model is used for item recommendation.
According to an embodiment of the application, grouping a plurality of recommended items according to popularity of the plurality of recommended items to obtain a plurality of sets of recommended items may include the following operations.
Sorting the plurality of recommended items by popularity according to a first preset sorting rule to generate a first sequence; and grouping the recommended items based on a preset number of groups and the first sequence to obtain a plurality of sets of recommended items, where the difference between the popularity sums of the recommended items in any two sets is less than or equal to a second preset value.
According to an embodiment of the present application, the first preset sorting rule may be descending order or ascending order, which is not limited herein. The preset number of groups in this embodiment is an important hyper-parameter that balances the trade-off between unbiasedness and informativeness, and can be set according to training requirements. A larger preset number of groups yields a finer-grained partition: the recommended items in each group have more similar popularity, which makes the distillation more likely to remain fair. For example, the preset number of groups may be set to 100. However, a larger number of groups also reduces the number of recommended items in each group, and with it the knowledge about item ranking relationships; a smaller number of groups brings more information but sacrifices fairness. For example, the preset number of groups may instead be set to 50.
According to the embodiment of the application, the recommended items are grouped based on the preset number of groups and the first sequence to obtain a plurality of sets of recommended items, and the difference between the popularity sums of the recommended items in any two sets is less than or equal to a second preset value. The second preset value is set based on the required training precision: the smaller the second preset value, the higher the training precision. For example, the second preset value may be 2. Since the items may not divide evenly, the popularity sums of the groups need not be exactly equal; keeping them approximately equal is sufficient. By sorting the recommended items by popularity in descending order and training the student model with this popularity-stratified ranking knowledge, the interference of the undesirable popularity effect is avoided.
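The grouping step above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the greedy close-the-group-at-the-target-sum rule, the function name, and the example popularity values are all assumptions.

```python
def group_by_popularity(popularity, num_groups):
    """Partition items into contiguous chunks of the popularity-sorted
    sequence so that the popularity sums of the groups stay roughly equal.
    `popularity` maps item id -> popularity count."""
    # First sequence: items sorted by popularity in descending order.
    items = sorted(popularity, key=popularity.get, reverse=True)
    target = sum(popularity.values()) / num_groups  # ideal sum per group
    groups, current, current_sum = [], [], 0.0
    for item in items:
        current.append(item)
        current_sum += popularity[item]
        # Close the group once its popularity sum reaches the target,
        # unless it is the last group (which takes the remainder).
        if current_sum >= target and len(groups) < num_groups - 1:
            groups.append(current)
            current, current_sum = [], 0.0
    if current:
        groups.append(current)
    return groups

pop = {"a": 10, "b": 9, "c": 5, "d": 4, "e": 1, "f": 1}
groups = group_by_popularity(pop, 3)  # e.g. [['a'], ['b', 'c'], ['d', 'e', 'f']]
```

Because the sequence is sorted first, items within one group also have similar individual popularity, matching the first-preset-value condition.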
FIG. 3 shows a training block diagram of a student model training method for recommending items according to an embodiment of the present application.
According to an embodiment of the application, sampling recommended items in each set of recommended items to generate at least one sample pair may include the following operations.
Sorting the interest values of the recommended articles in the recommended article set according to a second preset sorting rule to generate a second sequence; inputting the ranking position of the recommended article in the second sequence into a ranking perception probability distribution function, and outputting the sampling probability of the recommended article; a first sample and a second sample are determined in the recommended set of items based on the sampling probabilities, and a pair of samples is generated.
According to the embodiment of the application, as shown in fig. 3, user data and item data corresponding to each user are input into a teacher model for training, the teacher model outputs an interest value of each item, and then recommended items are ranked based on the interest values. In the ranking process, the interest values of the recommended articles in the recommended article set are ranked according to a second preset ranking rule, and a second sequence is generated. The second preset ordering rule may be descending ordering or ascending ordering, and is not limited herein.
According to the embodiment of the application, the recommended items in each set are sorted and then sampled by group. During sampling, the ranking position of each recommended item in the second sequence is input into the rank-aware probability distribution function, which outputs the item's sampling probability. In this embodiment, the higher the ranking position, the higher the sampling probability. A first sample and a second sample are then drawn from the recommended item set according to these probabilities to generate a sample pair. Because items with a higher sampling probability are more likely to be drawn, sample items with high interest values are extracted as often as possible for learning, which effectively improves the learning efficiency of the student model and lets it preferentially learn the items a user is likely to operate on.
According to an embodiment of the application, the rank-aware probability distribution function is as follows:

$$p(i) \propto \exp\!\left(-\frac{\mathrm{rank}(i)}{\lambda}\right)$$

where $i$ denotes a recommended item, $p(i)$ is the sampling probability of recommended item $i$, $\mathrm{rank}(i)$ denotes the ranking position of recommended item $i$ in the second sequence, and $\lambda$ denotes a hyper-parameter.
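A minimal sketch of the rank-aware sampling step, assuming the reconstructed form $p(i) \propto \exp(-\mathrm{rank}(i)/\lambda)$ with rank 0 as the top position; the function names and the helper for drawing a distinct pair are illustrative, not from the patent.

```python
import math
import random

def rank_aware_probs(num_items, lam):
    """p(i) proportional to exp(-rank(i)/lam): rank 0 (the teacher's
    top-scored item) gets the largest sampling probability; `lam` is
    the hyper-parameter lambda."""
    weights = [math.exp(-rank / lam) for rank in range(num_items)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_pair(ranked_items, probs, rng=random):
    """Draw two distinct items from the ranked list to form one sample pair."""
    first = rng.choices(range(len(ranked_items)), weights=probs, k=1)[0]
    second = first
    while second == first:  # resample until the two items differ
        second = rng.choices(range(len(ranked_items)), weights=probs, k=1)[0]
    return ranked_items[first], ranked_items[second]

probs = rank_aware_probs(5, lam=2.0)  # monotonically decreasing in rank
```

A smaller `lam` concentrates sampling on the very top of the ranking; a larger `lam` flattens the distribution toward uniform sampling.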
According to an embodiment of the application, determining the positive-negative relationship of the sample items in the sample pair based on the interest value may include the following operations.
And comparing the interest value of the first sample with the interest value of the second sample according to a preset comparison rule to determine the positive-negative relation of the sample article in the sample pair.
According to the embodiment of the application, the preset comparison rule may directly compare the two interest values, or may sort the values and compare their ranking positions, which is not limited herein. The interest value of the first sample is compared with that of the second sample: the sample with the higher interest value is determined to be the positive sample and the one with the lower value the negative sample, and the two samples are assigned values so that the student model can learn the positive-negative relationship more directly. For example, the positive sample is assigned a value of 1 and the negative sample a value of -1.
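The direct-comparison rule can be sketched as a few lines; the function name and the tie-breaking choice (ties go to the first sample) are assumptions for illustration.

```python
def label_pair(interest, first, second):
    """Compare the teacher's interest values for the two samples in a pair:
    the higher-interest item becomes the positive sample (assigned +1),
    the other the negative sample (assigned -1). Ties favor `first`."""
    if interest[first] >= interest[second]:
        return (first, +1), (second, -1)
    return (second, +1), (first, -1)

interest = {"x": 0.9, "y": 0.2}
pos, neg = label_pair(interest, "x", "y")  # ("x", 1), ("y", -1)
```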
According to an embodiment of the present application, the positive-negative relationship is the relationship between the positive and negative samples, determined from the interest values of the first and second samples, and the distillation loss function is as follows:

$$\mathcal{L}_{KD} = -\frac{1}{|\mathcal{U}|}\sum_{u\in\mathcal{U}}\sum_{g\in\mathcal{G}}\sum_{(i^{+},\,i^{-})\in\mathcal{D}_{u}^{g}}\log\sigma\!\left(\mathbf{e}_{u}^{\top}\mathbf{e}_{i^{+}} - \mathbf{e}_{u}^{\top}\mathbf{e}_{i^{-}}\right)$$

where $\mathcal{L}_{KD}$ denotes the distillation loss value of the recommended item set; $\mathcal{U}$ denotes the user set corresponding to the recommended item set, $|\mathcal{U}|$ denotes the size of that set, i.e., the number of users, and $u$ denotes one user in the set; $\mathcal{G}$ denotes the set of groups and $g$ denotes one of the groups; $\mathcal{D}$ denotes the set of sample pairs, and $\mathcal{D}_{u}^{g}$ denotes all sample pairs of user $u$ in group $g$; $i^{+}$ and $i^{-}$ respectively denote the assigned positive and negative samples in a sample pair; $\log$ denotes the logarithmic function; $\sigma$ denotes the activation function; $\mathbf{e}$ denotes the vector representation of a user $u$ or a recommended item $i$; and $\mathbf{e}^{\top}$ is the transpose of $\mathbf{e}$.
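The distillation loss can be sketched in plain Python as follows, assuming $\sigma$ is the sigmoid function, scores are inner products $\mathbf{e}_u^{\top}\mathbf{e}_i$, and each user's sample pairs from all popularity groups are flattened into one list; the function names and data layout are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def score(user_vec, item_vec):
    """Inner product e_u^T e_i of the user and item vector representations."""
    return sum(a * b for a, b in zip(user_vec, item_vec))

def distill_loss(user_emb, item_emb, pairs_per_user):
    """-1/|U| * sum over users and their (pos, neg) sample pairs of
    log sigmoid(score(u, pos) - score(u, neg)); `pairs_per_user` maps a
    user id to pairs collected from all of that user's groups."""
    total = 0.0
    for u, pairs in pairs_per_user.items():
        for pos, neg in pairs:
            margin = score(user_emb[u], item_emb[pos]) - score(user_emb[u], item_emb[neg])
            total -= math.log(sigmoid(margin))
    return total / len(pairs_per_user)
```

When the student model already scores the positive sample above the negative one, the margin is positive and the per-pair loss is small; reversed pairs are penalized heavily.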
According to the embodiment of the application, take as an example preference learning for the students of one classroom in a school. The data of each student and the corresponding item data are input into the teacher model to obtain each student's interest value for every item, and the positive-negative relationships are generated from these interest values. The student model then learns the positive-negative relationships, and its deviation is continually reduced during learning by computing the distillation loss value, so that the student model grasps the preferences of the class more accurately. Once the student model has learned the class's preferences, items can be recommended to the students more accurately according to those preferences.
According to the embodiment of the application, in the distillation loss function the score obtained from the negative sample's vector representation is subtracted from the score obtained from the positive sample's vector representation, which effectively amplifies the positive-negative relationship between sample items, so that the student model learns the user's preference more accurately during training.
According to an embodiment of the application, training a student model based on distillation loss values may include the following operations.
Determining a final loss value of the student model according to the distillation loss value and the original supervision loss value of the student model; the student model is trained based on the final loss value.
According to embodiments of the present application, the distillation loss is typically accompanied by an original supervised loss derived from the training data; the distillation loss value and the original supervised loss value are input into an objective function, and a final loss value is output. The objective function is as follows:

$$\mathcal{L} = \mathcal{L}_{S} + \alpha\,\mathcal{L}_{KD}$$

where $\mathcal{L}_{KD}$ denotes the distillation loss value of the recommended item set, $\mathcal{L}_{S}$ denotes the original supervised loss value of the student model, and $\alpha$ denotes a hyper-parameter.
According to the embodiment of the application, the smaller the final loss value, the better the model training effect; ideally the final loss value is close to 0. When the final loss value is not yet small enough, the student model continues to be trained based on the final loss value, and the parameters of the student model are updated during this continued training.
According to the embodiments of the present application, the derivative of the final loss $\mathcal{L}$ with respect to the vector representation $\mathbf{e}$ is computed first, i.e., $\partial\mathcal{L}/\partial\mathbf{e}$. Because $\mathbf{e}$ is produced by the student model, the gradient propagates to the student model's parameters by the chain rule. The obtained gradient $\partial\mathcal{L}/\partial\mathbf{e}$ is used to update $\mathbf{e}$, with the update formula $\mathbf{e} \leftarrow \mathbf{e} - \eta\,\partial\mathcal{L}/\partial\mathbf{e}$, where $\eta$ is a learning rate set in advance. The two steps above are repeated until $\mathcal{L}$ no longer changes, or its change is negligible, at which point the training of the student model is complete.
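The update-until-the-loss-stops-changing loop can be sketched generically; the toy quadratic loss below is an assumed stand-in for the actual final loss, used only to show the stopping criterion.

```python
def train(e, grad_fn, loss_fn, lr=0.1, tol=1e-8, max_iter=10000):
    """Repeat e <- e - lr * dL/de until the loss change falls below tol."""
    prev = loss_fn(e)
    for _ in range(max_iter):
        e = e - lr * grad_fn(e)      # gradient-descent update of e
        cur = loss_fn(e)
        if abs(prev - cur) < tol:    # loss change negligible: stop training
            break
        prev = cur
    return e

# Toy scalar loss L(e) = (e - 3)^2, whose gradient is 2*(e - 3),
# standing in for the final loss of the student model.
e_final = train(0.0, grad_fn=lambda e: 2 * (e - 3), loss_fn=lambda e: (e - 3) ** 2)
```

With the learning rate 0.1 the iterate contracts toward the minimizer geometrically, so the loop terminates well before `max_iter`.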
According to the embodiment of the application, after training of the student model is completed, test data are input into the student model for evaluation. As shown in Table 1, the test data include a plurality of user records, the item data corresponding to each user, the interaction data between users and items, and the sparsity of the test data.
TABLE 1 test data
According to the embodiment of the application, the distillation method of the student model in this embodiment is compared with existing distillation methods to determine whether the recommendation accuracy of the student model is improved, and thereby whether the debiasing effect of the present distillation method is significant. As shown in Table 2, the recall rate over all recommended items is calculated first; the recommended items are then divided into a popular group and an unpopular group according to popularity, and the recall rate of each group is calculated. A preset threshold separates the two groups: items whose popularity exceeds the threshold fall into the popular group, and the rest into the unpopular group. The preset threshold can be set according to the actual popularity distribution of the items; for example, it may be set to a popularity of 5. The recall rate is the ratio of the number of correctly identified positive samples to the number of all positive samples in the test set; the higher the recall rate, the more accurate the student model's recommendations.
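The evaluation metrics described above can be sketched as follows; the function names are illustrative, and the strictly-greater-than-threshold rule for the popular group follows the text.

```python
def recall_at_k(recommended, relevant, k=10):
    """Recall@K: fraction of a user's positive (relevant) items that
    appear in the top-K recommendation list."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

def split_by_popularity(items, popularity, threshold=5):
    """Divide items into popular / unpopular groups by a popularity
    threshold; items strictly above the threshold count as popular."""
    popular = [i for i in items if popularity[i] > threshold]
    unpopular = [i for i in items if popularity[i] <= threshold]
    return popular, unpopular

r = recall_at_k([1, 2, 3], relevant=[2, 4], k=3)  # 1 hit of 2 positives -> 0.5
```

Group-wise recall is then obtained by restricting `relevant` to the items of one popularity group before calling `recall_at_k`.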
The existing distillation methods compared include Ranking Distillation (RD), Collaborative Distillation (CD), Distillation Experts with Relaxed Ranking Distillation (DE-RRD), and Hierarchical Topology Distillation (HTD). The distillation method for the student model in this embodiment is unbiased knowledge distillation (DKD).
According to the embodiment of the application, the test data are input into an MF (Matrix Factorization) model and a LightGCN (Light Graph Convolution Network) model, and the test results are output respectively.
The experimental results show that the overall improvement brought by the existing distillation methods comes mainly from the popular group, while performance on the unpopular group is not significantly improved. As shown in Table 2, the distillation method DKD for the student model in this embodiment significantly improves accuracy both overall and on the unpopular group. These experimental results demonstrate that DKD effectively alleviates the bias problem in knowledge distillation, so that the student model produces more accurate and fairer recommendations.
TABLE 2 recall ratio of various distillation methods across the entire data set and unpopular groups
FIG. 4 shows a flow chart of an item recommendation method according to an embodiment of the application.
As shown in FIG. 4, the method includes operations S401 to S403.
In operation S401, a data set including a plurality of user data and item data corresponding to each user is acquired.
In operation S402, the data set is input into the trained student model, and a recommended value for each item is output.
In operation S403, the target item is recommended to the user based on the recommendation value.
According to the embodiment of the application, the data set may be acquired from a database or from a cloud server, which is not limited herein. The user data include the user's ID or IP address, and the items may be, for example, movies or music. The acquired data set is input into a trained student model, the student model having been trained according to operations S201-S206. The student model scores all items to generate recommendation values and then recommends the 10 items with the highest recommendation values to the user. The recommendation value may range from 0 to 1.
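The top-10 selection step above can be sketched as follows; the scoring lambda is only a stand-in for the trained student model, whose actual scoring function is not specified here.

```python
def recommend_top_k(score_fn, user, items, k=10):
    """Score every item with the trained student model's scoring function
    and return the k items with the highest recommendation values."""
    return sorted(items, key=lambda item: score_fn(user, item), reverse=True)[:k]

# Illustrative stand-in for the student model, producing values in [0, 1].
toy_score = lambda user, item: (item % 100) / 100.0
top10 = recommend_top_k(toy_score, "user-42", list(range(20)))
```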
FIG. 5 shows a block diagram of a student model training apparatus for recommending items according to an embodiment of the application.
As shown in fig. 5, the student model training apparatus 500 for recommending items may include an item grouping module 501, an interest output module 502, a sampling module 503, a relationship determination module 504, a loss output module 505, and a model training module 506.
An article grouping module 501, configured to group a plurality of recommended articles according to popularity of the plurality of recommended articles to obtain a plurality of groups of recommended article sets, where a difference between the popularity of the recommended articles in each group of recommended article sets is less than or equal to a first preset value;
an interest output module 502, configured to output an interest value of each recommended item through a teacher model, where the interest value represents a probability that the recommended item is operated;
a sampling module 503, configured to sample recommended articles in each recommended article set, and generate at least one sample pair;
a relationship determination module 504 for determining a positive-negative relationship of the sample items in the sample pair based on the interest value;
a loss output module 505, configured to input the positive-negative relationship into a distillation loss function of the student model, and output a distillation loss value; and
a model training module 506 for training the student model based on the distillation loss value.
According to the embodiment of the application, the recommended items are grouped according to their popularity to obtain a plurality of sets of recommended items, and the difference between the popularity values of the recommended items within each set is less than or equal to the first preset value, so that the recommended items in each set have similar popularity. Grouping items of similar popularity together makes the sampling performed during student-model training more precise and effectively reduces the interference of popularity on model learning. The teacher model outputs an interest value for each recommended item, from which the positive-negative relationship of the sample items in each sample pair is determined; this relationship expresses the user's preference more directly, allows the student model to learn the preference more accurately and fairly under the teacher model's supervision, and effectively improves the recommendation accuracy of the student model when it is used for item recommendation.
According to an embodiment of the present application, the item grouping module 501 includes a first sequence generating unit and a sequence grouping unit.
And the first sequence generating unit is used for sequencing the popularity of the recommended articles according to a first preset sequencing rule to generate a first sequence.
And the sequence grouping unit is used for grouping the recommended articles based on the preset group number and the first sequence to obtain a plurality of groups of recommended article sets, and the difference value between the popularity sums of the recommended articles in each group of recommended article sets is less than or equal to a second preset value.
According to the embodiment of the present application, the sampling module 503 includes a second sequence generating unit, a probability determining unit, and a sample pair generating unit.
And the second sequence generating unit is used for sequencing the interest values of the recommended articles in the recommended article set according to a second preset sequencing rule to generate a second sequence.
And the probability determining unit is used for inputting the ranking positions of the recommended articles in the second sequence into the ranking perception probability distribution function and outputting the sampling probability of the recommended articles.
A sample pair generation unit for determining a first sample and a second sample in the recommended set of items based on the sampling probability and generating a sample pair.
According to an embodiment of the application, the relationship determination module 504 includes an interest comparison unit.
And the interest comparison unit is used for comparing the interest value of the first sample with the interest value of the second sample according to a preset comparison rule so as to determine the positive-negative relation of the sample article in the sample pair.
According to an embodiment of the application, the positive-negative relationship is a relationship between a positive sample and a negative sample determined by the first sample and the second sample based on the interest value.
According to an embodiment of the application, the model training module 506 comprises a loss determination unit and a loss training unit.
And the loss determining unit is used for determining a final loss value of the student model according to the distillation loss value and the original supervision loss value of the student model.
And the loss training unit is used for training the student model based on the final loss value.
FIG. 6 shows a block diagram of an item recommendation device according to an embodiment of the present application.
As shown in fig. 6, the item recommendation apparatus 600 may include an obtaining module 601, a recommendation output module 602, and a recommendation module 603.
An obtaining module 601, configured to obtain a data set, where the data set includes multiple user data and item data corresponding to each user;
a recommendation output module 602, configured to input the data set into a student model obtained by training using a student model training method, and output a recommendation value of each item;
and a recommending module 603, configured to recommend the target item to the user based on the recommendation value.
Any of the modules and units according to embodiments of the application, or at least part of the functionality of any of them, may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present application may be implemented by being split into a plurality of modules. Any one or more of the modules and units according to the embodiments of the present Application may be implemented at least partially as a hardware Circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a Circuit, or implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, one or more of the modules and units according to embodiments of the application may be at least partly implemented as computer program modules, which, when executed, may perform corresponding functions.
For example, any plurality of the item grouping module 501, the interest output module 502, the sampling module 503, the relationship determination module 504, the loss output module 505, and the model training module 506, the obtaining module 601, the recommendation output module 602, and the recommendation module 603 may be combined and implemented in one module/unit/subunit, or any one of the modules/units/subunits may be split into a plurality of modules/units/subunits. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present application, at least one of the item grouping module 501, the interest output module 502, the sampling module 503, the relationship determination module 504, the loss output module 505, and the model training module 506, the obtaining module 601, the recommendation output module 602, and the recommendation module 603 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or by a suitable combination of any of them. 
Alternatively, at least one of the item grouping module 501, the interest output module 502, the sampling module 503, the relationship determination module 504, the loss output module 505, and the model training module 506, the obtaining module 601, the recommendation output module 602, and the recommendation module 603 may be implemented at least in part as computer program modules that, when executed, perform corresponding functions.
It should be noted that, the portion of the student model training device for recommending articles in the embodiment of the present application corresponds to the portion of the student model training method for recommending articles in the embodiment of the present application, and the description of the portion of the student model training device for recommending articles specifically refers to the portion of the student model training method for recommending articles, and is not repeated here.
Fig. 7 shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present application includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present application.
In the RAM703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present application by executing programs in the ROM 702 and/or the RAM 703. Note that the program may also be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present application by executing programs stored in the one or more memories.
According to an embodiment of the present application, the electronic device 700 may further include an input/output (I/O) interface 705, the input/output (I/O) interface 705 also being connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a Display panel such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
According to an embodiment of the present application, the method flow according to the embodiment of the present application may be implemented as a computer software program. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by the processor 701, performs the above-described functions defined in the system of the embodiment of the present application. According to an embodiment of the present application, the above-described systems, apparatuses, devices, modules, units, etc. may be implemented by computer program modules.
The present application also provides a computer-readable storage medium, which may be contained in the device/apparatus/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the present application.
According to an embodiment of the present application, the computer readable storage medium may be a non-volatile computer readable storage medium. Examples may include, but are not limited to: a portable Computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM) or flash Memory), a portable compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the preceding. In the context of this application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present application, a computer-readable storage medium may include the ROM 702 and/or the RAM703 and/or one or more memories other than the ROM 702 and the RAM703 described above.
Embodiments of the present application also include a computer program product comprising a computer program containing program code for performing a method provided by embodiments of the present application, which, when the computer program product is run on an electronic device, is configured to cause the electronic device to implement a student model training method for recommending items provided by embodiments of the present application.
When the computer program is executed by the processor 701, the above-described functions defined in the system/apparatus of the embodiment of the present application are performed. According to an embodiment of the present application, the above described systems, devices, modules, units, etc. may be implemented by computer program modules.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
According to embodiments of the present application, program code for executing the computer programs provided by the embodiments may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. It will be appreciated by a person skilled in the art that various combinations and/or sub-combinations of the features described in the various embodiments and/or claims of the present application are possible, even if such combinations or sub-combinations are not explicitly described in the present application. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present application may be made without departing from the spirit and teachings of the present application. All such combinations and/or sub-combinations are intended to fall within the scope of the present application.
The embodiments of the present application are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present application. Although the embodiments are described separately above, this does not mean that the measures in the different embodiments cannot be used in advantageous combination. The scope of the application is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present application, and these alternatives and modifications are all intended to fall within the scope of the present application.

Claims (10)

1. A student model training method for recommending items, comprising:
grouping a plurality of recommended items according to the popularity of the recommended items to obtain a plurality of groups of recommended item sets, wherein the difference between the popularities of the recommended items in each group of recommended item sets is less than or equal to a first preset value;
outputting an interest value of each recommended item through a teacher model, wherein the interest value characterizes the probability that the recommended item will be operated on;
sampling the recommended items in each set of recommended items to generate at least one sample pair;
determining a positive-negative relationship of the sample items in the sample pair based on the interest values;
inputting the positive-negative relationship into a distillation loss function of the student model, and outputting a distillation loss value; and
training the student model based on the distillation loss value.
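The six steps of claim 1 can be outlined in a minimal Python sketch. This is an illustration only, not the patent's implementation: the BPR-style log-sigmoid pairwise loss, the contiguous grouping, and the `teacher_score`/`student_score` callables are all assumptions.

```python
import math
import random

def train_step(items, popularity, teacher_score, student_score, num_groups=2):
    """One sketched training step following claim 1's six sub-steps."""
    # Step 1: sort by popularity and cut into contiguous groups, so items
    # within a group have similar popularity.
    order = sorted(items, key=lambda i: popularity[i])
    size = (len(order) + num_groups - 1) // num_groups
    groups = [order[k:k + size] for k in range(0, len(order), size)]
    # Step 2: the teacher outputs an interest value for each recommended item.
    interest = {i: teacher_score(i) for i in items}
    loss = 0.0
    for g in groups:
        if len(g) < 2:
            continue
        # Step 3: sample a pair of items inside the group.
        a, b = random.sample(g, 2)
        # Step 4: the item with the higher teacher interest is the positive sample.
        pos, neg = (a, b) if interest[a] >= interest[b] else (b, a)
        # Steps 5-6: accumulate a pairwise distillation loss on the student's
        # scores (a BPR-style log-sigmoid term, an assumption).
        margin = student_score(pos) - student_score(neg)
        loss += -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss
```

In a full training loop this loss would be combined with the student's supervised loss and minimized by gradient descent over the student's parameters.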
2. The method of claim 1, wherein said grouping a plurality of recommended items according to their popularity to obtain a plurality of groups of recommended item sets comprises:
sorting the recommended items by popularity according to a first preset sorting rule to generate a first sequence;
grouping the recommended items based on a preset number of groups and the first sequence to obtain the plurality of groups of recommended item sets, wherein the difference between the popularity sums of the recommended items in each group of recommended item sets is less than or equal to a second preset value.
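As an illustrative sketch of claim 2 (an assumption about the grouping rule, not the patent's implementation), the popularity-sorted sequence can be cut into a preset number of contiguous groups whose popularity sums are approximately balanced:

```python
def group_items(popularity, num_groups):
    """Split items, sorted by popularity (the 'first sequence'), into
    contiguous groups whose popularity sums are as equal as the ordering
    allows. popularity: {item: popularity_value}."""
    order = sorted(popularity, key=popularity.get, reverse=True)  # first sequence
    target = sum(popularity.values()) / num_groups  # ideal popularity mass per group
    groups, current, acc = [], [], 0.0
    for item in order:
        current.append(item)
        acc += popularity[item]
        # Close the current group once it reaches the target mass,
        # keeping the last group open for the remaining items.
        if acc >= target and len(groups) < num_groups - 1:
            groups.append(current)
            current, acc = [], 0.0
    groups.append(current)
    return groups
```

Because the split is contiguous over the sorted sequence, items inside one group also have similar popularity, which is the condition stated in claim 1.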
3. The method of claim 1, wherein said sampling the recommended items in each set of recommended items to generate at least one sample pair comprises:
sorting the interest values of the recommended items in the recommended item set according to a second preset sorting rule to generate a second sequence;
inputting the ranking position of each recommended item in the second sequence into a rank-aware probability distribution function, and outputting the sampling probability of the recommended item;
determining a first sample and a second sample in the recommended item set based on the sampling probabilities, and generating a sample pair.
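A hedged sketch of claim 3's rank-aware sampling: items are ranked by teacher interest, each rank is mapped to a sampling weight by an exponential decay (the specific decay form and the hyper-parameter `lam` are assumptions), and two distinct items are drawn.

```python
import math
import random

def sample_pair(interest, lam=2.0):
    """Draw two distinct items from one group; higher-ranked (higher-interest)
    items are more likely to be drawn. Assumes at least two items."""
    ranked = sorted(interest, key=interest.get, reverse=True)  # second sequence
    # Exponential decay over 1-based rank positions (an assumed distribution).
    weights = [math.exp(-(rank + 1) / lam) for rank in range(len(ranked))]
    first = random.choices(ranked, weights=weights, k=1)[0]
    second = first
    while second == first:  # redraw until the pair is distinct
        second = random.choices(ranked, weights=weights, k=1)[0]
    return first, second
```

`random.choices` accepts unnormalized weights, so no explicit normalization is needed here.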
4. The method of claim 3, wherein the rank-aware probability distribution function is as follows:

p(i) = exp(−rank(i)/λ) / Σ_j exp(−rank(j)/λ)

wherein i denotes one of the recommended items, p(i) is the sampling probability of the recommended item i, rank(i) denotes the ranking position of the recommended item i in the second sequence, and λ represents a hyper-parameter.
5. The method of claim 3, wherein said determining a positive-negative relationship of the sample items in the sample pair based on the interest value comprises:
comparing the interest value of the first sample with the interest value of the second sample according to a preset comparison rule to determine the positive-negative relationship of the sample items in the sample pair.
6. The method of claim 5, wherein the positive-negative relationship is a relationship between a positive sample and a negative sample determined from the first sample and the second sample based on the interest value, and the distillation loss function is as follows:

L = −(1/|U|) Σ_{u∈U} Σ_{g∈G} Σ_{(i⁺,i⁻)∈D_{u,g}} log σ(e_uᵀ e_{i⁺} − e_uᵀ e_{i⁻})

wherein L represents the distillation loss value of a recommended item set; U represents the user set corresponding to the recommended item set; |U| represents the size of the set, i.e. the number of users; u represents one user in the set; G represents the set of groups and g represents one of the groups; D_{u,g} represents all the sample pairs of the user u in the group g; i⁺ and i⁻ represent the positive and negative samples in a sample pair, respectively; log represents the logarithmic function; σ represents the activation function; e represents the representation vector of a user u or a recommended item i; and eᵀ is the transpose of e.
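Under the assumption that the loss in claim 6 is the BPR-style log-sigmoid of the score margin between positive and negative samples (consistent with the symbol glossary: log, sigmoid activation, user/item vector dot products), it can be sketched as:

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors given as plain lists."""
    return sum(x * y for x, y in zip(a, b))

def distill_loss(user_vecs, item_vecs, pairs_per_user):
    """pairs_per_user: {user: [(pos_item, neg_item), ...]}, with the pairs of
    all groups pooled per user. Returns the averaged distillation loss."""
    loss = 0.0
    for u, pairs in pairs_per_user.items():
        for pos, neg in pairs:
            # Score margin between positive and negative sample, then
            # -log(sigmoid(margin)) as the pairwise loss term.
            margin = dot(user_vecs[u], item_vecs[pos]) - dot(user_vecs[u], item_vecs[neg])
            loss += -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss / max(len(pairs_per_user), 1)  # 1/|U| averaging
```

A larger margin (student already prefers the positive item) yields a loss term near zero; a negative margin is penalized heavily.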
7. The method of claim 1, wherein said training the student model based on the distillation loss value comprises:
determining a final loss value of the student model according to the distillation loss value and an original supervised loss value of the student model;
training the student model based on the final loss value.
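Claim 7 combines the distillation loss with the student's original supervised loss. A minimal sketch, where the convex weighting with hyper-parameter `alpha` is an assumption (the patent does not specify the combination rule here):

```python
def final_loss(distill, supervised, alpha=0.5):
    """Blend distillation and supervised loss values into the final loss.
    alpha is an assumed trade-off hyper-parameter in [0, 1]."""
    return alpha * distill + (1.0 - alpha) * supervised
```

The student's parameters would then be updated by minimizing this combined value.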
8. An item recommendation method comprising:
acquiring a data set, wherein the data set comprises a plurality of user data and article data corresponding to each user;
inputting the data set into a student model obtained by training through the student model training method of any one of claims 1 to 7, and outputting a recommendation value for each item;
recommending the target item to the user based on the recommendation value.
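The recommendation step of claim 8 reduces to scoring items with the trained student and returning the top-k. In this sketch, `student_score` is a stand-in for the trained student model (an assumption about its interface):

```python
def recommend(student_score, user, items, k=3):
    """Score every candidate item for the user with the trained student model
    and recommend the k items with the highest recommendation values."""
    ranked = sorted(items, key=lambda i: student_score(user, i), reverse=True)
    return ranked[:k]
```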
9. A student model training apparatus for recommending items, comprising:
an item grouping module, configured to group a plurality of recommended items according to their popularity to obtain a plurality of groups of recommended item sets, wherein the difference between the popularities of the recommended items in each group of recommended item sets is less than or equal to a first preset value;
an interest output module, configured to output an interest value of each recommended item through a teacher model, where the interest value represents a probability that the recommended item is operated;
a sampling module for sampling the recommended items in each set of recommended items to generate at least one sample pair;
a relationship determination module to determine a positive-negative relationship of the sample items in the sample pair based on the interest value;
a loss output module, configured to input the positive-negative relationship into a distillation loss function of the student model and output a distillation loss value; and
a model training module, configured to train the student model based on the distillation loss value.
10. An item recommendation device comprising:
an acquisition module, configured to acquire a data set, wherein the data set comprises a plurality of user data and item data corresponding to each user;
a recommendation output module, configured to input the data set into a student model obtained by training according to the student model training method of any one of claims 1 to 7, and output a recommendation value for each item; and
a recommendation module, configured to recommend a target item to the user based on the recommendation value.
CN202211703705.0A 2022-12-29 2022-12-29 Student model training method, device, equipment and medium for recommending articles Pending CN115687794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211703705.0A CN115687794A (en) 2022-12-29 2022-12-29 Student model training method, device, equipment and medium for recommending articles


Publications (1)

Publication Number Publication Date
CN115687794A true CN115687794A (en) 2023-02-03

Family

ID=85055254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211703705.0A Pending CN115687794A (en) 2022-12-29 2022-12-29 Student model training method, device, equipment and medium for recommending articles

Country Status (1)

Country Link
CN (1) CN115687794A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126641A (en) * 2016-06-24 2016-11-16 中国科学技术大学 A kind of real-time recommendation system and method based on Spark
CN112364242A (en) * 2020-11-10 2021-02-12 中国科学技术大学 Graph convolution recommendation system for context-aware type
CN112380433A (en) * 2020-11-13 2021-02-19 中国科学技术大学 Recommendation meta-learning method for cold-start user

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN, GANG: "Unbiased Knowledge Distillation for Recommendation" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911956A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Recommendation model training method and device based on knowledge distillation and storage medium
CN117219294A (en) * 2023-11-09 2023-12-12 中国科学技术大学 Rare disease-oriented intelligent medicine recommendation method, device and medium
CN117219294B (en) * 2023-11-09 2024-03-29 中国科学技术大学 Rare disease-oriented intelligent medicine recommendation method, device and medium

Similar Documents

Publication Publication Date Title
US11087201B2 (en) Neural architecture search using a performance prediction neural network
US11023804B2 (en) Generating an output for a neural network output layer
CN109947919B (en) Method and apparatus for generating text matching model
US10255282B2 (en) Determining key concepts in documents based on a universal concept graph
US20190286984A1 (en) Neural architecture search by proxy
CN111143686B (en) Resource recommendation method and device
CN115687794A (en) Student model training method, device, equipment and medium for recommending articles
WO2016003508A1 (en) Context-aware approach to detection of short irrelevant texts
CN113688310B (en) Content recommendation method, device, equipment and storage medium
US20190066054A1 (en) Accuracy of member profile retrieval using a universal concept graph
US10380145B2 (en) Universal concept graph for a social networking service
US10262082B2 (en) Influence map generator machine
WO2019226375A1 (en) Personalized query formulation for improving searches
CN112836128A (en) Information recommendation method, device, equipment and storage medium
WO2022156589A1 (en) Method and device for determining live broadcast click rate
CN110008396B (en) Object information pushing method, device, equipment and computer readable storage medium
US20170155571A1 (en) System and method for discovering ad-hoc communities over large-scale implicit networks by wave relaxation
WO2023125000A1 (en) Content output method and apparatus, computer readable medium, and electronic device
CN116308704A (en) Product recommendation method, device, electronic equipment, medium and computer program product
CN112860999B (en) Information recommendation method, device, equipment and storage medium
CN118043802A (en) Recommendation model training method and device
CN112905885A (en) Method, apparatus, device, medium, and program product for recommending resources to a user
KR20210084641A (en) Method and apparatus for transmitting information
CN111274383A (en) Method and device for classifying objects applied to quotation
CN114637921B (en) Item recommendation method, device and equipment based on modeling accidental uncertainty

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20230203)