CN115186750A - Model training method, device, equipment and storage medium - Google Patents

Model training method, device, equipment and storage medium

Info

Publication number
CN115186750A
Authority
CN
China
Prior art keywords
model
vector
training sample
characterization
characterization vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210819009.XA
Other languages
Chinese (zh)
Inventor
陈飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202210819009.XA priority Critical patent/CN115186750A/en
Publication of CN115186750A publication Critical patent/CN115186750A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a model training method, apparatus, device, and storage medium. The method comprises the following steps: determining a positive training sample and a negative training sample in a training sample set; calling a first model to process first feature information and second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and calling the first model to process third feature information and fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information; and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, wherein the temperature parameter is used to make the contribution degree of difficult negative training samples larger than that of simple negative training samples. Through this method and apparatus, the recommendation performance of the model and the user experience can be improved, and the intelligence and diversity of work recommendation can be ensured.

Description

Model training method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model training method, a model training apparatus, a computer device, and a computer-readable storage medium.
Background
Recommendation models are widely used in various Internet online services; they help users find favorite products, music, books, and so on more quickly, and their recommendation performance directly determines the user experience. At present, multimedia recommendation scenarios (such as musical works, film and television works, and the like) exhibit a strong head-concentration phenomenon: popular works are widely spread and accumulate ever more user interaction time, so that during training a recommendation model becomes increasingly inclined to recommend popular head works to users.
However, a recommendation model trained in this way causes mid- and long-tail (low-popularity) works to receive less and less exposure, which harms both the user experience and the ecological health of the whole recommendation system. Therefore, how to train a recommendation model so as to improve user experience and recommendation performance is an urgent problem to be solved.
Disclosure of Invention
The embodiments of the invention provide a model training method, a model training apparatus, model training equipment, and a storage medium, which can improve the recommendation performance of a model and the user experience, and ensure the intelligence and diversity of work recommendation.
In one aspect, an embodiment of the present invention provides a model training method, where the method includes:
determining a positive training sample and a negative training sample in a training sample set, wherein the positive training sample comprises first characteristic information of a first object and second characteristic information of an article of interest of the first object, and the negative training sample comprises third characteristic information of a second object and fourth characteristic information of an article of interest of the second object;
calling a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and calling the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information;
and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, wherein the temperature parameter is used to make the contribution degree of a difficult negative training sample larger than that of a simple negative training sample, and the second model is used for recommending items of interest to an object.
In one aspect, an embodiment of the present application provides a model training apparatus, where the apparatus includes:
a determining unit, configured to determine a positive training sample and a negative training sample in a training sample set, where the positive training sample includes first feature information of a first object and second feature information of an item of interest of the first object, and the negative training sample includes third feature information of a second object and fourth feature information of the item of interest of the second object;
a calling unit, configured to call a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and call the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information;
and a processing unit, configured to perform model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, wherein the temperature parameter is used to make the contribution degree of a difficult negative training sample larger than that of a simple negative training sample, and the second model is used for recommending items of interest to an object.
In one aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the above-mentioned model training method.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is read and executed by a processor of a computer device, the computer device is caused to perform the above-mentioned model training method.
In one aspect, embodiments of the present application provide a computer program product; the computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the model training method described above.
In the embodiments of the present application, positive training samples and negative training samples are first determined in a training sample set; a first model is then called to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information; finally, model optimization is performed on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter to obtain a second model. This model training method makes full use of the positive and negative training samples in the training sample set and, drawing on the idea of contrastive learning, adjusts the contribution degree of difficult negative training samples through the temperature parameter, thereby strengthening the model's learning of difficult negative training samples, improving the recommendation performance of the model and the user experience, and ensuring the intelligence and diversity of work recommendation.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a model training system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a two-tower recall model according to an embodiment of the present disclosure;
FIG. 4 is a graph illustrating the gradient contribution of training samples at different values of τ according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating another model training method provided in embodiments of the present application;
FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions of "first", "second", etc. referred to in the embodiments of the present application are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defined as "first" or "second" may explicitly or implicitly include at least one such feature.
The embodiments of the present application involve Artificial Intelligence (AI) technology. AI is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Specifically, AI technology covers a wide range of fields, involving both hardware-level and software-level technologies. At the hardware level, AI technology generally includes sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like; at the software level, it mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, intelligent traffic, and so on. With continuing research and progress, AI technology has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service, and the like.
Among these, Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and many other disciplines. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning/deep learning typically includes techniques such as artificial neural networks, self-supervised learning, and contrastive learning. Self-supervised learning is a paradigm of unsupervised learning: it requires no manually labeled class information, but directly uses the data itself as supervision to learn feature representations of the sample data, which are then used for downstream tasks. Contrastive learning is one way to accomplish self-supervised learning; specifically, each datum is compared with positive samples and negative samples in a feature space to learn the sample's feature representation. Its core idea is to pull positive samples closer together and push negative samples farther apart in the feature space.
Based on the above-mentioned technologies such as machine learning/deep learning, the embodiment of the application provides a model training method to realize training of a recommendation model, improve user experience and recommendation performance, and ensure intelligence and diversity of product recommendation. Specifically, the general principle of the model training method is as follows: firstly, determining a positive training sample and a negative training sample in a training sample set, wherein the positive training sample comprises first characteristic information of a first object and second characteristic information of an article in which the first object is interested, and the negative training sample comprises third characteristic information of a second object and fourth characteristic information of the article in which the second object is interested; then calling a first model to process the first characteristic information and the second characteristic information to obtain a first characterization vector of the first characteristic information and a second characterization vector of the second characteristic information, and calling the first model to process the third characteristic information and the fourth characteristic information to obtain a third characterization vector of the third characteristic information and a fourth characterization vector of the fourth characteristic information; and finally, performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter to obtain a second model, thereby completing the training of the model.
In a specific implementation, the above mentioned model training method may be executed by a computer device, which may be a terminal device or a server. The terminal device may be, for example, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, an aircraft, and the like, but is not limited thereto; the server may be, for example, an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN) server, and a big data and artificial intelligence platform. The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent traffic, driving assistance and the like.
Alternatively, the above-mentioned model training method may be performed by both the terminal device and the server. For example, see FIG. 1 for an illustration: the terminal device 101 may determine positive training samples and negative training samples in the training sample set, and then send the positive training samples and the negative training samples to the server 102. Correspondingly, the server 102 receives the positive training sample and the negative training sample, and calls the first model to process the feature information contained in the positive training sample and the negative training sample to obtain a first feature vector, a second feature vector, a third feature vector and a fourth feature vector; and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter.
For recommendation scenarios, the embodiments of the present application make full use of the positive and negative training samples in the training sample set to train the model and, drawing on the idea of contrastive learning, adjust the contribution degree of difficult negative training samples through the temperature parameter, strengthening the model's learning of difficult negative training samples, improving the recommendation performance of the model and the user experience, and ensuring the intelligence and diversity of work recommendation.
It is to be understood that the system architecture diagram described in the embodiment of the present application is for more clearly illustrating the technical solution of the embodiment of the present application, and does not constitute a limitation to the technical solution provided in the embodiment of the present application, and as a person having ordinary skill in the art knows that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
Based on the above explanation, the model training method proposed in the embodiment of the present application is further explained below with reference to the flowchart shown in fig. 2. In the embodiment of the present application, the above-mentioned computer device executing the model training method is mainly taken as an example for explanation. Referring to fig. 2, the model training method may specifically include steps S201 to S203:
s201, determining a positive training sample and a negative training sample in the training sample set.
In an embodiment of the present application, the positive training sample includes first feature information of a first object and second feature information of an item of interest of the first object, and the negative training sample includes third feature information of a second object and fourth feature information of the item of interest of the second object.
Any training sample in the training sample set can comprise characteristic information of an object and characteristic information of an object interested article. The object mentioned herein may refer to a user or other object, and the characteristic information of the object includes but is not limited to: behavioral features, interest features, social features, and the like. Taking the object as a user as an example, the feature information of the user may include user attribute feature information and user sequence feature information; the user attribute feature information comprises user age, gender, city grade, academic calendar and the like, and the user sequence feature information comprises user favorite songs, preference singers, languages and the like.
Further, for each object, the item of interest may be an item the object likes, or an item to which the object pays high attention, which is not limited here. The item here may be a musical work, a film or television work, etc., and the feature information of an object's item of interest includes, but is not limited to, attribute features, popularity features, and the like. Taking the object being a user and the item being a musical work as an example, the feature information of a musical work the user is interested in may include attribute feature information and popularity feature information of the musical work; the attribute feature information includes the album, singer, release year, language, work identifier, and so on, and the popularity feature information includes the play count, the number of complete plays, the share count, and so on (i.e., normalized parameters that measure the popularity of the musical work).
It should be noted that, when the object is a user, the data related to the feature information of the object, the feature information of the object-interested article, and the like in the embodiment of the present application are all acquired after being authorized by the user. Moreover, when the embodiments of the present application are applied to specific products or technologies, the data involved need to be licensed or approved by users, and the collection, use and processing of the relevant data need to comply with relevant laws and regulations and standards in relevant countries and regions.
In a specific implementation, the computer device may determine the positive training samples and the negative training samples in the training sample set by means of in-batch negative sampling. That is, within the same batch, the training samples corresponding to the feature information of items that other objects are interested in are randomly used as negative training samples for the current object (i.e., the object corresponding to the positive training sample). In this way, hot items are more likely to be sampled as negatives, which effectively suppresses hot items and gives more exposure to mid- and long-tail items (i.e., items with lower popularity), genuinely mining the potential interests of the object and improving the ecological health of the whole recommendation system, while also avoiding the engineering problem of maintaining a negative-sample queue.
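As a minimal Python sketch of this sampling scheme (an illustrative assumption of this rewrite, not code from the application), the diagonal of a batch similarity matrix holds the positive pairs while every off-diagonal entry supplies an in-batch negative:

```python
import torch

def in_batch_pairs(object_vecs: torch.Tensor, item_vecs: torch.Tensor):
    """object_vecs, item_vecs: [B, D] characterization vectors; row k of
    item_vecs is the item of interest of the object in row k of object_vecs.
    Every other row's item serves as a negative for the current object."""
    sim = object_vecs @ item_vecs.T               # [B, B] pairwise scores
    pos = sim.diagonal()                          # matched (positive) pairs
    neg_mask = ~torch.eye(sim.size(0), dtype=torch.bool)
    neg = sim[neg_mask].view(sim.size(0), -1)     # [B, B-1] in-batch negatives
    return pos, neg
```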
Alternatively, the positive training samples and the negative training samples can be determined in any one of the following two ways:
the first method is as follows: the positive training sample is any one of the training samples in the training sample set, and the negative training sample is one or more of the other training samples in the training sample set except the positive training sample.
In the first mode, the computer device selects one training sample from the training sample set as a positive training sample, and then randomly selects one or more training samples from the training sample set except the positive training sample as a negative training sample.
For example, the training sample set includes training sample a, training sample B, and training sample C. In a batch, the computer device may select training sample a from the training sample set as a positive training sample, may select training sample B as a negative training sample, may select training sample C as a negative training sample, or may select both training sample B and training sample C as negative training samples.
The second method is as follows: the positive training sample is any one of the training samples in the training sample set; the negative training sample is a training sample in the training sample set that contains feature information of a target item, where a target item is an item to which the first object's attention is less than or equal to a first threshold.
In the second mode, the computer device selects one training sample from the training sample set as the positive training sample, and then randomly selects, as negative training samples, one or more training samples corresponding to items to which the first object's attention is less than or equal to the first threshold. The attention degree may be measured by parameters such as playing duration, play count, and like count, or in other ways, which is not limited here.
For example, the training sample set includes training sample A, training sample B, training sample C, and training sample D. Within a batch, the computer device may select training sample A from the training sample set as the positive training sample, where training sample A includes the feature information of a first object and the feature information of an item 1 of interest of the first object. Assuming the first threshold is 60, and the first object's attention to item 2 is 30, to item 3 is 80, and to item 4 is 50, then items 2 and 4 are target items. The training sample containing the feature information of item 2 is training sample B, and the training sample containing the feature information of item 4 is training sample D; the computer device may therefore use training sample B as a negative training sample, or training sample D, or both training sample B and training sample D.
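A short sketch of this selection mode, with hypothetical attention scores mirroring the example above:

```python
# Hypothetical sketch of the second selection mode. The attention scores and
# the first threshold of 60 mirror the worked example above.
attention = {"item_2": 30, "item_3": 80, "item_4": 50}  # first object's attention
first_threshold = 60

target_items = [item for item, score in attention.items() if score <= first_threshold]
print(target_items)  # ['item_2', 'item_4'] -> training samples B and D become negatives
```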
It should be noted that, the manner of determining the positive training samples and the negative training samples in the training sample set may also adopt other manners, and is not limited herein.
S202, calling a first model to process the first feature information and the second feature information to obtain a first representation vector of the first feature information and a second representation vector of the second feature information, and calling the first model to process third feature information and fourth feature information to obtain a third representation vector of the third feature information and a fourth representation vector of the fourth feature information.
In this embodiment of the application, the first model may be a deep neural network model, such as the Deep Structured Semantic Model (DSSM) or the Deep Relevance Matching Model (DRMM), which is not limited here. In addition, the first model is highly extensible, and its structure can be designed for different scenarios.
Take the first model being a DSSM model, which may also be called a two-tower recall model, as an example: the DSSM model builds an object tower and an item tower, and characterizes the feature information of the object and the feature information of the item as dense vectors. Fig. 3 is a schematic structural diagram of a two-tower recall model according to an embodiment of the present application. The computer device inputs the first feature information of the first object and the second feature information of the item of interest of the first object, included in a positive training sample, into the first model. The object feature encoder (object tower) and the item feature encoder (item tower) respectively perform encoding and embedding operations, mapping the high-dimensional sparse features to low-dimensional dense feature vectors, so that the first feature information and the second feature information are characterized as dense vectors in the same space. The resulting embedding features are then concatenated and passed through three layers of Deep Neural Networks (DNNs) with dimensions 512, 256, and 128, respectively; the 128-dimensional output vector goes through an activation function (such as tanh) and normalization, yielding the first characterization vector of the first feature information and the second characterization vector of the second feature information. Similarly, the third feature information of the second object and the fourth feature information of the item of interest of the second object, included in the negative training sample, are input into the first model, which outputs the third characterization vector of the third feature information and the fourth characterization vector of the fourth feature information.
It should be noted that the computer device may let some feature information share embeddings, which effectively reduces the number of model parameters. For example, the song identifiers in the feature information of musical works may share an embedding table.
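As a concrete illustration, the following sketch implements one such tower with the 512/256/128 DNN, tanh activation, and normalization described above. The vocabulary sizes, embedding width, and feature layout are assumptions of this sketch, and passing the same nn.Embedding instance to both towers is one way to realize the shared embedding:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """One tower of the two-tower recall model: embed sparse features,
    concatenate, run three DNN layers (512/256/128), tanh, L2-normalize."""
    def __init__(self, vocab_sizes, emb_dim=32, shared=None):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(v, emb_dim) for v in vocab_sizes])
        self.shared = shared  # e.g. a song-identifier table shared across towers
        n_fields = len(vocab_sizes) + (1 if shared is not None else 0)
        self.dnn = nn.Sequential(
            nn.Linear(emb_dim * n_fields, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128),
        )

    def forward(self, sparse_ids, shared_ids=None):
        # sparse_ids: [B, num_fields] integer feature ids
        embs = [emb(sparse_ids[:, k]) for k, emb in enumerate(self.embeddings)]
        if self.shared is not None:
            embs.append(self.shared(shared_ids))
        x = torch.cat(embs, dim=-1)       # high-dim sparse -> low-dim dense
        z = torch.tanh(self.dnn(x))       # 128-d output + activation
        return F.normalize(z, dim=-1)     # normalized characterization vector
```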
S203, performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter to obtain a second model.
In the embodiments of the present application, the temperature parameter is used to make the contribution degree of difficult negative training samples larger than that of simple negative training samples, and the second model is used to recommend items of interest to objects. A difficult negative training sample is a negative training sample whose prediction has a large error with respect to the ground-truth label, whereas a simple negative training sample is one whose prediction error is small. The contribution degree here refers to the gradient contribution, i.e., the gradient generated by the training sample during training. In model training, once the number of simple training samples reaches a certain level, the accuracy of a deep learning model mainly depends on the difficult training samples. Therefore, the ability to adjust the gradient contributions of difficult and simple negative training samples through the temperature parameter, so that the contribution of difficult negatives exceeds that of simple negatives, helps strengthen the model's learning of difficult negative training samples and thereby improves the recommendation performance of the model.
In a possible implementation manner, when the computer device performs model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter to obtain the second model, a specific implementation manner may be: determining a loss value of the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter; and updating the model parameters of the first model based on the loss value to obtain a second model.
Specifically, the loss value of the first model may be calculated by formula (1), which can be regarded as the loss function of the first model. The loss function makes the cosine distance between the first characterization vector of the first object and the second characterization vector of the item of interest of the first object as small as possible, and makes the cosine distance between the first characterization vector of the first object and the fourth characterization vector of the item of interest of the second object as large as possible. Formula (1) is as follows:

$$L_{\mathrm{infoNCE}} = -\log \frac{f(u_1, i)}{f(u_1, i) + \sum_{j \in p_n} f(u_2, j)} \qquad (1)$$

The term $f(u_1, i)$ in formula (1) is expressed by formula (2):

$$f(u_1, i) = \exp\left(\frac{\cos(z^{u_1}, z_i)}{\tau}\right) \qquad (2)$$

The term $f(u_2, j)$ in formula (1) is expressed by formula (3), where, consistent with the objective stated above, the negative score is computed against the first characterization vector $z^{u_1}$:

$$f(u_2, j) = \exp\left(\frac{\cos(z^{u_1}, z_j)}{\tau}\right) \qquad (3)$$

where $z^{u_1}$ denotes the characterization vector of the first feature information of the first object (i.e., the first characterization vector), $z_i$ denotes the characterization vector of the second feature information of the item of interest of the first object (i.e., the second characterization vector), $z^{u_2}$ denotes the characterization vector of the third feature information of the second object (i.e., the third characterization vector), $z_j$ denotes the characterization vector of the fourth feature information of the item of interest of the second object (i.e., the fourth characterization vector), $i$ denotes the item of interest of the first object, $j$ denotes the item of interest of the second object, $p_n$ denotes the set of items of interest of objects other than the first object (the in-batch negatives), $\tau$ denotes the temperature parameter, and $L_{\mathrm{infoNCE}}$ denotes the loss value of the first model.
In one possible implementation, the value of the temperature parameter is greater than 0 and less than 1. For example, the value of the temperature parameter may be 0.1.
As shown in Fig. 4, x represents the similarity between the vector representations of the positive and negative training samples, and g(x) represents the gradient contribution of a training sample; the smaller x is, the simpler the training sample, and the larger x is, the more difficult the training sample. In Fig. 4(a), when τ = 1, i.e., without a temperature parameter (equivalent to the Sampled Softmax loss of current two-tower recall models), the gradient contributions of difficult and simple negative training samples fall within a small interval, roughly (0, 1.5). That is, difficult and simple negative training samples contribute almost equally to the gradient, so the model cannot distinguish them, which weakens the model's learning of difficult negatives. In Fig. 4(b), by contrast, when τ = 0.1, the gradient contributions of difficult and simple negative training samples differ greatly, spanning roughly (0, 4000). That is, the gradient contribution of a simple negative is basically negligible compared with that of a difficult negative. The temperature parameter can thus be regarded as providing the capability of contrastive learning: it spreads out the vector space, makes difficult negative training samples easier to distinguish, and better strengthens the model's learning of them.
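A quick numeric check of this effect, under the assumption that a negative's gradient weight in the softmax scales as exp(x/τ)/τ (which matches the qualitative shape of g(x) in Fig. 4):

```python
import math

def grad_weight(x, tau):
    # Unnormalized gradient weight of a negative with similarity x.
    return math.exp(x / tau) / tau

for tau in (1.0, 0.1):
    easy, hard = grad_weight(0.1, tau), grad_weight(0.9, tau)
    print(f"tau={tau}: hard/easy gradient ratio = {hard / easy:.1f}")
# tau=1.0 -> ratio ~= 2.2   (easy and hard negatives barely distinguished)
# tau=0.1 -> ratio ~= 2981  (hard negatives dominate the update)
```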
In summary, in the embodiment of the present application: first, positive and negative training samples are determined in the training sample set; the first model is then called to process the first and second feature information to obtain the first and second characterization vectors, and to process the third and fourth feature information to obtain the third and fourth characterization vectors; finally, model optimization is performed on the first model based on the four characterization vectors and the temperature parameter to obtain the second model. This model training method makes full use of the positive and negative training samples in the training sample set and, drawing on the idea of contrastive learning, adjusts the contribution degree of difficult negative training samples through the temperature parameter, strengthening the model's learning of difficult negatives, improving the recommendation performance of the model and the user experience, and ensuring the intelligence and diversity of work recommendation.
Based on the above explanation, the model training method proposed in the embodiment of the present application is further explained below with reference to the flowchart shown in fig. 5. In the embodiment of the present application, the above-mentioned computer device executing the model training method is mainly taken as an example for explanation. Referring to fig. 5, the model training method may specifically include steps S501 to S507:
s501, determining a positive training sample and a negative training sample in the training sample set.
S502, calling a first model to process the first feature information and the second feature information to obtain a first representation vector of the first feature information and a second representation vector of the second feature information, and calling the first model to process third feature information and fourth feature information to obtain a third representation vector of the third feature information and a fourth representation vector of the fourth feature information.
S503, performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter to obtain a second model.
The specific implementation manners of steps S501 to S503 may refer to the specific implementation manners of steps S201 to S203, which are not described herein again.
And S504, acquiring fifth characteristic information of the third object.
In this embodiment, the fifth feature information of the third object is the feature information of an arbitrary object. The computer device acquires the fifth feature information of the third object in order to recommend items of interest to the third object.
And S505, calling the second model to process the fifth feature information to obtain a fifth feature vector of the fifth feature information.
In the embodiment of the present application, the fifth characterization vector is in the same dimensional space as the characterization vectors of the items in the search library. After the computer device has trained the second model, the second model may be applied online.
In one possible implementation, the method further includes: acquiring feature information of sample items; calling the second model to process the feature information of the sample items to obtain sample characterization vectors of that feature information; and storing the sample characterization vectors to a search library. The feature information of a sample item may be the feature information of an item included in the training sample set, or the feature information of an item in another database, which is not limited here. The trained second model processes the feature information of all sample items to obtain their sample characterization vectors, which are stored in the search library for retrieving the characterization vectors that match the fifth characterization vector.
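A minimal sketch of building such a search library, assuming a hypothetical item_tower callable that maps one item's feature information to a 1-D vector:

```python
import numpy as np

def build_search_library(item_tower, all_item_features):
    """Encode every sample item with the trained second model's item tower
    and store L2-normalized characterization vectors for later matching."""
    vectors = np.stack([item_tower(feats) for feats in all_item_features])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
```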
S506, acquiring a sixth characterization vector matched with the fifth characterization vector from the search library, and adding the sixth characterization vector to the recommendation candidate set.
In this embodiment of the application, the computer device calculates a matching degree between the fifth characterization vector and each characterization vector in the search library; specifically, it may calculate the cosine distance between the fifth characterization vector and each characterization vector in the search library: the smaller the cosine distance between the fifth characterization vector and a characterization vector in the search library, the higher the matching degree between them.
In a possible implementation, the sixth characterization vector is a characterization vector in the search library whose matching degree with the fifth characterization vector is within a first preset range.
For example, assume that the search library includes sample characterization vectors a, b, and c; the matching degree of the fifth characterization vector to vector a is 85, to vector b is 90, and to vector c is 70; and the first preset range is (80, 100). Then the characterization vectors in the search library whose matching degree with the fifth characterization vector falls in the first preset range are sample characterization vectors a and b, i.e., sample characterization vectors a and b are sixth characterization vectors, and they are added to the recommendation candidate set.
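The matching step of S506 could then be sketched as follows; the 0-100 matching scale is an assumption made here to mirror the example above:

```python
import numpy as np

def match_candidates(fifth_vec, library, preset_range=(80, 100)):
    """Return indices of library vectors whose matching degree with the
    fifth characterization vector lies in the first preset range."""
    fifth_vec = fifth_vec / np.linalg.norm(fifth_vec)
    matching = 100.0 * (library @ fifth_vec)   # cosine similarity scaled to 0-100
    lo, hi = preset_range
    idx = np.where((matching > lo) & (matching <= hi))[0]
    return idx, matching[idx]                  # sixth characterization vectors
```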
And S507, recommending the article corresponding to the target characterization vector determined from the recommendation candidate set to the client corresponding to the third object.
In the embodiment of the present application, the sixth characterization vectors matched with the fifth characterization vector are stored in the recommendation candidate set, and the target characterization vector may be determined from the recommendation candidate set according to a preset rule. The preset rule here may be to randomly select one or more sixth characterization vectors from the recommendation candidate set as target characterization vectors; or all sixth characterization vectors in the recommendation candidate set may be divided into a hot area and a cold area according to heat degree (i.e., sixth characterization vectors whose heat degree is greater than or equal to a second threshold are placed in the hot area, and those whose heat degree is less than the second threshold are placed in the cold area), and one or more sixth characterization vectors are selected as target characterization vectors from the hot area and/or the cold area; other rules may also be used, which is not limited here.
For example, the recommendation candidate set includes a sixth characterization vector m, a sixth characterization vector n, a sixth characterization vector q, and a sixth characterization vector p. The heat degree of the item corresponding to the sixth characterization vector m is 30, that of the item corresponding to the sixth characterization vector n is 40, that of the item corresponding to the sixth characterization vector q is 70, and that of the item corresponding to the sixth characterization vector p is 80. Assuming the second threshold is 50, the sixth characterization vectors m and n are in the cold area, and the sixth characterization vectors q and p are in the hot area; the sixth characterization vector m is selected as the target characterization vector from the cold area, and the sixth characterization vector q from the hot area. The items corresponding to the sixth characterization vectors m and q are recommended to the client corresponding to the third object.
In a possible implementation, when the computer device recommends the items corresponding to the target characterization vectors determined from the recommendation candidate set to the client corresponding to the third object, a specific implementation may be: sorting the sixth characterization vectors in the recommendation candidate set to obtain a candidate sequence; determining the first N sixth characterization vectors in the candidate sequence as target characterization vectors; and recommending the items corresponding to the target characterization vectors to the client corresponding to the third object, where N is a positive integer. The sorting may be by heat degree, from high to low or from low to high, or by release time, from earliest to latest or from latest to earliest, which is not limited here.
For example, the recommendation candidate set includes a sixth characterization vector m, a sixth characterization vector n, a sixth characterization vector q, and a sixth characterization vector p. The heat degree of the item corresponding to the sixth characterization vector m is 30, that of the item corresponding to the sixth characterization vector n is 40, that of the item corresponding to the sixth characterization vector q is 70, and that of the item corresponding to the sixth characterization vector p is 80. Sorting by heat degree from low to high, the obtained candidate sequence is the sixth characterization vector m, the sixth characterization vector n, the sixth characterization vector q, and the sixth characterization vector p. Assuming N is 2, the sixth characterization vectors m and n are determined as target characterization vectors, and the item corresponding to the sixth characterization vector m and the item corresponding to the sixth characterization vector n are recommended to the client corresponding to the third object.
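Both preset rules can be sketched in a few lines; the heat values mirror the worked examples, and the identifiers are hypothetical:

```python
candidates = {"m": 30, "n": 40, "q": 70, "p": 80}  # vector id -> item heat degree
second_threshold, n_top = 50, 2

# Rule 1: split into cold/hot areas by the second threshold, pick one from each.
cold = [v for v, heat in candidates.items() if heat < second_threshold]
hot = [v for v, heat in candidates.items() if heat >= second_threshold]
targets_by_area = [cold[0], hot[0]]                # -> ['m', 'q']

# Rule 2: sort by heat degree (low to high) and take the first N.
ranked = sorted(candidates, key=candidates.get)
targets_top_n = ranked[:n_top]                     # -> ['m', 'n']
print(targets_by_area, targets_top_n)
```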
In summary, in the embodiment of the present application: first, positive and negative training samples are determined in the training sample set; the first model is then called to process the first and second feature information to obtain the first and second characterization vectors, and to process the third and fourth feature information to obtain the third and fourth characterization vectors; finally, model optimization is performed on the first model based on the four characterization vectors and the temperature parameter to obtain the second model, which is then applied online. This model training method makes full use of the positive and negative training samples in the training sample set and, drawing on the idea of contrastive learning, adjusts the contribution degree of difficult negative training samples through the temperature parameter, strengthening the model's learning of difficult negatives, improving the recommendation performance of the model and the user experience, and ensuring the intelligence and diversity of work recommendation.
Based on the model training method, the embodiment of the application provides a model training device. Referring to fig. 6, which is a schematic structural diagram of a model training apparatus according to an embodiment of the present application, the model training apparatus 600 may operate as follows:
a determining unit 601, configured to determine a positive training sample and a negative training sample in a training sample set, where the positive training sample includes first feature information of a first object and second feature information of an item of interest of the first object, and the negative training sample includes third feature information of a second object and fourth feature information of the item of interest of the second object;
a calling unit 602, configured to call a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and call the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information;
and a processing unit 603, configured to perform model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, where the temperature parameter is used to make the contribution degree of difficult negative training samples greater than that of simple negative training samples, and the second model is used to recommend items of interest to an object.
In an embodiment, when performing model optimization on the first model based on the first token vector, the second token vector, the third token vector, the fourth token vector and the temperature parameter to obtain a second model, the processing unit 603 may be specifically configured to: determining a loss value of the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and the temperature parameter; and updating the model parameters of the first model based on the loss value to obtain a second model.
In another embodiment, the value of the temperature parameter is greater than 0 and less than 1.
In another embodiment, the positive training sample is any one of the training samples in the training sample set, and the negative training sample is one or more of the training samples in the training sample set other than the positive training sample.
In another embodiment, the negative training sample is a training sample in the training sample set that contains feature information of a target item, where a target item is an item to which the first object's attention is less than or equal to a first threshold.
In another embodiment, the processing unit 603 is further configured to: acquiring fifth characteristic information of a third object; calling the second model to process the fifth feature information to obtain a fifth characterization vector of the fifth feature information, wherein the fifth characterization vector and the characterization vectors of the articles in the search library are in the same dimensional space; acquiring a sixth characterization vector matched with the fifth characterization vector from the search library, and adding the sixth characterization vector to a recommendation candidate set; recommending the articles corresponding to the target characterization vectors determined from the recommendation candidate set to the client corresponding to the third object.
In another embodiment, the sixth characterization vector is a characterization vector in the search library whose matching degree with the fifth characterization vector is within a first preset range.
In another embodiment, the processing unit 603 is further configured to: acquiring characteristic information of a sample article; calling the second model to process the characteristic information of the sample article to obtain a sample characterization vector of the characteristic information of the sample article; and storing the sample characterization vector to the search library.
In another embodiment, when recommending the items corresponding to the target characterization vectors determined from the recommendation candidate set to the client corresponding to the third object, the processing unit 603 may be specifically configured to: sort the sixth characterization vectors in the recommendation candidate set to obtain a candidate sequence; determine the first N sixth characterization vectors in the candidate sequence as target characterization vectors; and recommend the items corresponding to the target characterization vectors to the client corresponding to the third object, where N is a positive integer.
According to another embodiment of the present application, the model training apparatus shown in fig. 6 may be constructed, and the model training method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 2 or fig. 5 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a Central Processing Unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, and loaded into and executed by the above computing device via that medium.
According to the embodiments of the application, a positive training sample and a negative training sample are first determined in a training sample set; the first model is then called to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information; finally, model optimization is performed on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter to obtain the second model. This model training method makes full use of the positive and negative training samples in the training sample set and, following the idea of contrastive learning, adjusts the contribution degree of difficult negative training samples through the temperature parameter, strengthening the model's learning of difficult negative training samples. This improves the recommendation performance of the model and the user experience, and ensures the intelligence and diversity of item recommendation.
Based on the description of the method embodiments and the apparatus embodiment, an embodiment of the present application further provides a computer device. Referring to fig. 7, the computer device 700 includes at least a processor 701, a communication interface 702, and a computer storage medium 703, which may be connected by a bus or in other ways. The computer storage medium 703 may reside in the memory 704 of the computer device 700; it is used to store a computer program comprising program instructions, and the processor 701 is used to execute the program instructions stored in the computer storage medium 703. The processor 701 (or Central Processing Unit, CPU) is the computing and control core of the computer device and is adapted to implement one or more instructions, in particular to load and execute the one or more instructions so as to realize the corresponding method flow or function.
In an embodiment, the processor 701 of the embodiments of the present application may be configured to perform a series of processes, specifically including: determining a positive training sample and a negative training sample in a training sample set, where the positive training sample includes first feature information of a first object and second feature information of an item of interest of the first object, and the negative training sample includes third feature information of a second object and fourth feature information of an item of interest of the second object; calling a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and calling the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information; and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, where the temperature parameter is used to make the contribution degree of a difficult negative training sample greater than that of a simple negative training sample, and the second model is used for recommending items of interest to an object; and so on.
An embodiment of the present application further provides a computer storage medium (Memory), which is a memory device in a computer device used to store programs and data. It is understood that the computer storage medium here may include both a storage medium built into the computer device and an extended storage medium supported by the computer device. The computer storage medium provides storage space that stores the operating system of the computer device. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are suitable for being loaded and executed by the processor 701. The computer storage medium may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it may also be at least one computer storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions stored in a computer storage medium may be loaded and executed by a processor to perform the corresponding steps of the method described above with respect to the embodiment of the model training method illustrated in FIG. 2 or FIG. 5; in particular implementations, one or more instructions in the computer storage medium are loaded by processor 701 and perform the steps of:
determining a positive training sample and a negative training sample in a training sample set, where the positive training sample includes first feature information of a first object and second feature information of an item of interest of the first object, and the negative training sample includes third feature information of a second object and fourth feature information of an item of interest of the second object;
calling a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and calling the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information;
and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model, where the temperature parameter is used to make the contribution degree of a difficult negative training sample greater than that of a simple negative training sample, and the second model is used for recommending items of interest to an object.
In one embodiment, when performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter to obtain the second model, the one or more instructions may be loaded and executed by the processor to: determine a loss value of the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter; and update the model parameters of the first model based on the loss value to obtain the second model.
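Determining the loss value and then updating the model parameters maps onto a standard gradient step. Under the same PyTorch assumptions as the contrastive_loss and TwoTowerModel sketches above (the batch keys are likewise invented), one step might look like:

```python
def training_step(model, optimizer, batch, temperature=0.07):
    """One optimization pass: encode the samples, determine the loss value,
    and update the first model's parameters (standard gradient step)."""
    # Positive training sample: first/second characterization vectors.
    user_vec, pos_vec = model(batch["user_features"], batch["pos_item_features"])
    # Negative training samples: fourth characterization vectors, shape (K, D).
    _, neg_vecs = model(batch["neg_user_features"], batch["neg_item_features"])

    loss = contrastive_loss(user_vec, pos_vec, neg_vecs, temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()   # parameters updated based on the loss value
    return loss.item()
```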
In another embodiment, the value of the temperature parameter is greater than 0 and less than 1.
In another embodiment, the positive training sample is any one of the training samples in the training sample set, and the negative training sample is one or more of the training samples in the training sample set other than the positive training sample.
In another embodiment, the negative training sample is a training sample, included in the training sample set, of feature information of a target item, where the target item is an item to which the attention degree of the first object is less than or equal to a first threshold.
In another embodiment, the one or more instructions may be loaded by the processor and further perform: acquiring fifth feature information of a third object; calling the second model to process the fifth feature information to obtain a fifth characterization vector of the fifth feature information, where the fifth characterization vector and the characterization vectors of the items in the search library lie in the same dimensional space; acquiring, from the search library, a sixth characterization vector that matches the fifth characterization vector, and adding the sixth characterization vector to a recommendation candidate set; and recommending the item corresponding to the target characterization vector determined from the recommendation candidate set to the client corresponding to the third object.
In another embodiment, the sixth characterization vector is a characterization vector in the search library whose matching degree with the fifth characterization vector falls within a first preset range.
In another embodiment, the one or more instructions may be loaded by the processor and further perform: acquiring feature information of a sample item; calling the second model to process the feature information of the sample item to obtain a sample characterization vector of the feature information of the sample item; and storing the sample characterization vector to the search library.
In another embodiment, when recommending the item corresponding to the target characterization vector determined from the recommendation candidate set to the client corresponding to the third object, the one or more instructions may be loaded by the processor and further perform: sorting the sixth characterization vectors in the recommendation candidate set to obtain a candidate sequence; and determining the first N sixth characterization vectors in the candidate sequence as target characterization vectors, and recommending the items corresponding to the target characterization vectors to the client corresponding to the third object, where N is a positive integer.
According to the embodiments of the application, a positive training sample and a negative training sample are first determined in a training sample set; the first model is then called to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information; finally, model optimization is performed on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter to obtain the second model. This model training method makes full use of the positive and negative training samples in the training sample set and, following the idea of contrastive learning, adjusts the contribution degree of difficult negative training samples through the temperature parameter, strengthening the model's learning of difficult negative training samples, thereby improving the recommendation performance of the model and the user experience and ensuring the intelligence and diversity of item recommendation.
It should be noted that, according to an aspect of the present application, a computer program product or computer program is also provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various alternatives of the model training method embodiments illustrated in FIG. 2 or FIG. 5 above. It should be understood that the above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the present application, which is defined by the claims.

Claims (11)

1. A method of model training, the method comprising:
determining a positive training sample and a negative training sample in a training sample set, wherein the positive training sample comprises first characteristic information of a first object and second characteristic information of an item of interest of the first object, and the negative training sample comprises third characteristic information of a second object and fourth characteristic information of the item of interest of the second object;
calling a first model to process the first feature information and the second feature information to obtain a first characterization vector of the first feature information and a second characterization vector of the second feature information, and calling the first model to process the third feature information and the fourth feature information to obtain a third characterization vector of the third feature information and a fourth characterization vector of the fourth feature information;
and performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector and a temperature parameter to obtain a second model, wherein the temperature parameter is used for making the contribution degree of a difficult negative training sample greater than that of a simple negative training sample, and the second model is used for recommending items of interest to an object.
2. The method of claim 1, wherein performing model optimization on the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and a temperature parameter to obtain a second model comprises:
determining a loss value of the first model based on the first characterization vector, the second characterization vector, the third characterization vector, the fourth characterization vector, and the temperature parameter;
and updating the model parameters of the first model based on the loss value to obtain a second model.
3. The method according to claim 1, characterized in that the value of the temperature parameter is greater than 0 and less than 1.
4. The method according to any one of claims 1 to 3, wherein the positive training sample is any one of the training samples in the set of training samples, and the negative training sample is one or more of the other training samples in the set of training samples except the positive training sample.
5. The method according to claim 4, wherein the negative training sample is a training sample, included in the training sample set, of feature information of a target item, the target item being an item to which the attention degree of the first object is less than or equal to a first threshold.
6. The method of claim 1, further comprising:
acquiring fifth characteristic information of a third object;
calling the second model to process the fifth feature information to obtain a fifth characterization vector of the fifth feature information, wherein the fifth characterization vector and the characterization vectors of the items in the search library are in the same dimensional space;
acquiring a sixth characterization vector matched with the fifth characterization vector from the search library, and adding the sixth characterization vector to a recommendation candidate set;
recommending the item corresponding to a target characterization vector determined from the recommendation candidate set to a client corresponding to the third object.
7. The method of claim 6, wherein the sixth characterization vector is a characterization vector in the search library whose matching degree with the fifth characterization vector is within a first preset range.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
acquiring feature information of a sample item;
calling the second model to process the feature information of the sample item to obtain a sample characterization vector of the feature information of the sample item;
storing the sample characterization vector to the search library.
9. The method according to claim 6 or 7, wherein recommending the item corresponding to the target characterization vector determined from the recommendation candidate set to the client corresponding to the third object comprises:
sorting sixth characterization vectors in the recommendation candidate set to obtain a candidate sequence;
and determining the first N sixth characterization vectors in the candidate sequence as target characterization vectors, and recommending the items corresponding to the target characterization vectors to the client corresponding to the third object, wherein N is a positive integer.
10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the model training method according to any one of claims 1-9.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more computer programs adapted to be loaded by a processor and to perform the model training method according to any one of claims 1 to 9.
CN202210819009.XA 2022-07-13 2022-07-13 Model training method, device, equipment and storage medium Pending CN115186750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210819009.XA CN115186750A (en) 2022-07-13 2022-07-13 Model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210819009.XA CN115186750A (en) 2022-07-13 2022-07-13 Model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115186750A true CN115186750A (en) 2022-10-14

Family

ID=83519082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210819009.XA Pending CN115186750A (en) 2022-07-13 2022-07-13 Model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115186750A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116050196A (en) * 2023-04-03 2023-05-02 国家超级计算天津中心 Multi-dimensional simulation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Gabriel De Souza et al. Contextual hybrid session-based news recommendation with recurrent neural networks
CN113626719B (en) Information recommendation method, device, equipment, storage medium and computer program product
US11514333B2 (en) Combining machine-learning and social data to generate personalized recommendations
CN110796190B (en) Exponential modeling with deep learning features
WO2022041979A1 (en) Information recommendation model training method and related device
CN111460130B (en) Information recommendation method, device, equipment and readable storage medium
CN110717098B (en) Meta-path-based context-aware user modeling method and sequence recommendation method
CN111061946A (en) Scenario content recommendation method and device, electronic equipment and storage medium
US9619481B2 (en) Method and apparatus for generating ordered user expert lists for a shared digital document
De Maio et al. Time-aware adaptive tweets ranking through deep learning
CN110413888B (en) Book recommendation method and device
CN112131472A (en) Information recommendation method and device, electronic equipment and storage medium
CN112163149B (en) Method and device for recommending message
Zheng et al. Attribute and global boosting: a rating prediction method in context-aware recommendation
CN115618024A (en) Multimedia recommendation method and device and electronic equipment
CN109684561B (en) Interest point recommendation method based on deep semantic analysis of user sign-in behavior change
CN115186750A (en) Model training method, device, equipment and storage medium
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN115700550A (en) Label classification model training and object screening method, device and storage medium
Liu et al. Api-prefer: An api package recommender system based on composition feature learning
Samarinas et al. Personalized high quality news recommendations using word embeddings and text classification models
US20200226159A1 (en) System and method of generating reading lists
CN112035740A (en) Project use duration prediction method, device, equipment and storage medium
CN115222486B (en) Article recommendation model training method, article recommendation method, device and storage medium
CN116628236B (en) Method and device for delivering multimedia information, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination