CN113449198A - Training method, device and equipment of feature extraction model and storage medium - Google Patents

Training method, device and equipment of feature extraction model and storage medium

Info

Publication number
CN113449198A
CN113449198A (application CN202111012812.4A)
Authority
CN
China
Prior art keywords
user
feature
information
vector
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111012812.4A
Other languages
Chinese (zh)
Other versions
CN113449198B (en)
Inventor
Guo Liang (郭亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111012812.4A
Publication of CN113449198A
Application granted
Publication of CN113449198B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/953 - Querying, e.g. by the use of web search engines
    • G06F 16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 - Commerce
    • G06Q 30/06 - Buying, selling or leasing transactions
    • G06Q 30/0601 - Electronic shopping [e-shopping]
    • G06Q 30/0631 - Item recommendations

Abstract

The application discloses a training method, apparatus, device, and storage medium for a feature extraction model, and belongs to the field of computer technology. The method includes: acquiring first user information, candidate recommendation information, and bias features of sample user accounts in a sample user account set; performing feature extraction on the first user information to obtain a second user portrait feature vector; performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector; enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature; and training the feature extraction model according to the enhanced matching result. Training the feature extraction model while enhancing the influence of the bias feature on the matching result strengthens the model's ability to extract the bias feature, which improves the discrimination of the user portrait feature vectors extracted by the feature extraction model in the feature dimension corresponding to the bias feature, and thereby improves the accuracy of recommending information, such as advertisements, to users.

Description

Training method, device and equipment of feature extraction model and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training a feature extraction model.
Background
In the information recommendation process, the server needs to determine a user portrait feature vector of the user, and then determine recommendation information based on the degree of matching between the user portrait feature vector and the candidate information feature vectors of the candidate recommendation information.
The server typically determines the user portrait feature vector of the user using a two-tower model. The two-tower model comprises a first feature extraction network and a second feature extraction network that are independent of each other. The first feature extraction network extracts a user portrait feature vector from the user information, and the second feature extraction network extracts a candidate information feature vector from the candidate recommendation information. The server matches the extracted user portrait feature vector against the candidate information feature vector, determines an error loss based on the matching result and the acquired interaction information between the user and the candidate recommendation information, and thereby trains the two-tower model. A user portrait feature vector of the user can then be extracted using the first feature extraction network of the trained two-tower model.
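For illustration only, the following is a minimal sketch of such a two-tower setup (PyTorch assumed); the layer sizes, input widths, and names are assumptions and do not come from the patent.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, in_dim, out_dim=32):
        super().__init__()
        # Each tower maps its input to a low-dimensional dense vector.
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))

    def forward(self, x):
        return self.net(x)

user_tower = Tower(in_dim=64)    # first feature extraction network (user information)
item_tower = Tower(in_dim=48)    # second feature extraction network (candidate recommendation information)

user_info = torch.randn(8, 64)               # a batch of user feature inputs
candidate_info = torch.randn(8, 48)          # a batch of candidate recommendation inputs
labels = torch.randint(0, 2, (8,)).float()   # interaction labels (e.g. clicked or not)

# Match the two tower outputs by inner product and compute the training loss.
score = (user_tower(user_info) * item_tower(candidate_info)).sum(dim=-1)
loss = nn.BCEWithLogitsLoss()(score, labels)
```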
However, the user portrait feature vectors extracted with the two-tower model are influenced by bias features: the value of the first feature dimension corresponding to a bias feature in the user portrait feature vector is higher than the value of the second feature dimension in that vector. As a result, the user portrait feature vectors of different users output by the two-tower model have low discrimination in the feature dimensions corresponding to the bias features, and the accuracy of recommending information to users is low.
Disclosure of Invention
The application provides a training method, apparatus, device, and storage medium for a feature extraction model, which can improve the accuracy of recommending information to a user. The technical solution is as follows.
According to an aspect of the present application, there is provided a training method of a feature extraction model, the method including:
acquiring first user information, candidate recommendation information and bias features of sample user accounts in a sample user account set;
performing feature extraction on the first user information to obtain a second user portrait feature vector, and performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector;
enhancing a matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature;
training the feature extraction model according to the enhanced matching result;
wherein a first value corresponding to the bias feature is higher than a second value, the first value being the value of a first feature dimension corresponding to the bias feature in a first user portrait feature vector of the sample user account, and the second value being the value of a second feature dimension in the first user portrait feature vector.
According to another aspect of the present application, there is provided a training apparatus for a feature extraction model, the apparatus including:
the acquisition module is used for acquiring first user information, candidate recommendation information and bias characteristics of sample user accounts in the sample user account set;
the extraction module is used for performing feature extraction on the first user information to obtain a second user portrait feature vector and performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector;
a processing module for enhancing a matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature;
the training module is used for training the feature extraction model according to the enhanced matching result;
wherein the first value corresponding to the bias feature is higher than a second value, the first value is a value of a first feature dimension corresponding to the bias feature in a first user portrait feature vector of the sample user account, and the second value is a value of a second feature dimension in the first user portrait feature vector.
In an alternative design, the feature extraction model includes an embedding layer that includes a first embedding network; the processing module is configured to:
mapping the bias features to a first feature space through the first embedded network to obtain bias feature vectors;
and enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature vector.
In an alternative design, the processing module is configured to:
determining an inner product of the second user portrait feature vector and the candidate information feature vector as the matching result;
adding the matching result and the bias characteristic vector to obtain a fusion vector;
and determining the enhanced matching result according to the fusion vector.
In an alternative design, the bias feature includes at least one of:
an identification of the candidate recommendation information;
and an acquisition position of the candidate recommendation information.
In an alternative design, the feature extraction model includes an embedding layer that includes a second embedding network; the apparatus further comprises an adjustment module configured to:
mapping the first user information to a second feature space through the second embedded network to obtain a user vector, wherein the user vector is used for determining a second user portrait feature vector;
and based on a self-attention mechanism, adjusting the weight of each feature in the user vector according to the time information corresponding to the first user information, wherein the weight is used for reflecting the relative importance degree between the features.
In an alternative design, the adjustment module is configured to:
coding the time information to obtain a time code;
adding the time code and the user vector to obtain a time user vector;
adjusting weights of features in the temporal user vector based on the self-attention mechanism.
In an optional design, the sample user account corresponds to a matching tag, and the matching tag is used for reflecting whether the sample user account is matched with the candidate recommendation information; the training module is configured to:
determining an error loss according to a difference between the enhanced matching result and the matching tag;
and training the feature extraction model according to the error loss.
In an alternative design, the training module is configured to:
determining sparsity loss according to the second user portrait feature vector and a sparsity loss function, wherein the sparsity loss function is used for constraining sparsity of the user portrait feature vector output by the feature extraction model;
and training the feature extraction model according to the error loss and the sparsity loss.
In an alternative design, the training module is configured to:
determining the error loss through a cross entropy loss function according to the enhanced matching result and the matching label;
and training the feature extraction model according to the result of weighted summation of the error loss and the sparsity loss.
In an alternative design, the feature extraction model includes a hidden layer for extracting the second user portrait feature vector and the candidate information feature vector;
the hidden layer is composed of neural networks, each layer of the neural networks comprises a batch normalization layer, and the activation function of each layer of the neural networks is a leaky rectified linear unit (Leaky ReLU) function.
In an alternative design, the feature extraction model includes a first feature extraction network and a second feature extraction network; the extraction module is configured to:
performing feature extraction on the first user information through the first feature extraction network to obtain a second user portrait feature vector;
and performing feature extraction on the candidate recommendation information through the second feature extraction network to obtain the candidate information feature vector.
In an alternative design, the obtaining module is configured to:
acquiring second user information of a user account to be extracted;
the extraction module is configured to:
and performing feature extraction on the second user information through the feature extraction model to obtain a user portrait feature vector of the user account to be extracted.
According to another aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the training method of the feature extraction model as described above.
According to another aspect of the present application, there is provided a computer-readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the training method of the feature extraction model as described above.
According to another aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the training method of the feature extraction model provided in the various alternative implementations of the above aspects.
The beneficial effects brought by the technical solution provided by the application include at least the following:
in the process of training the feature extraction model, the bias feature is used to enhance its influence on the matching result of the user portrait feature vector and the candidate information feature vector, which strengthens the ability of the feature extraction model to extract the bias feature. This improves the discrimination of the user portrait feature vectors extracted by the feature extraction model in the feature dimension corresponding to the bias feature, and thereby improves the accuracy of recommending information to users.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic diagram of a feature extraction model provided by an exemplary embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a method for training a feature extraction model provided in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a user representation feature vector for different user accounts provided by an exemplary embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a feature extraction method provided by an exemplary embodiment of the present application;
FIG. 5 is a schematic illustration of information categories provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a self-attention network provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a user representation feature vector for different user accounts provided by an exemplary embodiment of the present application;
FIG. 8 is a schematic diagram of a process for training a feature extraction model provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of a training apparatus for feature extraction models provided in an exemplary embodiment of the present application;
FIG. 10 is a schematic structural diagram of a training apparatus for feature extraction models provided in an exemplary embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a feature extraction model provided in an exemplary embodiment of the present application. As shown in fig. 1, the feature extraction model 100 includes an input layer 101, an embedding (Embedding) layer 102, a hidden layer 103, and an output layer 104, and the feature extraction model 100 includes a first feature extraction network 105 and a second feature extraction network 106.
In the training phase, the computer device may obtain first user information, candidate recommendation information, and bias features for a sample user account in the sample user account set. The value of the first feature dimension corresponding to the bias feature in the first user portrait feature vector of the sample user account is higher than the value of the second feature dimension in the first user portrait feature vector. For example, if feature extraction is performed on the first user information using a Deep Structured Semantic Model (DSSM, also called a two-tower model), the value of the feature dimension corresponding to the bias feature in the obtained user portrait feature vector is higher than the values of the other feature dimensions. The first user information includes first user attribute information and first user behavior information.
The computer device maps the bias feature to a first feature space through the embedding layer 102 to obtain a bias feature vector, maps the first user information to a second feature space to obtain a user vector, and maps the candidate recommendation information to a third feature space to obtain a candidate information vector.
The computer device extracts a second user portrait feature vector from the user vector through the hidden layer 103, and extracts a candidate information feature vector from the candidate information vector. Illustratively, the hidden layer 103 corresponding to the first feature extraction network 105 is formed by three layers of neural networks, where the vector output by the first layer has a dimension of 512, the vector output by the second layer has a dimension of 128, and the vector output by the third layer has a dimension of 32. The hidden layer 103 corresponding to the second feature extraction network 106 is formed by two layers of neural networks, where the vector output by the first layer has a dimension of 128 and the vector output by the second layer has a dimension of 32. Moreover, each layer of the neural networks in the hidden layer 103 includes a Batch Normalization (BN) layer, and the activation function of each layer is a Leaky Rectified Linear Unit (Leaky ReLU).
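The hidden-layer structure described above can be sketched as follows (PyTorch assumed); the layer dimensions follow the example in the preceding paragraph, while the input widths and the ordering of normalization and activation are assumptions made for illustration.

```python
import torch.nn as nn

def hidden_layers(in_dim, dims):
    """Stack of fully connected layers, each followed by batch normalization and Leaky ReLU."""
    layers = []
    for out_dim in dims:
        layers += [nn.Linear(in_dim, out_dim), nn.BatchNorm1d(out_dim), nn.LeakyReLU()]
        in_dim = out_dim
    return nn.Sequential(*layers)

# Output dimensions taken from the example above; input widths are placeholders.
user_hidden = hidden_layers(in_dim=256, dims=[512, 128, 32])   # hidden layer of the first network
item_hidden = hidden_layers(in_dim=128, dims=[128, 32])        # hidden layer of the second network
```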
The computer device may determine, via the output layer 104, the inner product of the second user portrait feature vector and the candidate information feature vector as the matching result of the two vectors. The inner product is then added to the bias feature vector, and the enhanced matching result of the second user portrait feature vector and the candidate information feature vector is determined from the addition result through a sigmoid function. An error loss is then determined according to the difference between the enhanced matching result and the matching label corresponding to the sample user account, and a sparsity loss is determined according to the second user portrait feature vector, in order to train the feature extraction model 100. The matching label reflects whether the sample user account matches the candidate recommendation information. The sparsity loss is used to constrain the sparsity of the user portrait feature vectors output by the feature extraction model 100. In addition, the embedding layer 102 corresponding to the first feature extraction network 105 further includes a self-attention network, which is configured to adjust the weight of each feature in the user vector according to the time information corresponding to the first user information, based on a self-attention mechanism.
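As a hedged illustration of the output-layer computation described above, the following sketch (PyTorch assumed) takes the inner product, adds the bias feature vector, applies a sigmoid function, and derives a cross-entropy error loss; all dimensions, and the use of a scalar bias embedding, are assumptions.

```python
import torch
import torch.nn.functional as F

def enhanced_match(user_vec, item_vec, bias_vec):
    """Inner product of the two tower outputs, enhanced by adding the bias feature vector."""
    inner = (user_vec * item_vec).sum(dim=-1, keepdim=True)  # matching result
    fused = inner + bias_vec                                  # addition with the bias feature vector
    return torch.sigmoid(fused.squeeze(-1))                   # enhanced matching result in (0, 1)

# Hypothetical batch: 32-dim tower outputs, a scalar bias embedding, binary matching labels.
user_vec, item_vec = torch.randn(8, 32), torch.randn(8, 32)
bias_vec = torch.randn(8, 1)
label = torch.randint(0, 2, (8,)).float()

p = enhanced_match(user_vec, item_vec, bias_vec)
error_loss = F.binary_cross_entropy(p, label)  # cross-entropy error loss against the matching label
```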
In the application stage, the computer device obtains second user information of the user account to be extracted, and performs feature extraction on the second user information through the first feature extraction network 105 in the trained feature extraction model 100 to obtain a user portrait feature vector of the user account to be extracted.
In the process of training the feature extraction model, using the bias feature to enhance its influence on the matching result of the user portrait feature vector and the candidate information feature vector strengthens the ability of the feature extraction model to extract the bias feature, which improves the discrimination of the user portrait feature vectors extracted by the feature extraction model in the feature dimension corresponding to the bias feature, and thereby improves the accuracy of recommending information to users. In addition, using a self-attention mechanism to adjust the feature weights according to the time information avoids the information loss caused by directly pooling the features. Training the feature extraction model with the sparsity loss makes the user portrait feature vectors output by the feature extraction model sparse, which further improves the accuracy of recommending information to users.
Fig. 2 is a flowchart illustrating a training method of a feature extraction model according to an exemplary embodiment of the present application. The method may be used in a computer device. As shown in fig. 2, the method includes the following steps.
Step 202: first user information, candidate recommendation information and bias characteristics of sample user accounts in the sample user account set are obtained.
The sample user account set is composed of sample user accounts in the computer device that are used to train the feature extraction model. A sample user account can be any user account in the computer device, as determined by the computer device.
The first user information includes first user attribute information and first user behavior information. The first user attribute information includes the age, gender, location, and the like corresponding to the sample user account. The first user behavior information reflects the interests of the sample user account; for example, it is generated based on the interaction behavior between the sample user account and candidate recommendation information. The interaction behaviors include browsing, clicking, liking, commenting, favoriting, and the like. The first user behavior information includes the merchants, commodity categories, prices, identifiers, and the like corresponding to the information with which the sample user account has interacted. The candidate recommendation information likewise includes the merchants, commodity categories, prices, identifiers, and the like corresponding to the candidate recommendation information.
The bias feature corresponds to a first value that is higher than a second value. The first value is the value of the first feature dimension corresponding to the bias feature in the first user portrait feature vector of the sample user account, and the second value is the value of a second feature dimension in the first user portrait feature vector. The bias features are those features of the sample user account that have a significant impact on the first user portrait feature vector corresponding to the sample user account. Optionally, the first user portrait feature vector is obtained by performing feature extraction on the sample user account through a two-tower model. In the first user portrait feature vectors of different sample user accounts, the values of the feature dimensions corresponding to the bias features are high but close to one another, which leads to insufficient discrimination.
Optionally, the bias feature includes at least one of an identification of the candidate recommendation information and an acquisition position of the candidate recommendation information. The identification of the candidate recommendation information can be an identity number (ID) of the candidate recommendation information, and the acquisition position of the candidate recommendation information includes the web page storing the candidate recommendation information and the position of the candidate recommendation information in that web page.
The first user behavior information, the candidate recommendation information, and the bias feature may or may not overlap.
Illustratively, FIG. 3 is a schematic diagram of user portrait feature vectors for different user accounts provided by an exemplary embodiment of the present application. As shown in fig. 3, for the user portrait feature vectors of different user accounts extracted using the two-tower model, the degree of distinction in the first dimension 301, the third dimension 303, the fourth dimension 304, the fifth dimension 305, and the sixth dimension 306 is higher than in the second dimension 302, yet the value of the second dimension 302 is higher than the values of the other feature dimensions. The information from which the second-dimension feature is extracted is the bias feature.
Step 204: and performing feature extraction on the first user information to obtain a second user portrait feature vector, and performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector.
Optionally, the feature extraction model comprises a first feature extraction network and a second feature extraction network. The first feature extraction network and the second feature extraction network are mutually independent feature extraction networks. And the computer equipment performs feature extraction on the first user information through a first feature extraction network to obtain the second user portrait feature vector. The second user profile feature vector can reflect both attribute features and behavioral features of the sample user account. And the computer equipment performs feature extraction on the candidate recommendation information through a second feature extraction network to obtain the candidate information feature vector. The candidate information feature vector can reflect the features of the candidate recommendation information.
Optionally, the first user portrait feature vector is a user portrait feature vector obtained by feature extraction of the first user information through a two-tower model. The second user portrait feature vector is obtained by performing feature extraction on the first user information through the feature extraction model provided by the embodiment of the application.
It should be noted that the second user portrait feature vector is a low-dimensional dense vector corresponding to the first user information. The candidate information feature vector is a low-dimensional dense vector corresponding to the candidate recommendation information.
Step 206: and enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature.
The matching result of the second user portrait feature vector and the candidate information feature vector reflects the degree of interest of the sample user account corresponding to the second user portrait feature vector in the candidate recommendation information corresponding to the candidate information feature vector. For example, the matching result takes a value between 0 and 1: the higher the value, the higher the matching degree between the sample user account and the candidate recommendation information, and the more likely the sample user account is to be interested in that candidate recommendation information.
Enhancing the matching result of the second user portrait feature vector and the candidate information feature vector with the bias feature strengthens the influence of the bias feature on that matching result. Training the feature extraction model with the enhanced matching result strengthens the model's learning of the bias feature, so that when the feature extraction model performs feature extraction on user information, its ability to extract the bias feature is improved. The discrimination of the extracted user portrait feature vectors in the feature dimension corresponding to the bias feature is thus improved, and the accuracy of recommending information to users can be improved.
Optionally, the computer device determines the inner product of the second user portrait feature vector and the candidate information feature vector, adds the inner product to the bias feature vector corresponding to the bias feature, and determines the enhanced matching result based on the addition result. The inner product of the second user portrait feature vector and the candidate information feature vector is the matching result.
Optionally, the matching result can also refer to the result obtained by processing the inner product with a sigmoid function, and the enhanced matching result can refer to the result obtained by processing the addition result with a sigmoid function.
Step 208: and training a feature extraction model according to the enhanced matching result.
The sample user account corresponds to a matching label, which reflects whether the sample user account matches the candidate recommendation information. The matching label can be determined according to the behavior information of the sample user account; for example, if an interaction behavior occurs between the sample user account and a piece of candidate recommendation information, that candidate recommendation information is considered to match the sample user account.
The computer device can determine an error loss according to a difference between the enhanced matching result and the matching label, and can train the feature extraction model based on back propagation according to the error loss.
In summary, in the method provided by this embodiment, in the process of training the feature extraction model, the bias feature is used to enhance the influence of the bias feature on the matching result of the user portrait feature vector and the candidate information feature vector, and the ability of the feature extraction model to extract the bias feature can be enhanced, so that the degree of distinction of the user portrait feature vector extracted by the feature extraction model in the feature dimension corresponding to the bias feature is improved, and the accuracy of recommending information to the user can be further improved.
Fig. 4 is a flowchart illustrating a feature extraction method according to an exemplary embodiment of the present application. The method may be used in a computer device. As shown in fig. 4, the method includes the following steps.
Step 402: first user information, candidate recommendation information and bias characteristics of sample user accounts in the sample user account set are obtained.
The first user information includes first user attribute information and first user behavior information. The first user behavior information reflects the interests of the sample user account. The value of the first feature dimension corresponding to the bias feature in the first user portrait feature vector of the sample user account is higher than the value of the second feature dimension in the first user portrait feature vector. The bias features are those features of the sample user account that have a significant impact on the first user portrait feature vector corresponding to the sample user account. Optionally, the first user portrait feature vector is obtained by performing feature extraction on the sample user account through a two-tower model.
Optionally, the bias feature includes at least one of:
an identification of the candidate recommendation information;
and an acquisition position of the candidate recommendation information.
The identification of the candidate recommendation information can be an ID (e.g., an advertisement ID) of the candidate recommendation information, and the acquisition location of the candidate recommendation information includes a web page storing the candidate recommendation information and a location (e.g., an advertisement slot) of the candidate recommendation information in the web page.
Step 404: and performing feature extraction on the first user information to obtain a second user portrait feature vector, and performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector.
Optionally, the feature extraction model includes a first feature extraction network and a second feature extraction network, and the first feature extraction network and the second feature extraction network are independent of each other. And the computer equipment performs feature extraction on the first user information through a first feature extraction network to obtain a second user portrait feature vector. And the computer equipment performs feature extraction on the candidate recommendation information through a second feature extraction network to obtain a candidate information feature vector.
Based on a statistical analysis of the data stored in the computer device, the average number of primary categories, the average number of secondary categories, and the average number of tertiary categories with which each sample user account has interacted are all large. The categories with which a sample user account has interacted are determined according to the categories to which the candidate recommendation information the account interacted with belongs. A tertiary category belongs to a secondary category, and a secondary category belongs to a primary category. For example, a primary category includes clothing, a secondary category belonging to clothing includes jackets, and a tertiary category belonging to jackets includes T-shirts.
Illustratively, fig. 5 is a schematic diagram of information categories provided by an exemplary embodiment of the present application. As shown in fig. 5, the first class of users 501 are users who have interacted with fewer than 50 categories, the second class of users 502 with 50 or more but fewer than 100 categories, the third class of users 503 with 100 or more but fewer than 200 categories, the fourth class of users 504 with 200 or more but fewer than 500 categories, the fifth class of users 505 with 500 or more but fewer than 1000 categories, and the sixth class of users 506 with more than 1000 categories. For primary categories, the first class of users 501 is the largest, that is, most users have interacted with fewer than 50 primary categories. For secondary categories, the fourth class of users 504 is the largest, that is, most users have interacted with 200 or more but fewer than 500 secondary categories. For tertiary categories, the fourth class of users 504 is also the largest, that is, most users have interacted with 200 or more but fewer than 500 tertiary categories. Through this statistical analysis, each sample user account in the computer device has interacted with, on average, 106 primary categories, 320 secondary categories, and 555 tertiary categories.
Based on the above statistical analysis, directly performing a pooling operation on the input user information would cause a relatively serious loss of information, thereby reducing the accuracy of the user portrait feature vector output by the feature extraction model. The embodiments of the present application therefore propose adjusting the weights between the individual features based on time and a self-attention mechanism to avoid this problem.
Optionally, the feature extraction model includes an embedding layer, and the embedding layer includes a second embedding network. The computer device maps the first user information to a second feature space via the second embedding network to obtain a user vector, and the user vector is used to determine the second user portrait feature vector. Optionally, the first user information can include information in multiple dimensions; the computer device maps the first user information to the second feature space via the second embedding network to obtain multiple vectors, and fuses these vectors to obtain the user vector. The computer device then adjusts the weight of each feature in the user vector according to the time information corresponding to the first user information, based on the self-attention mechanism. The weight reflects the relative importance between features. The time information reflects the generation time of the first user information.
Illustratively, the first user behavior information is generated based on the behavior of the sample user account for purchasing the book, and the corresponding time information is the time of purchasing the book by the sample user account. Optionally, the weight of the feature closer to the current time is higher, and the weight of the feature farther from the current time is lower.
In the process of adjusting the weight of the feature, the computer device encodes the time information corresponding to the first user information, thereby obtaining the time code. And then adding the time code and the user vector to obtain a time user vector. The computer device then adjusts the weights of the features in the temporal user vector based on a self-attention mechanism. Optionally, based on a self-attention mechanism, the process of adjusting the feature weights is implemented by a self-attention network disposed in a second embedded network of the feature extraction model.
Illustratively, fig. 6 is a schematic diagram of a self-attention network provided by an exemplary embodiment of the present application. As shown in fig. 6, the computer device adds to each user vector the time code of the corresponding time period; for example, the time code for a one-hour window is added to the user vectors corresponding to the user information generated by interaction behaviors within that hour (e.g., purchasing books, clothing, fresh food, gifts, fruit, home decoration, and the like). The result is then input into the self-attention network. The self-attention network sums the user vectors, normalizes them, and adjusts the weights of the features. The dashed arrows in the figure represent skip connections.
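A hedged sketch of such a time-aware self-attention step is shown below (PyTorch assumed); the use of multi-head attention, the layer normalization, and all dimensions are assumptions made for illustration rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class TimeAwareSelfAttention(nn.Module):
    """Add a time encoding to each behavior embedding, then reweight features with self-attention."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, user_vecs, time_codes):
        x = user_vecs + time_codes     # time code added to the user vectors
        out, _ = self.attn(x, x, x)    # self-attention adjusts the feature weights
        return self.norm(out + x)      # skip connection followed by normalization

# Hypothetical batch: 8 users, 20 behaviors each, 64-dim embeddings.
behaviors = torch.randn(8, 20, 64)
time_codes = torch.randn(8, 20, 64)    # e.g. an encoding of the behavior timestamp bucket
weighted = TimeAwareSelfAttention(64)(behaviors, time_codes)
```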
Optionally, the feature extraction model includes a hidden layer for extracting the second user portrait feature vector and the candidate information feature vector. The hidden layer is composed of neural networks; each layer of the neural networks includes a batch normalization layer, and the activation function of each layer is a leaky rectified linear unit (Leaky ReLU) function. A hidden layer with this structure can accelerate the convergence of the feature extraction model.
The structure of the feature extraction model is described exemplarily with reference to fig. 1. The first feature extraction network 105 is divided into an input layer 101, an embedding layer 102, and a hidden layer 103, and the second feature extraction network 106 is likewise divided into an input layer 101, an embedding layer 102, and a hidden layer 103. The input layer 101, embedding layer 102, and hidden layer 103 of the first feature extraction network 105 are independent of those of the second feature extraction network 106. The embedding layer 102 (second embedding network) of the first feature extraction network 105 is used to determine the user vector, and the hidden layer 103 of the first feature extraction network 105 is used to extract the second user portrait feature vector from the user vector. The embedding layer 102 (third embedding network) of the second feature extraction network 106 is used to determine the candidate information vector, and the hidden layer 103 of the second feature extraction network 106 is used to extract the candidate information feature vector from the candidate information vector. The embedding layer 102 of the feature extraction model also includes a first embedding network for determining the bias feature vector from the bias feature.
Step 406: and enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature.
Optionally, the feature extraction model comprises an embedding layer, the embedding layer comprising a first embedding network. The first embedded network is independent from the second embedded network and the third embedded network. The computer device maps the biased features to a first feature space through a first embedded network, and can obtain biased feature vectors. And then enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature vector.
Optionally, the computer device may determine the inner product of the second user portrait feature vector and the candidate information feature vector as the matching result, add the matching result and the bias feature vector to obtain a fusion vector, and determine the enhanced matching result according to the fusion vector. The matching result can also refer to the result obtained by processing the inner product with a sigmoid function, and determining the enhanced matching result based on the fusion vector can refer to processing the addition result with a sigmoid function.
The above network for determining the bias feature vector and the fusion vector may also be referred to as a shallow fusion network (also called a wide network). The wide network can absorb the influence of the bias features on the user portrait feature vectors output by the feature extraction model, which alleviates the problem of low discrimination of the user portrait feature vectors output by the feature extraction model.
Step 408: and training a feature extraction model according to the enhanced matching result.
Optionally, the sample user account has a matching tag corresponding to the sample user account, where the matching tag is used to reflect whether the sample user account matches the candidate recommendation information. The computer device can determine an error loss based on a difference between the enhanced match result and the matching label. And then, extracting a model according to the error loss training characteristics. Optionally, the computer device can determine the error loss through a cross entropy loss function according to the enhanced matching result and the matching label.
In practical application, the user portrait feature vectors extracted by the feature extraction model may not be sparse; that is, the output user portrait feature vectors of different user accounts differ little in each feature dimension, so the degree of distinction between the user portrait feature vectors of different user accounts is low.
Illustratively, FIG. 7 is a schematic diagram of user portrait feature vectors for different user accounts provided by an exemplary embodiment of the present application. As shown in fig. 7, most of the user portrait feature vectors of different user accounts extracted by the feature extraction model differ only slightly in the values of the first dimension 701, the second dimension 702, the third dimension 703, the fourth dimension 704, the fifth dimension 705, and the sixth dimension 706, so their degree of distinction is low. Recommending information to user accounts with such poorly discriminated user portrait feature vectors leads to low accuracy.
The embodiments of the present application therefore propose using a sparsity loss, when training the feature extraction model, to constrain the sparsity of the user portrait feature vectors output by the feature extraction model and thus avoid this problem.
The computer device determines the sparsity loss according to the second user portrait feature vector and a sparsity loss function, and then trains the feature extraction model according to the error loss and the sparsity loss. The sparsity loss function is used to constrain the sparsity of the user portrait feature vectors output by the feature extraction model.
Optionally, the sparsity loss function (denoted L_sparse; its full expression is given as a formula image in the original document) is defined over the user portrait feature vectors output by the feature extraction model, where n is the sample size (batch_size) used in each training iteration of the feature extraction model, d is the length of the user portrait feature vector output by the feature extraction model, the vector appearing in the formula is obtained by normalizing the user portrait feature vector output by the feature extraction model, and e is a natural constant.
The computer device can train the feature extraction model according to the result of weighted summation of the error loss and the sparsity loss, for example a total loss of the form L = L_error + λ·L_sparse, where λ is the weight of the sparsity loss (the weighted-sum expression is likewise given as a formula image in the original document). The goal of optimizing the feature extraction model with this weighted sum is to make the sparsity loss approach 0 as much as possible (the closer the sparsity loss is to 0, the sparser the output user portrait feature vector) without hindering the reduction of the cross-entropy loss (the error loss), and the weight of the sparsity loss is chosen accordingly. Optionally, λ is equal to one thousandth (1/1000).
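The following sketch illustrates the joint objective described above (PyTorch assumed). Because the exact sparsity loss formula is given as an image in the original document, an entropy-style penalty over the normalized user portrait feature vector is assumed here purely for illustration; only the weighted-sum structure with λ = 1/1000 follows the text.

```python
import torch
import torch.nn.functional as F

def total_loss(p, label, user_vecs, lam=1e-3):
    """Weighted sum of the error loss and a sparsity loss, with the sparsity weight set to 1/1000."""
    error_loss = F.binary_cross_entropy(p, label)  # cross-entropy error loss
    # Assumed sparsity penalty: the entropy of the softmax-normalized user portrait feature
    # vector, which approaches 0 as the vector becomes sparser (more peaked).
    q = F.softmax(user_vecs, dim=-1)
    sparsity_loss = -(q * torch.log(q + 1e-12)).sum(dim=-1).mean()
    return error_loss + lam * sparsity_loss

# Hypothetical batch: enhanced matching results, matching labels, and 32-dim user portrait vectors.
p = torch.sigmoid(torch.randn(8))
label = torch.randint(0, 2, (8,)).float()
user_vecs = torch.randn(8, 32)
loss = total_loss(p, label, user_vecs)
```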
Illustratively, fig. 8 is a schematic diagram of a process of training a feature extraction model provided by an exemplary embodiment of the present application. As shown in fig. 8, the computer device extracts a second user portrait feature vector 804 from the first user information 801, extracts a candidate information feature vector 805 from the candidate recommendation information 802, and determines a bias feature vector 806 from the bias feature 803, all through the feature extraction model. The inner product of the second user portrait feature vector 804 and the candidate information feature vector 805 is then determined and added to the bias feature vector 806, and the enhanced matching result is determined with a sigmoid function. The error loss is then determined based on the enhanced matching result, the sparsity loss is determined based on the second user portrait feature vector 804, and the feature extraction model is trained jointly with the error loss and the sparsity loss.
Step 410: and acquiring second user information of the user account to be extracted.
The second user information includes second user attribute information and second user behavior information. The user account to be extracted is any account in the computer device that needs information recommendation. Optionally, the computer device is a server, which may be an independent server, a server cluster composed of several servers, a virtual server in a cloud computing service center, or the like. When the client corresponding to the server sends an information recommendation request to the server, the server acquires the user information and recommends information to the user account to be extracted based on that user information. For example, when the interface that the client needs to display includes recommendation information, the client sends the information recommendation request to the server. Optionally, the client can be installed in a user terminal, which includes but is not limited to a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like.
Optionally, the computer device that trains the feature extraction model is the same device or a different device than the computer device that extracts the user portrait feature vectors using the trained feature extraction model.
Step 412: and performing feature extraction on the second user information through a feature extraction model to obtain a user image feature vector of the user account to be extracted.
The feature extraction model is trained by the method provided in steps 402 to 408 above. Optionally, the feature extraction model comprises a first feature extraction network and a second feature extraction network. And the computer equipment performs feature extraction on the second user information through a first feature extraction network of the feature extraction model so as to obtain a user image feature vector of the user account to be extracted.
After the user portrait feature vector of the user account to be extracted has been determined, the computer device can calculate the matching degree between this user portrait feature vector and the candidate information feature vectors, and determine, from the candidate recommendation information and according to the matching degree, the information to be recommended to the user account to be extracted, thereby implementing information recommendation. Optionally, the computer device may retrain the feature extraction model periodically based on the most recent data to ensure the accuracy of the model.
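A hedged sketch of this application-stage matching is shown below (PyTorch assumed); the inner-product scoring, the top-k selection, and all dimensions are illustrative assumptions rather than details from the patent.

```python
import torch

def recommend(user_vec, candidate_vecs, k=10):
    """Rank candidates by their inner-product matching degree with the user portrait feature vector."""
    scores = candidate_vecs @ user_vec                     # matching degree for each candidate
    topk = torch.topk(scores, k=min(k, scores.numel()))    # keep the best-matching candidates
    return topk.indices, topk.values

# Hypothetical: one 32-dim user portrait vector scored against 1000 candidate information vectors.
user_vec = torch.randn(32)
candidate_vecs = torch.randn(1000, 32)
indices, scores = recommend(user_vec, candidate_vecs)
```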
The steps 402 to 408 may be implemented separately, and may be a training method of the feature extraction model on the model training side. The above steps 410 and 412 can be implemented separately, and become a feature extraction method on the model application side.
In summary, in the method provided by this embodiment, in the process of training the feature extraction model, the bias feature is used to enhance the influence of the bias feature on the matching result of the user portrait feature vector and the candidate information feature vector, and the ability of extracting the bias feature in the user portrait feature vector when the feature extraction model performs feature extraction can be enhanced, so that the degree of distinction of the user portrait feature vector extracted by the feature extraction model in the bias feature dimension is improved, and the accuracy of recommending information to the user can be further improved.
The method provided by the embodiment also enhances the matching result through the bias feature vector, and provides a method for enhancing the matching result by using the bias feature.
The method provided by this embodiment further determines a fusion vector based on a sum of an inner product of the second user portrait feature vector and the candidate information feature vector and the bias feature vector, and then determines an enhanced matching result based on the fusion vector, thereby providing a way of determining the enhanced matching result.
In the method provided by this embodiment, at least one of the identifier of the candidate recommendation information and the acquisition position of the candidate recommendation information is regarded as the bias feature, so that two specific bias features are provided.
The method provided by the embodiment also adjusts the weight of each feature through an attention mechanism, and avoids information loss caused by direct pooling.
The method provided by this embodiment also processes the time-sequence features with the self-attention mechanism by adding the time code to the user vector, providing a way to adjust the weight of each feature.
The method provided by the embodiment also provides a way for training the feature extraction model by determining the error loss to train the feature extraction model.
In the method provided by this embodiment, the feature extraction model is trained jointly with the sparsity loss and the error loss, which ensures the sparsity of the user portrait feature vectors output by the feature extraction model.
The method provided by this embodiment further trains the feature extraction model according to the result of weighted summation of the error loss and the sparsity loss, and provides a way of training the feature extraction model according to the error loss and the sparsity loss together.
The method provided by this embodiment also accelerates the convergence of the feature extraction model during training through the batch normalization layers and the Leaky ReLU activation functions of the neural networks.
The method provided by the embodiment further extracts the second user portrait feature vector and the candidate information feature vector through the first feature extraction network and the second feature extraction network, and provides a way for extracting the second user portrait feature vector and the candidate information feature vector.
In the method provided by this embodiment, the user portrait feature vector of the user account to be extracted is extracted from the second user information of that account, so that it can be used to recommend information to the user account to be extracted.
It should be noted that the order of the steps of the method provided in the embodiments of the present application may be appropriately adjusted, and steps may be added or removed as circumstances require. Any method that can readily be conceived by those skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application, and details are not repeated here.
Taking application of the method provided by the embodiments of the present application to the advertisement field as an example: in the model training stage, the server acquires the first user attribute information, the first user behavior information, the candidate advertisements, the advertisement IDs and the advertisement positions of the sample user accounts in the sample user account set, and trains the feature extraction model on the acquired data.
In the advertisement recommendation stage, when the client needs to display a user interface with an advertisement recommendation function, or when it starts running, it sends an advertisement recommendation request to the server. Optionally, the client may be installed in a vehicle-mounted terminal, through which advertisements can be recommended to the user. The server determines the user account to be recommended according to the user account identifier in the advertisement recommendation request, acquires the second user attribute information and the second user behavior information of that account, and extracts its user portrait feature vector through the trained feature extraction model. The user portrait feature vector is then matched against the advertisement features of the candidate advertisements stored on the server to determine the recommended advertisements, which are sent to the client, so that advertisements the user is interested in are recommended to the user account to be recommended. Optionally, the client displays the recommended advertisements to the user logged in to that account through the vehicle-mounted terminal (for example, while the vehicle is parked). For example, advertisements for goods of interest are recommended and displayed, and the user can purchase the goods directly through the client in the vehicle-mounted terminal, which enriches the user's enjoyment while using the vehicle and improves the user experience. The advertisement features of the candidate advertisements are obtained by performing feature extraction on the candidate advertisements through the trained feature extraction model.
Fig. 9 is a schematic structural diagram of a training apparatus for a feature extraction model according to an exemplary embodiment of the present application. The apparatus may be used in a computer device. As shown in fig. 9, the apparatus 90 includes the following modules.
An obtaining module 901, configured to obtain first user information, candidate recommendation information, and bias characteristics of a sample user account in a sample user account set.
An extracting module 902, configured to perform feature extraction on the first user information to obtain a second user portrait feature vector, and perform feature extraction on the candidate recommendation information to obtain a candidate information feature vector.
A processing module 903, configured to enhance a matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature.
A training module 904, configured to train the feature extraction model according to the enhanced matching result.
The first value corresponding to the bias feature is higher than the second value, the first value is a value of a first feature dimension corresponding to the bias feature in a first user portrait feature vector of the sample user account, and the second value is a value of a second feature dimension in the first user portrait feature vector.
In an alternative design, the feature extraction model includes an embedding layer, and the embedding layer includes a first embedding network. The processing module 903 is configured to:
map the bias feature to a first feature space through the first embedding network to obtain a bias feature vector; and enhance the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature vector.
In an alternative design, the processing module 903 is configured to:
determine the inner product of the second user portrait feature vector and the candidate information feature vector as the matching result; add the matching result and the bias feature vector to obtain a fusion vector; and determine the enhanced matching result according to the fusion vector.
In an alternative design, the bias feature includes at least one of:
an identifier of the candidate recommendation information;
an acquisition position of the candidate recommendation information.
In an alternative design, the feature extraction model includes an embedding layer, and the embedding layer includes a second embedding network. As shown in fig. 10, the apparatus 90 further includes an adjustment module 905, and the adjustment module 905 is configured to:
map the first user information to a second feature space through the second embedding network to obtain a user vector, where the user vector is used to determine the second user portrait feature vector; and adjust, based on a self-attention mechanism, the weight of each feature in the user vector according to the time information corresponding to the first user information, where the weight reflects the relative importance between features.
In an alternative design, the adjustment module 905 is configured to:
encode the time information to obtain a time code; add the time code and the user vector to obtain a time-user vector; and adjust the weights of the features in the time-user vector based on the self-attention mechanism.
In an optional design, the sample user account corresponds to a matching tag, and the matching tag is used to reflect whether the sample user account matches the candidate recommendation information. The training module 904 is configured to:
determine an error loss according to the difference between the enhanced matching result and the matching tag; and train the feature extraction model according to the error loss.
In an alternative design, training module 904 is configured to:
determine a sparsity loss according to the second user portrait feature vector and a sparsity loss function, where the sparsity loss function constrains the sparsity of the user portrait feature vector output by the feature extraction model; and train the feature extraction model according to the error loss and the sparsity loss.
In an alternative design, training module 904 is configured to:
determine the error loss through a cross-entropy loss function according to the enhanced matching result and the matching tag; and train the feature extraction model according to the weighted sum of the error loss and the sparsity loss.
In an alternative design, the feature extraction model includes a hidden layer for extracting the second user portrait feature vector and the candidate information feature vector. The hidden layer is composed of a neural network, each layer of the neural network comprises a batch regularization layer, and the activation function of each layer of the neural network is a leaky rectified linear unit function.
In an alternative design, the feature extraction model includes a first feature extraction network and a second feature extraction network. An extracting module 902 configured to:
perform feature extraction on the first user information through the first feature extraction network to obtain the second user portrait feature vector; and perform feature extraction on the candidate recommendation information through the second feature extraction network to obtain the candidate information feature vector.
In an alternative design, the obtaining module 901 is configured to:
acquire second user information of the user account to be extracted.
An extracting module 902 configured to:
perform feature extraction on the second user information through the feature extraction model to obtain the user portrait feature vector of the user account to be extracted.
It should be noted that, for the training apparatus for the feature extraction model provided in the above embodiments, only the division into the above functional modules is given as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the training apparatus for the feature extraction model provided in the above embodiments belongs to the same concept as the embodiments of the training method for the feature extraction model; its specific implementation process is described in detail in the method embodiments and is not repeated here.
Embodiments of the present application further provide a computer device comprising a processor and a memory, where at least one instruction, at least one program, a code set, or an instruction set is stored in the memory and is loaded and executed by the processor to implement the training method of the feature extraction model provided by the above method embodiments.
Optionally, the computer device is a server. Illustratively, fig. 11 is a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.
The computer device 1100 includes a Central Processing Unit (CPU) 1101, a system Memory 1104 including a Random Access Memory (RAM) 1102 and a Read-Only Memory (ROM) 1103, and a system bus 1105 connecting the system Memory 1104 and the CPU 1101. The computer device 1100 also includes a basic Input/Output system (I/O system) 1106, which facilitates transfer of information between devices within the computer device, and a mass storage device 1107 for storing an operating system 1113, application programs 1114, and other program modules 1115.
The basic input/output system 1106 includes a display 1108 for displaying information and an input device 1109 such as a mouse, keyboard, etc. for user input of information. Wherein the display 1108 and input device 1109 are connected to the central processing unit 1101 through an input output controller 1110 connected to the system bus 1105. The basic input/output system 1106 may also include an input/output controller 1110 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1110 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1107 is connected to the central processing unit 1101 through a mass storage controller (not shown) that is connected to the system bus 1105. The mass storage device 1107 and its associated computer-readable storage media provide non-volatile storage for the computer device 1100. That is, the mass storage device 1107 may include a computer-readable storage medium (not shown) such as a hard disk or Compact disk-Only Memory (CD-ROM) drive.
Without loss of generality, the computer-readable storage media may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable storage instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other solid state Memory devices, CD-ROM, Digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1104 and mass storage device 1107 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by the one or more central processing units 1101, the one or more programs containing instructions for implementing the method embodiments described above, the central processing unit 1101 executing the one or more programs implementing the methods provided by the various method embodiments described above.
According to various embodiments of the present application, the computer device 1100 may also operate by being connected, through a network such as the Internet, to a remote computer device on the network. That is, the computer device 1100 may connect to the network 1112 through the network interface unit 1111 coupled to the system bus 1105, or may connect to other types of networks or remote computer device systems (not shown) using the network interface unit 1111.
The memory also includes one or more programs, stored in the memory, that include instructions for performing the steps performed by the computer device in the methods provided by the embodiments of the present application.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the computer-readable storage medium, and when the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor of a computer device, the method for training a feature extraction model provided in the above method embodiments is implemented.
The present application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to enable the computer device to execute the training method of the feature extraction model provided by the method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only an example of the present application and is not intended to be limiting; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (15)

1. A method for training a feature extraction model, the method comprising:
acquiring first user information, candidate recommendation information and bias characteristics of sample user accounts in a sample user account set;
performing feature extraction on the first user information to obtain a second user portrait feature vector; extracting the characteristics of the candidate recommendation information to obtain candidate information characteristic vectors;
enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature;
training the feature extraction model according to the enhanced matching result;
wherein the first value corresponding to the bias feature is higher than a second value, the first value is a value of a first feature dimension corresponding to the bias feature in a first user portrait feature vector of the sample user account, and the second value is a value of a second feature dimension in the first user portrait feature vector.
2. The method of claim 1, wherein the feature extraction model comprises an embedding layer, the embedding layer comprising a first embedding network;
the enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature comprises:
mapping the bias features to a first feature space through the first embedded network to obtain bias feature vectors;
and enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature vector.
3. The method of claim 2, wherein the enhancing the matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature vector comprises:
determining an inner product of the second user portrait feature vector and the candidate information feature vector as the matching result;
adding the matching result and the bias characteristic vector to obtain a fusion vector;
and determining the enhanced matching result according to the fusion vector.
4. The method of any of claims 1 to 3, wherein the bias characteristic comprises at least one of:
identification of the candidate recommendation information;
an acquisition position of the candidate recommendation information.
5. The method of any of claims 1 to 3, wherein the feature extraction model comprises an embedding layer, the embedding layer comprising a second embedding network;
the method further comprises the following steps:
mapping the first user information to a second feature space through the second embedded network to obtain a user vector, wherein the user vector is used for determining a second user portrait feature vector;
and based on a self-attention mechanism, adjusting the weight of each feature in the user vector according to the time information corresponding to the first user information, wherein the weight is used for reflecting the relative importance degree between the features.
6. The method of claim 5, wherein the adjusting the weight of each feature in the user vector according to the time information corresponding to the first user information based on the self-attention mechanism comprises:
coding the time information to obtain a time code;
adding the time code and the user vector to obtain a time user vector;
adjusting weights of features in the temporal user vector based on the self-attention mechanism.
7. The method according to any one of claims 1 to 3, wherein the sample user account corresponds to a matching tag, and the matching tag is used for reflecting whether the sample user account is matched with the candidate recommendation information;
the training of the feature extraction model according to the enhanced matching result comprises:
determining an error loss according to a difference between the enhanced matching result and the matching tag;
and training the feature extraction model according to the error loss.
8. The method of claim 7, wherein training the feature extraction model based on the error loss comprises:
determining sparsity loss according to the second user portrait feature vector and a sparsity loss function, wherein the sparsity loss function is used for constraining sparsity of the user portrait feature vector output by the feature extraction model;
and training the feature extraction model according to the error loss and the sparsity loss.
9. The method of claim 8, wherein determining an error loss based on a difference between the enhanced match result and the matching label comprises:
determining the error loss through a cross entropy loss function according to the enhanced matching result and the matching label;
the training the feature extraction model according to the error loss and the sparsity loss includes:
and training the feature extraction model according to the result of weighted summation of the error loss and the sparsity loss.
10. The method of any of claims 1 to 3, wherein the feature extraction model comprises a hidden layer for extracting the second user portrait feature vector and the candidate information feature vector;
the hidden layer is composed of neural networks, each layer of the neural networks comprises a batch regularization layer, and the activation function of each layer of the neural networks is a leaky rectified linear unit function.
11. The method of any one of claims 1 to 3, wherein the feature extraction model comprises a first feature extraction network and a second feature extraction network;
the feature extraction of the first user information is performed to obtain a second user portrait feature vector, and the method comprises the following steps:
performing feature extraction on the first user information through the first feature extraction network to obtain a second user portrait feature vector;
the extracting the features of the candidate recommendation information to obtain a candidate information feature vector includes:
and performing feature extraction on the candidate recommendation information through the second feature extraction network to obtain the candidate information feature vector.
12. The method of any of claims 1 to 3, further comprising:
acquiring second user information of a user account to be extracted;
and performing feature extraction on the second user information through the feature extraction model to obtain a user portrait feature vector of the user account to be extracted.
13. An apparatus for training a feature extraction model, the apparatus comprising:
the acquisition module is used for acquiring first user information, candidate recommendation information and bias characteristics of sample user accounts in the sample user account set;
the extraction module is used for performing feature extraction on the first user information to obtain a second user portrait feature vector and performing feature extraction on the candidate recommendation information to obtain a candidate information feature vector;
a processing module for enhancing a matching result of the second user portrait feature vector and the candidate information feature vector by using the bias feature;
the training module is used for training the feature extraction model according to the enhanced matching result;
wherein the first value corresponding to the bias feature is higher than a second value, the first value is a value of a first feature dimension corresponding to the bias feature in a first user portrait feature vector of the sample user account, and the second value is a value of a second feature dimension in the first user portrait feature vector.
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement a training method of a feature extraction model according to any one of claims 1 to 12.
15. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of training a feature extraction model according to any one of claims 1 to 12.
CN202111012812.4A 2021-08-31 2021-08-31 Training method, device and equipment of feature extraction model and storage medium Active CN113449198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111012812.4A CN113449198B (en) 2021-08-31 2021-08-31 Training method, device and equipment of feature extraction model and storage medium

Publications (2)

Publication Number Publication Date
CN113449198A true CN113449198A (en) 2021-09-28
CN113449198B CN113449198B (en) 2021-12-10

Family

ID=77819387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111012812.4A Active CN113449198B (en) 2021-08-31 2021-08-31 Training method, device and equipment of feature extraction model and storage medium

Country Status (1)

Country Link
CN (1) CN113449198B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2913790A1 (en) * 2008-09-09 2015-09-02 Truecar Inc. System and method for calculating and displaying price distributions based on analysis of transactions
CN112487278A (en) * 2019-09-11 2021-03-12 华为技术有限公司 Training method of recommendation model, and method and device for predicting selection probability
CN111242748A (en) * 2020-02-21 2020-06-05 腾讯科技(深圳)有限公司 Method, apparatus, and storage medium for recommending items to a user
US20210216561A1 (en) * 2020-08-21 2021-07-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Information search method and apparatus, device and storage medium
CN113205183A (en) * 2021-04-23 2021-08-03 北京达佳互联信息技术有限公司 Article recommendation network training method and device, electronic equipment and storage medium
CN113268645A (en) * 2021-05-07 2021-08-17 北京三快在线科技有限公司 Information recall method, model training method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曾地 (Zeng Di): "Research and Implementation of an Advertisement Recommendation System Based on Network Representation Learning", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN113449198B (en) 2021-12-10

Similar Documents

Publication Publication Date Title
CN111538912B (en) Content recommendation method, device, equipment and readable storage medium
CN110941740B (en) Video recommendation method and computer-readable storage medium
US10078853B2 (en) Offer matching for a user segment
CN111444428A (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
WO2022016522A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN111931062A (en) Training method and related device of information recommendation model
CN109471978B (en) Electronic resource recommendation method and device
CN111680217A (en) Content recommendation method, device, equipment and storage medium
CN112348592A (en) Advertisement recommendation method and device, electronic equipment and medium
CN113569129A (en) Click rate prediction model processing method, content recommendation method, device and equipment
WO2023231542A1 (en) Representation information determination method and apparatus, and device and storage medium
Jiang et al. Factorization meets neural networks: A scalable and efficient recommender for solving the new user problem
CN113327132A (en) Multimedia recommendation method, device, equipment and storage medium
CN115730125A (en) Object identification method and device, computer equipment and storage medium
CN112036987B (en) Method and device for determining recommended commodity
US20230316106A1 (en) Method and apparatus for training content recommendation model, device, and storage medium
CN112069412A (en) Information recommendation method and device, computer equipment and storage medium
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN113449198B (en) Training method, device and equipment of feature extraction model and storage medium
Zahrawi et al. Implementing recommender systems using machine learning and knowledge discovery tools
CN116028708A (en) Training method and device for recommendation model
CN114330519A (en) Data determination method and device, electronic equipment and storage medium
CN114996435A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN111860870A (en) Training method, device, equipment and medium for interactive behavior determination model
Ma Modeling users for online advertising

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40051754

Country of ref document: HK