CN111414539A - Recommendation system neural network training method and device based on feature enhancement - Google Patents


Info

Publication number
CN111414539A
Authority
CN
China
Prior art keywords
samples
attribute
neural network
enhancement
feature information
Prior art date
Legal status
Granted
Application number
CN202010197501.9A
Other languages
Chinese (zh)
Other versions
CN111414539B (en)
Inventor
施韶韵
张敏
郝斌
李大任
张瑞
于新星
单厚智
刘奕群
马少平
Current Assignee
Tsinghua University
Zhizhe Sihai Beijing Technology Co Ltd
Original Assignee
Tsinghua University
Zhizhe Sihai Beijing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Zhizhe Sihai Beijing Technology Co Ltd
Priority to CN202010197501.9A
Publication of CN111414539A
Application granted
Publication of CN111414539B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a feature-enhancement-based method and device for training a recommendation system neural network, wherein the method comprises the following steps: inputting a plurality of first samples in a first training set into the neural network to be trained in the t-th round for processing, to obtain prediction scores corresponding to the plurality of first samples; determining the attention of the neural network to each attribute according to the feature information of the first samples and the corresponding prediction scores; determining the enhancement probability of each attribute according to an attention threshold and the attention of the neural network to each attribute; determining feature information to be updated from the feature information of the plurality of first samples according to a first enhancement rate and the enhancement probabilities; updating first samples in the first training set according to the feature information to be updated and a noise feature value, to obtain an updated second training set; and performing the t-th round of training on the neural network according to the second training set. Embodiments of the present disclosure can improve the robustness of the neural network.

Description

Recommendation system neural network training method and device based on feature enhancement
Technical Field
The disclosure relates to the field of machine learning, in particular to a recommendation system neural network training method and device based on feature enhancement.
Background
Deep learning is a branch of machine learning that mainly uses deep neural networks to analyze and model data, so as to discover the regularities between input features and prediction targets. Deep learning has achieved remarkable results in many fields such as computer vision, computational linguistics and information retrieval.
The design of deep neural networks usually focuses on network architecture, feature representation and the like, yet a deep neural network easily overfits during training, coming to over-rely on some features while ignoring others. For example, when predicting a person's degree of obesity with a deep neural network that takes features such as gender, age, weight, height, girth and shoulder width as input, after long unconstrained training the network is likely to overfit, attending only to the gender and weight features while failing to make sufficient use of the other, more indirect features. Furthermore, during use of the deep neural network, some features may be noisy (for example, the reported weight may be inaccurate), and a network that relies too heavily on such noisy features will produce less accurate predictions.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for training a neural network of a recommendation system based on feature enhancement.
According to an aspect of the present disclosure, there is provided a recommendation system neural network training method based on feature enhancement, the method including:
inputting a plurality of first samples in a preset first training set into a t-th round of neural network to be trained for processing to obtain prediction scores corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of objects to be recommended;
according to the feature information of the first samples and the prediction scores corresponding to the first samples, the attention degree of the neural network to each attribute is respectively determined;
respectively determining the enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute;
updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
performing a t-th round of training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the scoring of a user on an object to be recommended in the recommendation system.
In a possible implementation manner, determining the attention of the neural network to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to the plurality of first samples respectively includes:
for any first sample in a first training set, respectively determining a first contribution value of each feature information of the first sample to a prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample;
for any attribute in the plurality of attributes, determining a second contribution value of the feature information corresponding to the attribute from the first contribution values of the feature information of the first samples;
and determining the average value of the second contribution values as the attention degree of the neural network to the attribute.
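The attention computation described above can be sketched as follows (a minimal illustration with hypothetical names; the patent does not prescribe an implementation). The per-sample first contribution values for one attribute form a column, their mean gives the attention to that attribute, and the result is normalized so the attentions sum to 1, consistent with the description.

```python
import numpy as np

def attribute_attention(contributions):
    """Estimate the network's attention to each attribute.

    contributions: array of shape (num_samples, num_attributes), where
    contributions[s, a] is the first contribution value of sample s's
    feature for attribute a to that sample's prediction score.
    """
    # The second contribution values of an attribute are its column;
    # the attention is their mean over all first samples.
    attention = np.abs(contributions).mean(axis=0)
    # Normalize so the attention over all attributes sums to 1.
    return attention / attention.sum()
```

For instance, with two samples whose contributions are `[1, 3]` and `[3, 5]`, the per-attribute means are `[2, 4]` and the normalized attentions are `[1/3, 2/3]`.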
In a possible implementation manner, determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute, includes:
determining the enhancement quantity of the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples;
randomly selecting a plurality of second samples from a plurality of first samples of the first training set, wherein the number of the second samples is the same as the enhancement number;
and for any second sample, randomly selecting one attribute from a plurality of attributes according to the enhanced probability of each attribute, and determining the feature information corresponding to the randomly selected attribute in the second sample as the feature information to be updated.
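These three steps can be sketched as follows (function and variable names are illustrative; the exact sampling details are assumptions not fixed by the text):

```python
import numpy as np

def select_features_to_update(num_samples, enhance_rate, enhance_probs, rng=None):
    """Pick (sample_index, attribute_index) pairs of feature information to update.

    num_samples: number of first samples in the first training set.
    enhance_rate: preset first enhancement rate in (0, 1).
    enhance_probs: per-attribute enhancement probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Enhancement quantity determined by the first enhancement rate.
    num_enhanced = int(num_samples * enhance_rate)
    # Randomly select the second samples (as many as the enhancement quantity).
    second_samples = rng.choice(num_samples, size=num_enhanced, replace=False)
    # For each second sample, randomly select one attribute with probability
    # proportional to its enhancement probability.
    p = np.asarray(enhance_probs, dtype=float)
    p = p / p.sum()
    attrs = rng.choice(len(p), size=num_enhanced, p=p)
    return list(zip(second_samples.tolist(), attrs.tolist()))
```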
In one possible implementation, the method further includes:
determining a second enhancement rate of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate;
and determining a first enhancement rate of the feature information of the plurality of first samples during the t-th round of training according to the maximum enhancement rate and the second enhancement rate.
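One plausible reading of this schedule, sketched below: the second enhancement rate grows linearly from the initial enhancement rate by the per-round change value, and the first enhancement rate is the second rate capped at the maximum enhancement rate. The exact formula is not given in the text, so this is an assumption.

```python
def enhancement_rates(t, initial_rate, max_rate, delta_per_round):
    """Enhancement rates for training round t (t >= 1).

    Assumed schedule: the second enhancement rate increases linearly
    with the round number; the first rate is clipped at max_rate.
    """
    second_rate = initial_rate + (t - 1) * delta_per_round
    first_rate = min(max_rate, second_rate)
    return first_rate, second_rate
```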
In a possible implementation manner, determining, according to a preset attention threshold and the attention of the neural network to each attribute, an enhanced probability of each attribute respectively includes:
for any attribute, determining the attention degree of the neural network as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is smaller than a preset attention degree threshold value.
In a possible implementation manner, determining, according to a preset attention threshold and the attention of the neural network to each attribute, an enhanced probability of each attribute respectively further includes:
for any attribute, determining the product of the attention degree and a preset adjustment proportion as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is greater than or equal to a preset attention degree threshold value.
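Combining the two cases above, the enhancement probability of a single attribute can be sketched as follows (function and parameter names are illustrative):

```python
def enhancement_probability(attention, threshold, adjust_ratio):
    """Enhancement probability of one attribute.

    attention: the network's attention to the attribute.
    threshold: preset attention threshold in (0, 1).
    adjust_ratio: preset adjustment proportion applied when the
    attention is at or above the threshold.
    """
    if attention < threshold:
        # Attention below the threshold: use the attention directly.
        return attention
    # Attention at or above the threshold: scale by the adjustment proportion.
    return attention * adjust_ratio
```

With an adjustment proportion greater than 1, highly attended attributes are concealed more often, which counteracts over-reliance on them.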
In one possible implementation manner, the neural network includes an input layer, an N-level intermediate layer and an output layer, the input layer inputs the feature information of each first sample, the output layer outputs the prediction score corresponding to each first sample, the N-level intermediate layer outputs N-level intermediate feature information in the processing process respectively, N is a positive integer,
determining a first contribution value of each feature information of the first sample to the prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample, respectively, including:
according to the prediction scores corresponding to the first samples, determining the contribution values of the N-th-level intermediate characteristic information to the prediction scores respectively;
determining the contribution value of each N-1 level intermediate characteristic information to the prediction score according to the contribution value of each N-level intermediate characteristic information to the prediction score, the N-level intermediate characteristic information and the N-1 level intermediate characteristic information;
determining the contribution value of each i-1 level intermediate characteristic information to the prediction score according to the contribution value of each i-level intermediate characteristic information to the prediction score, the i-level intermediate characteristic information and the i-1 level intermediate characteristic information, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
and respectively determining a first contribution value of each characteristic information of the first sample to the prediction score according to the contribution value of each level 1 intermediate characteristic information to the prediction score, the level 1 intermediate characteristic information and the characteristic information of the first sample.
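A simplified sketch of this level-by-level redistribution, in the spirit of Layer-wise Relevance Propagation for a plain fully connected network (the patent does not fix a particular redistribution rule; the epsilon-stabilized rule below is an assumption):

```python
import numpy as np

def lrp_contributions(activations, weights, prediction_score, eps=1e-6):
    """Propagate a prediction score back to per-input contribution values.

    activations: list [x, h1, ..., hN] of layer activations, ending
    just before the scalar output.
    weights: list of weight matrices; weights[i] maps activations[i]
    to the next level, and the last matrix maps hN to the score.
    """
    # Level-N relevance: the whole prediction score at the output.
    relevance = np.array([prediction_score])
    for a, w in zip(reversed(activations), reversed(weights)):
        z = a @ w + eps       # pre-activations of the upper level
        s = relevance / z     # share of relevance per upper-level unit
        relevance = a * (w @ s)  # redistribute to the lower level
    return relevance          # first contribution values per input feature
```

The redistribution approximately conserves the total relevance at each level, so the first contribution values sum to roughly the prediction score.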
In one possible implementation, the feature information of the plurality of first samples in the first training set is represented by a feature matrix, each row of the feature matrix represents one first sample, and each column of the feature matrix represents one attribute.
According to another aspect of the present disclosure, there is provided a recommendation system neural network training device based on feature enhancement, the device including:
the prediction score determining module is used for inputting a plurality of first samples in a preset first training set into the t-th round neural network to be trained for processing, to obtain prediction scores corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise feature information representing user attributes and feature information representing object attributes of objects to be recommended;
the attention degree determining module is used for respectively determining the attention degrees of the neural network to each attribute according to the feature information of the first samples and the prediction scores corresponding to the first samples;
the enhancement probability determination module is used for respectively determining the enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
the to-be-updated feature determination module is used for determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute;
the training set updating module is used for updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
a training module for performing the t-th round of training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the scoring of a user on an object to be recommended in the recommendation system.
In one possible implementation, the apparatus further includes:
the first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate;
and the second enhancement rate determining module is used for determining a first enhancement rate of the feature information of the plurality of first samples during the tth round of training according to the maximum enhancement rate and the second enhancement rate.
According to the embodiments of the disclosure, when the method is applied to training a recommendation system neural network, the enhancement probability of each attribute can be determined according to the attention the neural network to be trained pays to each attribute in the current training round. A feature-enhanced training set for the current round is then determined according to the enhancement probability of each attribute and a preset first enhancement rate, and the neural network is trained with this training set. In this way, the attention the neural network pays to different attributes is optimized during training, so that the network makes comprehensive use of all feature information when predicting and avoids overfitting to, or over-relying on, part of the feature information. When part of the feature information is noisy, the network can still make full use of the remaining feature information, which improves both the robustness and the prediction accuracy of the neural network.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a method for feature enhancement based recommendation system neural network training, in accordance with an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of an application scenario of a feature enhancement based recommendation system neural network training method according to an embodiment of the present disclosure.
FIG. 3 shows a block diagram of a feature enhancement based recommendation system neural network training apparatus, according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
The method for training a neural network of a recommendation system based on feature enhancement according to the embodiments of the present disclosure may be applied to a processor. The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations, such as a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit) or a DSP (Digital Signal Processor). The present disclosure is not limited to a particular type of processor.
The feature enhancement according to the embodiment of the present disclosure may be to randomly conceal part of feature information in the initial feature information by setting an invalid value, a noise value, and the like. That is, more noisy/invalid features are included in the feature enhanced feature information than the initial feature information. Accordingly, the score corresponding to the enhanced sample is more difficult to predict than the initial sample. Training the neural network according to the training set with enhanced features can improve the robustness of the neural network.
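Concretely, such feature enhancement can be sketched as replacing selected entries of the feature matrix with a preset noise/invalid value (names here are illustrative, not from the patent):

```python
import numpy as np

def enhance_samples(features, pairs, noise_value):
    """Return a copy of the feature matrix with selected entries
    concealed by a noise/invalid value.

    features: (num_samples, num_attributes) feature matrix.
    pairs: iterable of (sample_index, attribute_index) entries to conceal.
    """
    enhanced = features.copy()
    for s, a in pairs:
        enhanced[s, a] = noise_value
    return enhanced
```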
In a possible implementation manner, the neural network of the recommendation system may be a neural network applied to a recommendation system, and is used for predicting the rating of the user on the object to be recommended in the recommendation system. The recommendation system can comprise various recommendation systems, such as a movie and television work recommendation system, a commodity recommendation system, a literature recommendation system, a shared knowledge recommendation system in a knowledge sharing platform, and the like. The objects to be recommended in the recommendation system can also comprise various objects, such as movie works, commodities, literature works, shared knowledge, multimedia data, documents and the like. The present disclosure does not limit the specific application scenario of the recommendation system and the specific content of the object to be recommended.
Fig. 1 shows a flow diagram of a method for feature enhancement based recommendation system neural network training, in accordance with an embodiment of the present disclosure. As shown in fig. 1, the method includes:
step S11, inputting a plurality of first samples in a preset first training set into a t-th round of neural network to be trained for processing, so as to obtain prediction scores corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of objects to be recommended;
step S12, respectively determining attention degrees of the neural network to each attribute according to the feature information of the first samples and the prediction scores corresponding to the first samples;
step S13, determining the enhancement probability of each attribute according to the preset attention threshold and the attention of the neural network to each attribute;
step S14, determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute;
step S15, updating the first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
and step S16, performing the t-th round of training on the neural network according to the second training set.
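Steps S11 to S16 can be tied together into one training round roughly as follows. This is a sketch under assumptions: `model`, its contribution computation, and the hyperparameter names are placeholders, and the attention, enhancement-probability and selection formulas follow one possible reading of this disclosure.

```python
import numpy as np

def training_round(t, features, labels, model, hyper, rng):
    """One feature-enhanced training round (steps S11-S16), sketched.

    model must provide predict(features) -> scores,
    contributions(features, scores) -> per-feature contribution values,
    and fit_one_round(features, labels).
    hyper: dict with 'threshold', 'adjust_ratio', 'enhance_rate', 'noise'.
    """
    # S11: prediction scores for all first samples.
    scores = model.predict(features)
    # S12: attention to each attribute (mean contribution over samples).
    contrib = np.abs(model.contributions(features, scores))
    attention = contrib.mean(axis=0)
    attention = attention / attention.sum()
    # S13: enhancement probability per attribute via the attention threshold.
    probs = np.where(attention < hyper['threshold'],
                     attention, attention * hyper['adjust_ratio'])
    probs = probs / probs.sum()
    # S14: choose the feature entries to update.
    n = len(features)
    num_enhanced = int(n * hyper['enhance_rate'])
    rows = rng.choice(n, size=num_enhanced, replace=False)
    cols = rng.choice(features.shape[1], size=num_enhanced, p=probs)
    # S15: second training set with the selected entries concealed.
    enhanced = features.copy()
    enhanced[rows, cols] = hyper['noise']
    # S16: the t-th round of training on the enhanced set.
    model.fit_one_round(enhanced, labels)
    return enhanced
```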
According to the embodiments of the disclosure, when the method is applied to training a recommendation system neural network, the enhancement probability of each attribute can be determined according to the attention the neural network to be trained pays to each attribute in the current training round. A feature-enhanced training set for the current round is then determined according to the enhancement probability of each attribute and a preset first enhancement rate, and the neural network is trained with this training set. In this way, the attention the neural network pays to different attributes is optimized during training, so that the network makes comprehensive use of all feature information when predicting and avoids overfitting to, or over-relying on, part of the feature information. When part of the feature information is noisy, the network can still make full use of the remaining feature information, which improves both the robustness and the prediction accuracy of the neural network.
In one possible implementation, the first training set may be determined prior to training the neural network. The first training set may include a plurality of first samples and reference scores corresponding to the plurality of first samples. Wherein, each first sample can comprise characteristic information representing the user attribute and characteristic information representing the object attribute of the object to be recommended.
In one possible implementation, the user attributes may include user identification, age, gender, occupation, city, and the like. Different objects to be recommended may have different object attributes. For example, when the object to be recommended is a movie or television work, the object attributes may include the work's identification, name, director, main actors, release year, distribution region, one or more genres (e.g., science fiction, romance, war), and the like; when the object to be recommended is a commodity, its object attributes may include a commodity identification, name, production date, manufacturer, price, and the like; when the object to be recommended is a literary work, the object attributes may include the work's identification, name, author, keywords, and the like; when the object to be recommended is shared knowledge, its object attributes may include the identification, name, keywords, access count, and the like of the shared knowledge; when the object to be recommended is multimedia data, the object attributes may include the identification, name, format, keywords, size, and the like of the multimedia data; when the object to be recommended is a document, the object attributes may include the identification, name, format, keywords, size, and the like of the document. The user identification (i.e., user ID) may be used to uniquely identify the user, and the identification of the object to be recommended may be used to uniquely identify the object to be recommended.
In one possible implementation manner, the object attribute of the object to be recommended may be determined according to a specific application scenario of the neural network. The application scenes are different, the objects to be recommended may be different, and the object attributes may also be different. It should be understood that, a person skilled in the art may set specific contents of the user attribute and the object attribute of the object to be recommended according to a specific application scenario of the neural network, and the present disclosure does not limit this.
In a possible implementation manner, after the first training set is determined, in step S11, a plurality of preset first samples in the first training set are input into the t-th to-be-trained neural network for processing, scores of the to-be-recommended objects by the user in each first sample are respectively predicted, and prediction scores corresponding to the plurality of first samples are obtained, where t is a positive integer. Wherein, the t-th round represents the current training round of the neural network.
In one possible implementation manner, in step S12, the attention of the neural network to each attribute may be determined according to the feature information of the plurality of first samples and the prediction scores corresponding to the plurality of first samples. Wherein, the attention degree of the neural network to each attribute can be used for representing the relevance of the prediction score output by the neural network to each attribute. The sum of the attention of the neural network to all the attributes is equal to 1, or the difference between the sum and 1 is within an error range.
In a possible implementation manner, the attention of the neural network to each attribute may be determined through Layer-wise Relevance Propagation (LRP) according to the feature information of the plurality of first samples and the prediction scores corresponding to the plurality of first samples.
In a possible implementation manner, after determining the attention degree of the neural network to each attribute, in step S13, the enhanced probability of each attribute may be determined according to a preset attention degree threshold and the attention degree of the neural network to each attribute. Wherein, the value range of the attention threshold is more than 0 and less than 1.
In one possible implementation, the attention threshold may be different for different training rounds, that is, the attention threshold may be changed according to the change of the current training round t. For example, the attention threshold may increase with increasing t. The attention threshold value in the tth round of training can be preset by a person skilled in the art according to actual conditions, and the disclosure does not limit this.
In one possible implementation, the higher the attention of the neural network to a certain attribute, the more important that attribute is to the prediction. When determining the enhancement probability of each attribute, the attention of each attribute may be classified according to a preset attention threshold, the classified attention of each attribute may then be adjusted according to the training requirements, and the adjusted attention of each attribute may be determined as the enhancement probability of that attribute in the t-th round of training.
In a possible implementation manner, in step S14, the feature information to be updated may be determined from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and an enhancement probability of each attribute. Wherein, the value range of the first enhancement rate is more than 0 and less than 1.
In a possible implementation manner, the preset first enhancement rate of the feature information of the plurality of first samples may represent an enhancement ratio of the feature information of the plurality of first samples. The first enhancement rate may be different for different training rounds, i.e. the first enhancement rate may vary depending on the current training round t, e.g. the first enhancement rate may increase with increasing t. The person skilled in the art can preset the first enhancement rate in the tth round of training according to practical situations, which is not limited by the present disclosure.
In a possible implementation manner, the enhancement quantity of the feature information of the plurality of first samples may be determined according to a preset first enhancement rate of the feature information of the plurality of first samples, and then the feature information to be updated is randomly determined from the feature information of the plurality of first samples according to the enhancement quantity and the enhancement probability of each attribute.
In a possible implementation manner, after determining the feature information to be updated, in step S15, the first sample in the first training set is updated according to the feature information to be updated and the preset noise feature value, so as to obtain an updated second training set.
In one possible implementation, the preset noise characteristic value may be represented as a specific number. For example, the noise characteristic value U may be represented by a number 0. The specific value of the noise characteristic value can be preset by a person skilled in the art according to actual conditions, and the disclosure does not limit this.
In a possible implementation manner, when the first sample in the first training set is updated, the preset noise feature value may be used to replace the feature information to be updated, so as to obtain an updated second training set. The second training set is a feature enhanced training set compared to the initial first training set.
In one possible implementation, after obtaining the second training set, in step S16, the neural network may be trained according to the second training set in a t-th round. The plurality of samples in the second training set can be input into the neural network for processing to obtain the prediction values corresponding to the plurality of samples in the second training set, and the parameters of the neural network are adjusted according to the error between the prediction values and the corresponding reference values to obtain the neural network finished in the t-th round of training.
In one possible implementation, when training the neural network, the first enhancement rate and the attention threshold may be adjusted as the training turns increase, for example, the first enhancement rate and the attention threshold may gradually increase as the training turns increase until a preset maximum value is reached. Through the mode, the quantity of the characteristic information to be updated can be gradually increased to the maximum value, meanwhile, the importance degree of the attribute corresponding to the characteristic information to be updated can be gradually increased, so that the strength of characteristic enhancement of the second training set can be gradually improved along with the increase of training rounds, and the stability of the neural network training process is further improved.
In one possible implementation, the second training set of each training round is updated from the initial first training set, and the updates are not accumulated on the second training set of the previous round. By the method, the second training sets of the training rounds are independent from each other, and the diversity of training samples is increased.
In a possible implementation manner, when the neural network meets a preset training end condition, the training is ended to obtain a trained neural network. The preset training end condition may be set according to the actual situation; for example, the training end condition may be that the performance of the neural network on the validation set decreases for a preset number of consecutive rounds (for example, 5 consecutive rounds); the training end condition may also be that the loss function of the neural network decreases to a certain degree or converges within a certain threshold; the training end condition may also be another condition. The present disclosure does not limit the specific contents of the training end condition.
In one possible implementation, the trained neural network may be applied to a recommendation system for predicting a user's score for an object to be recommended in the recommendation system. According to the specific application scenario of the recommendation system, the user attributes and the attributes of the objects to be recommended are determined, and the corresponding input data is determined; the input data is input into the trained neural network for processing, and the user's score for each object to be recommended is predicted; according to the scores predicted by the neural network, the recommendation system can determine a preset number of recommended objects from the objects to be recommended and recommend them to the user.
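The ranking step of this inference process can be sketched as follows (a minimal illustration; the function name, score values, and candidate count are assumptions, not from the patent):

```python
import numpy as np

def recommend_top_k(scores, k):
    """Return indices of the k candidate objects with the highest predicted scores."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    return order[:k].tolist()

# Illustrative: 4 candidate objects scored by the trained network
scores = np.array([0.2, 0.9, 0.5, 0.7])
top2 = recommend_top_k(scores, 2)  # → [1, 3]
```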
In one possible implementation, the method may further include: determining a second enhancement rate of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate; and determining a first enhancement rate of the feature information of the plurality of first samples during the t-th round of training according to the maximum enhancement rate and the second enhancement rate.
Wherein the preset value range of the initial enhancement rate is more than or equal to 0 and less than 1; the preset value range of the maximum enhancement rate is more than 0 and less than or equal to 1; the preset value range of each round of the enhancement rate change value is more than 0 and less than 1. The specific values of the initial enhancement rate, the maximum enhancement rate and the variation value of the enhancement rate in each round can be set by those skilled in the art according to practical situations, and the disclosure does not limit this.
In a possible implementation manner, second enhancement rates of the feature information of the plurality of first samples during the t-th round of training can be determined according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate; then, judging the relation between the second enhancement rate and the maximum enhancement rate, and determining the second enhancement rate as a first enhancement rate of the characteristic information of a plurality of first samples during the t-th round of training under the condition that the second enhancement rate is less than or equal to the maximum enhancement rate; and determining the maximum enhancement rate as a first enhancement rate of the feature information of the plurality of first samples in the t-th training, if the second enhancement rate is greater than the maximum enhancement rate.
In one possible implementation, the first enhancement rate s_t of the feature information of the plurality of first samples in the t-th round of training can be determined by the following formula (1):

s_t = min(s, s_0 + Δ·t)    (1)

In the above formula (1), s represents the preset maximum enhancement rate, with s ∈ (0, 1]; s_0 represents the preset initial enhancement rate, with s_0 ∈ [0, 1); and Δ represents the preset change value of the enhancement rate per round.
In this embodiment, the second enhancement rate of the feature information of the plurality of first samples during the t-th training can be determined according to the initial enhancement rate, the maximum enhancement rate, and the change value of the enhancement rate per round, and the minimum value between the maximum enhancement rate and the second enhancement rate is determined as the first enhancement rate of the feature information of the plurality of first samples during the t-th training, so that the first enhancement rate can be gradually increased from the initial enhancement rate to the maximum enhancement rate with the increase of the training round, and then kept unchanged.
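The schedule of formula (1) can be sketched as follows (the concrete values of the maximum rate, initial rate, and per-round change are illustrative assumptions):

```python
def first_enhancement_rate(t, s_max=0.5, s_0=0.1, delta=0.05):
    """Formula (1): s_t = min(s_max, s_0 + delta * t).
    Grows linearly with the training round t, then saturates at s_max."""
    return min(s_max, s_0 + delta * t)

rates = [first_enhancement_rate(t) for t in range(10)]
# starts at s_0 = 0.1 and saturates at s_max = 0.5
```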
In one possible implementation, the feature information of the plurality of first samples in the first training set may be represented by a feature matrix, each row of the feature matrix representing one first sample, and each column of the feature matrix representing one attribute.
For example, the first training set includes n first samples, each of which includes m pieces of feature information corresponding to m preset attributes, and the feature information of the plurality of first samples in the first training set may be represented as a feature matrix D = {d_{u,v}}_{n×m}, where the u-th row of the feature matrix D represents the u-th first sample, the v-th column of the feature matrix D represents the v-th attribute, and the element d_{u,v} of the feature matrix D represents the feature information corresponding to the v-th attribute in the u-th first sample, wherein n, m, u and v are positive integers, 1 ≤ u ≤ n and 1 ≤ v ≤ m.
In this embodiment, a plurality of first samples in the first training set are represented as a feature matrix, which facilitates neural network processing and can improve processing efficiency of the neural network.
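The matrix layout described above can be sketched with NumPy (the sizes and values below are illustrative assumptions):

```python
import numpy as np

n, m = 4, 3  # illustrative: 4 first samples, 3 preset attributes
# D[u, v] holds the feature information of attribute v in sample u
D = np.arange(n * m, dtype=float).reshape(n, m)

sample_u = D[1]        # one row    = one first sample
attribute_v = D[:, 2]  # one column = one attribute across all samples
```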
In one possible implementation, step S12 may include:
for any first sample in a first training set, respectively determining a first contribution value of each feature information of the first sample to a prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample;
for any attribute in the plurality of attributes, determining a second contribution value of the feature information corresponding to the attribute from the first contribution values of the feature information of the first samples;
and determining the average value of the second contribution values as the attention degree of the neural network to the attribute.
In one possible implementation, when determining the attention of the neural network to each attribute, a first contribution value of each feature information of the first sample to the prediction score may be determined first. For any first sample in the first training set, the first contribution value of each feature information of the first sample to the prediction score thereof can be respectively determined through inter-layer correlation propagation according to the feature information of the first sample and the prediction score corresponding to the first sample.
For example, for any first sample in the first training set, assuming that the first sample includes m pieces of feature information, according to the m pieces of feature information of the first sample and the prediction score corresponding to the first sample, a first contribution value of each piece of feature information of the first sample to the prediction score thereof may be determined through inter-layer correlation propagation, that is, each piece of feature information in the first sample corresponds to one first contribution value, and the first sample includes m pieces of feature information, and m pieces of first contribution values may be determined.
In a possible implementation manner, after the first contribution value is determined, for any attribute of the plurality of attributes, a second contribution value of the feature information corresponding to the attribute may be determined from the first contribution values of the feature information of the respective first samples, and the determined second contribution values are averaged to determine the average value as the attention of the neural network to the attribute.
For example, the first training set includes n first samples, and for the v-th attribute, a first contribution value of the feature information corresponding to the v-th attribute may be selected from the first contribution values of each of the n first samples and determined as a second contribution value, giving n second contribution values in total; the n second contribution values are then averaged, and the average is determined as the attention of the neural network to the v-th attribute. That is, the attention of the neural network to the v-th attribute is

F_v = (1/n) · Σ_{u=1}^{n} R_{u,v}

where R_{u,v} represents the first contribution value of the v-th feature information of the u-th first sample to its prediction score.
In this embodiment, first contribution values of the feature information of the first samples to the prediction scores of the first samples may be determined, then, for any attribute, second contribution values of the feature information corresponding to the attribute may be determined from the first contribution values, and an average value of the second contribution values may be determined as the attention of the neural network to the attribute, so that accuracy of the attention may be improved.
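The averaging step can be sketched as follows, assuming a matrix R of first contribution values (denoted R[u, v] here; the numbers are illustrative) has already been obtained, e.g. via layer-wise relevance propagation:

```python
import numpy as np

# R[u, v]: first contribution value of the v-th feature information of the
# u-th first sample to its prediction score (illustrative values)
R = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2]])

# Attention F_v = mean of the second contribution values for attribute v
F = R.mean(axis=0)  # → array([0.3, 0.45, 0.25])
```

Note that when each sample's contribution values sum to 1, the attentions over all attributes also sum to 1, matching the property stated earlier in this section.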
In one possible implementation manner, the neural network may include an input layer, an N-level intermediate layer, and an output layer, the input layer inputs the feature information of each first sample, the output layer outputs the prediction score corresponding to each first sample, the N-level intermediate layer outputs N-level intermediate feature information in the processing process, respectively, N is a positive integer,
determining a first contribution value of each feature information of the first sample to the prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample, respectively, including:
according to the prediction scores corresponding to the first samples, determining the contribution values of the N-th-level intermediate characteristic information to the prediction scores respectively;
determining the contribution value of each N-1 level intermediate characteristic information to the prediction score according to the contribution value of each N-level intermediate characteristic information to the prediction score, the N-level intermediate characteristic information and the N-1 level intermediate characteristic information;
determining the contribution value of each i-1 level intermediate characteristic information to the prediction score according to the contribution value of each i-level intermediate characteristic information to the prediction score, the i-level intermediate characteristic information and the i-1 level intermediate characteristic information, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
and respectively determining a first contribution value of each characteristic information of the first sample to the prediction score according to the contribution value of each level 1 intermediate characteristic information to the prediction score, the level 1 intermediate characteristic information and the characteristic information of the first sample.
In a possible implementation manner, the neural network may include an input layer, an N-level intermediate layer, and an output layer, where the input layer inputs the feature information of each first sample, the output layer outputs the prediction score corresponding to each first sample, and the N-level intermediate layer outputs N-level intermediate feature information in the processing process, respectively.
In a possible implementation manner, for any first sample, when determining a first contribution value of each feature information of the first sample to the prediction score thereof, starting from the prediction score output by the output layer, and proceeding layer by layer according to the hierarchical structure of the neural network, through inter-layer correlation propagation, determining the contribution value of the intermediate feature information output by each layer to the prediction score in turn until determining the first contribution value of each feature information of the input first sample to the prediction score.
In a possible implementation manner, the contribution value of each nth-level intermediate feature information to the prediction score may be determined according to the prediction score corresponding to the first sample. For example, assuming that the nth-level intermediate feature information is a predicted value of the E classification tags (where E is a positive integer), according to the prediction score corresponding to the first sample, the contribution value of the nth-level intermediate feature information corresponding to the correct classification tag in the E nth-level intermediate feature information to the prediction score may be determined to be 1, and the contribution values of the other nth-level intermediate feature information to the prediction score may be determined to be 0.
Then, according to the contribution value of each N-level intermediate characteristic information to the prediction score, the N-level intermediate characteristic information and the N-1-level intermediate characteristic information, the contribution value of each N-1-level intermediate characteristic information to the prediction score can be determined through correlation propagation between the N-1-level intermediate layer and the N-1-level intermediate layer.
In a possible implementation mode, according to the contribution value of each i-th-level intermediate characteristic information to the prediction score, the i-th-level intermediate characteristic information and the i-1-th-level intermediate characteristic information, the contribution value of each i-1-th-level intermediate characteristic information to the prediction score is respectively determined through correlation propagation between the i-1-th-level intermediate layer and the i-th-level intermediate layer, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
for example, the input of the neural network is any first sample, the i-th level intermediate layer of the neural network is a full connection layer, and the q-th i-th level intermediate characteristic information output by the i-th level intermediate layer
Figure BDA0002418139920000151
Can be determined by the following equation (2):
Figure BDA0002418139920000152
in the above-mentioned formula (2),
Figure BDA0002418139920000153
represents the p (i-1) th level intermediate characteristic information of the (i-1) th level intermediate layer output,
Figure BDA0002418139920000154
the weight of the fully-connected layer is represented,
Figure BDA0002418139920000155
denotes the bias of the fully connected layer, relu (x) max (0, x) is a nonlinear activation function, and p and q are both positive integers.
According to the back propagation through the fully connected layer, the contribution value R_k^{(i-1)} of the k-th (i-1)-th level intermediate feature information z_k^{(i-1)} to the prediction score can be determined by the following formula (3):

R_k^{(i-1)} = Σ_q [ z_k^{(i-1)} · w_{k,q}^{(i)} / ( Σ_p z_p^{(i-1)} · w_{p,q}^{(i)} + ε · sign(Σ_p z_p^{(i-1)} · w_{p,q}^{(i)}) ) ] · R_q^{(i)}    (3)

In the above formula (3), k is a positive integer, w_{k,q}^{(i)} represents the weight of the fully connected layer, R_q^{(i)} represents the contribution value of the q-th i-th level intermediate feature information to the prediction score, ε is a parameter in inter-layer correlation propagation with ε > 0, and sign(Z) is a sign function: sign(Z) = 1 when Z ≥ 0, otherwise sign(Z) = -1.
In a possible implementation manner, the first contribution value of each feature information of the first sample to the prediction score may be determined by inter-layer correlation propagation according to the contribution value of each level 1 intermediate feature information to the prediction score, the level 1 intermediate feature information, and the feature information of the first sample.
In this embodiment, starting from the prediction score corresponding to the first sample, according to the hierarchical structure of the neural network, the first contribution value of each piece of feature information of the first sample to the prediction score is determined layer by layer through interlayer correlation propagation, and the accuracy of the first contribution value can be improved.
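A minimal sketch of formulas (2) and (3) for one fully connected layer is given below. All tensors and the epsilon value are illustrative assumptions, and the bias term is omitted from the denominator of (3), as in common epsilon-stabilized relevance propagation formulations:

```python
import numpy as np

def fc_forward(z_prev, W, b):
    """Formula (2): z_q = relu(sum_p z_p * W[p, q] + b[q])."""
    return np.maximum(0.0, z_prev @ W + b)

def lrp_backward(z_prev, W, R_next, eps=1e-6):
    """Formula (3): redistribute the relevance R_next of level i onto
    level i-1 with the epsilon-stabilized propagation rule."""
    z = z_prev @ W                                 # sum_p z_p * W[p, q]
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)  # + eps * sign(...)
    s = R_next / denom
    return z_prev * (W @ s)                        # R_k = z_k * sum_q W[k, q] * s_q

# Illustrative 2-unit layer: relevance is conserved across the propagation
z_prev = np.array([1.0, 2.0])
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])
R_next = np.array([1.0, 1.0])
R_prev = lrp_backward(z_prev, W, R_next)  # ≈ [1.0, 1.0]
```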
In one possible implementation, step S13 may include: for any attribute, determining the attention degree of the neural network as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is smaller than a preset attention degree threshold value.
In one possible implementation, the preset attention threshold may be determined according to an increasing function of the current training turn t, and is used to automatically control the enhancement probability of the feature information according to the current training turn. The increasing function can be set by those skilled in the art according to practical needs, and the present disclosure does not limit this.
In a possible implementation manner, for any attribute, the relationship between the attention of the neural network to the attribute and a preset attention threshold value can be judged; in the event that the attention of the neural network to the attribute is less than the attention threshold, the attention of the neural network to the attribute may be determined as an enhanced probability of the attribute.
In one possible implementation, step S13 may further include: for any attribute, determining the product of the attention degree and a preset adjustment proportion as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is greater than or equal to a preset attention degree threshold value.
The preset adjustment ratio is greater than 0 and less than 1, for example, the preset adjustment ratio is 0.1. The specific value of the adjustment ratio can be set by a person skilled in the art according to training needs, and the disclosure does not limit this.
In a possible implementation manner, for any attribute, in the case that the attention degree of the neural network to the attribute is greater than or equal to a preset attention degree threshold, the importance degree of the attribute may be considered to be higher, and in order to maintain the routine training of the neural network, the enhancement probability of the attribute may be reduced. The product of the attention of the neural network to the attribute and the preset adjustment proportion can be determined as the enhancement probability of the attribute.
In one possible implementation, the enhancement probability P_v of the v-th attribute may be determined by the following formula (4):

P_v = ρ · F_v, if F_v ≥ σ(t) · max{F_1, …, F_m};  P_v = F_v, otherwise    (4)

In the above formula (4), F_v represents the attention of the neural network to the v-th attribute, ρ represents the preset adjustment ratio (0 < ρ < 1), σ(t) is an increasing function with respect to t, max{F_1, …, F_m} denotes the maximum value among the attentions F_1, …, F_m of the neural network to the m attributes, and σ(t) · max{F_1, …, F_m} represents the attention threshold in the t-th round of training.
In one possible implementation, after determining the enhancement probabilities of the attributes, normalization may be performed. The normalized enhancement probability of an attribute can be determined by the following formula (5):

P'_v = P_v / Σ_{j=1}^{m} P_j    (5)

In the above formula (5), P'_v represents the normalized enhancement probability of the v-th attribute, P_j (1 ≤ j ≤ m) represents the enhancement probability of the j-th attribute before normalization, and Σ_j P_j represents the sum of the enhancement probabilities of all attributes before normalization.
In this embodiment, when the attention of the neural network to the attribute is greater than or equal to the preset attention threshold, the product of the attention and the preset adjustment ratio is determined as the enhancement probability of the attribute, and the enhancement probability of the attribute with a higher importance degree may be reduced in an initial stage of training (for example, in the previous rounds) to maintain the conventional training, that is, in the initial stage of training, to ensure the utilization of the important features by the neural network.
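Formulas (4) and (5) together can be sketched as follows (the adjustment ratio rho, the threshold schedule sigma, and the attention values are illustrative assumptions):

```python
import numpy as np

def enhancement_probs(F, t, rho=0.1, sigma=lambda t: min(1.0, 0.1 * t)):
    """Formula (4): damp the attention of attributes at or above the
    threshold sigma(t) * max(F) by rho; formula (5): normalize to sum 1."""
    F = np.asarray(F, dtype=float)
    threshold = sigma(t) * F.max()
    P = np.where(F >= threshold, rho * F, F)
    return P / P.sum()

P = enhancement_probs([0.6, 0.3, 0.1], t=5)
# attributes above the threshold are damped, so the least-attended
# attribute ends up with the highest enhancement probability
```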
In one possible implementation, step S14 may include:
determining the enhancement quantity of the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples;
randomly selecting a plurality of second samples from a plurality of first samples of the first training set, wherein the number of the second samples is the same as the enhancement number;
and for any second sample, randomly selecting one attribute from a plurality of attributes according to the enhanced probability of each attribute, and determining the feature information corresponding to the randomly selected attribute in the second sample as the feature information to be updated.
In a possible implementation manner, the enhancement quantity of the feature information of the plurality of first samples may be determined according to a preset first enhancement rate of the feature information of the plurality of first samples. For example, in the t-th round of training, the preset first enhancement rate of the feature information of the plurality of first samples is s_t. If the total amount of the feature information of the plurality of first samples in the first training set is n × m, the enhancement quantity of the feature information of the plurality of first samples is n × m × s_t.
In one possible implementation, a plurality of second samples may be randomly selected from the plurality of first samples of the first training set, the number of the second samples being the same as the enhancement quantity of the feature information of the plurality of first samples, wherein the randomly selected second samples may be repeated. The number of randomly selected second samples is the same as the enhancement quantity, that is, also n × m × s_t.
In a possible implementation manner, for any second sample, one attribute may be randomly selected from the multiple attributes according to the enhanced probability of each attribute, and the feature information corresponding to the randomly selected attribute in the second sample is determined as the feature information to be updated.
In one possible implementation, the number of enhancements to the feature information of the plurality of first samples may be greater than the number of the plurality of first samples. In this case, there may be repeated feature information to be updated. When repeated feature information to be updated exists, the feature information to be updated can be reselected from the plurality of first samples according to the number of the repeated feature information until all the feature information to be updated is not repeated. For example, the enhancement number of the feature information of the plurality of first samples is 100, 5 of the selected 100 feature information to be updated are repeated with other features, and 5 of the non-repeated feature information to be updated are re-selected from the plurality of first samples, so that none of the 100 feature information to be updated is repeated.
The following illustrates the determination process of the feature information to be updated. Assume that the preset first enhancement rate of the feature information of the plurality of first samples in the t-th round of training is s_t, and the first training set is represented as the feature matrix D = {d_{u,v}}_{n×m}. First, the enhancement quantity G = n × m × s_t of the feature information of the plurality of first samples may be determined; then, in the feature matrix D, G rows are randomly selected to determine G second samples g_a, where a is a positive integer and 1 ≤ a ≤ G. For any second sample g_a, one attribute c is randomly selected from the plurality of attributes according to the enhancement probability of each attribute (where 1 ≤ c ≤ m, and the probability of selecting attribute c is its normalized enhancement probability P'_c), obtaining a row-column pair (g_a, c) corresponding to the second sample g_a. Using the same method, row-column pairs corresponding to the G second samples can be obtained. It is then judged whether there are repeated pairs among the G row-column pairs; if so, row-column pairs are reselected according to the number of repeated pairs until none of the G row-column pairs is repeated. The feature information at the positions corresponding to the G row-column pairs is then determined as the feature information to be updated.
Assuming that the preset noise feature value U is equal to 0, 0 may be used to replace the feature information at the positions corresponding to the G row-column pairs in the feature matrix D, with the other feature information remaining unchanged, so as to obtain an updated feature matrix D_G. D_G represents the updated second training set, and the t-th round of training of the neural network can be carried out based on the feature matrix D_G.
In this embodiment, the enhancement number of the feature information of the plurality of first samples can be determined according to the first enhancement rate of the feature information of the plurality of first samples, the plurality of second samples are randomly selected from the plurality of first samples according to the enhancement number, and the feature information to be updated in the plurality of second samples is determined according to the enhancement probability of each attribute, so that the determined feature information to be updated meets the first enhancement rate and the enhancement probability of each attribute during the t-th round of training, and the accuracy of the feature information to be updated is improved.
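The whole selection-and-replacement step can be sketched as follows (the RNG seed, matrix sizes, and noise value 0 are illustrative assumptions; pairs are re-drawn until the G row-column pairs are distinct, as described above):

```python
import numpy as np

def feature_enhance(D, P_norm, s_t, noise_value=0.0, seed=0):
    """Pick G = round(n * m * s_t) distinct (row, column) pairs -- rows
    uniformly, columns by the normalized enhancement probabilities P_norm --
    and overwrite the selected entries with the noise feature value."""
    rng = np.random.default_rng(seed)
    n, m = D.shape
    G = int(round(n * m * s_t))
    pairs = set()
    while len(pairs) < G:                  # re-draw any repeated pairs
        u = int(rng.integers(n))
        v = int(rng.choice(m, p=P_norm))
        pairs.add((u, v))
    D_G = D.copy()                         # leave the first training set intact
    for u, v in pairs:
        D_G[u, v] = noise_value
    return D_G

D = np.ones((4, 3))
D_G = feature_enhance(D, np.full(3, 1 / 3), s_t=0.25)  # masks 3 of 12 entries
```

Copying D before masking matches the point made below that each round's second training set is derived from the initial first training set rather than accumulating updates.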
Fig. 2 shows a schematic diagram of an application scenario of a feature-enhancement-based recommendation system neural network training method according to an embodiment of the present disclosure. As shown in fig. 2, an initial first training set may be determined in step S201; the first training set may include a plurality of first samples and may be represented as a feature matrix D. Then, in step S202, the current training round t is determined, and in step S203, the plurality of first samples in the first training set are input into the t-th-round neural network to be trained for processing, so as to obtain the prediction scores corresponding to the plurality of first samples;
then, in step S204, the attention degree of the neural network to each attribute may be determined according to the feature information of the plurality of first samples and their corresponding prediction scores, with F_v representing the attention of the neural network to the v-th attribute; in step S205, the attention threshold for the t-th training round is determined, for example as σ(t)·max{F_1, …, F_m};
Then, it can be judged in step S206 whether the attention of the neural network to each attribute is greater than or equal to the attention threshold. If so, step S207 is executed, where the enhancement probability of the attribute is the product of the attention and the preset adjustment ratio: for example, when the attention F_v of the neural network to the v-th attribute is greater than or equal to the attention threshold, the enhancement probability P_v of the v-th attribute equals the preset adjustment ratio multiplied by F_v. Otherwise, step S208 is executed, where the enhancement probability of the attribute is the attention of the neural network to the attribute: for example, when F_v is smaller than the attention threshold, P_v = F_v.
Then, step S209 is executed, and the enhancement probabilities of the attributes determined in steps S207 and S208 are normalized to obtain normalized enhancement probabilities of the attributes;
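Steps S204–S209 can be illustrated with a short sketch. The σ(t) schedule σ(t) = 1.1/(1 + e^(3−t)) and the adjustment ratio 0.1 are taken from the worked example later in the text; the function name and the NumPy formulation are assumptions, not the patent's formulas (4) and (5) themselves:

```python
import numpy as np

def enhancement_probabilities(F, t, ratio=0.1):
    """Given per-attribute attention scores F and the training round t,
    damp attributes whose attention reaches the round-dependent threshold,
    then normalize into enhancement probabilities."""
    F = np.asarray(F, dtype=float)
    sigma_t = 1.1 / (1.0 + np.exp(3 - t))        # attention threshold coefficient
    threshold = sigma_t * F.max()                # attention threshold in round t
    P = np.where(F >= threshold, ratio * F, F)   # S207 (damped) / S208 (kept as-is)
    return P / P.sum()                           # S209: normalized probabilities P'
```

In early rounds σ(t) is small, so high-attention attributes are damped and low-attention attributes dominate the selection; as t grows the threshold rises and high-attention attributes are gradually enhanced as well.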
after step S202 is performed, in step S210, the first enhancement rate s_t of the feature information of the plurality of first samples during the t-th round of training may be determined, and in step S211, the enhancement number of the feature information of the plurality of first samples is determined according to the first enhancement rate determined in step S210;
after steps S211 and S209 are performed, in step S212, the feature information to be updated may be determined according to the enhancement number determined in step S211 and the enhancement probability of each attribute determined in step S209; the first samples in the first training set are then updated according to the feature information to be updated and a preset noise feature value, obtaining an updated second training set whose feature matrix is represented as D^G.
After the second training set is determined in step S212, step S213 may be executed to perform the t-th round of training on the neural network according to the second training set. After the t-th round of training is completed, in step S214, it may be judged whether the neural network meets a preset training end condition; when it does not, step S215 is executed, where the training round is incremented by 1 (t = t + 1), and the process returns to step S202 for the next round of training; when the neural network meets the training end condition, training ends and the trained neural network is obtained.
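The whole loop of Fig. 2 can be illustrated end to end on toy data. Everything here is an assumption-laden stand-in: the "model" is a linear least-squares scorer rather than a neural network, the attention proxy is simply the absolute value of the learned weights rather than the patent's inter-layer correlation propagation, and duplicate row-column pairs are not re-drawn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples with m attributes and linear "scores".
n, m = 200, 5
D = rng.normal(size=(n, m))
true_w = np.array([3.0, 2.0, 1.0, 0.5, 0.1])
y = D @ true_w + 0.1 * rng.normal(size=n)

w = np.zeros(m)                                # "model": linear scorer D @ w
s0, s_max, delta, U, ratio = 0.1, 0.2, 0.005, 0.0, 0.1

for t in range(1, 11):                         # S202 / S215: training rounds
    F = np.abs(w) + 1e-8                       # S204: attention proxy (assumption)
    sigma_t = 1.1 / (1.0 + np.exp(3 - t))      # S205: threshold coefficient
    P = np.where(F >= sigma_t * F.max(), ratio * F, F)   # S206-S208
    P = P / P.sum()                            # S209: normalized probabilities
    s_t = min(s_max, s0 + delta * t)           # S210: first enhancement rate
    G = int(n * m * s_t)                       # S211: enhancement number
    rows = rng.integers(0, n, size=G)          # S212: entries to replace with noise
    cols = rng.choice(m, size=G, p=P)
    D_G = D.copy()
    D_G[rows, cols] = U                        # second training set
    w, *_ = np.linalg.lstsq(D_G, y, rcond=None)  # S213: t-th round of "training"
```

Despite part of the feature information being replaced with noise each round, the fitted scorer still predicts the clean scores well, which is the behaviour the embodiment aims for.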
The following describes a method for training a neural network of a recommendation system based on feature enhancement with reference to a specific example.
Assume the neural network is applied to a film and television recommendation system to predict users' scores for film and television works, with the prediction result expressed as a score ranging from 1 to 5. There are 4 user attributes, including user identification, age, gender, and occupation; the object to be recommended is a film or television work, with 21 object attributes, including movie identification, year, and 19 movie categories (science fiction, romance, war, etc.).
The total number of samples is 100,000, and the feature information of each sample includes feature information representing the user attributes and feature information representing the object attributes of the film and television work.
During the t-th round of training, the specific values of the preset variables are shown in Table 1 below (values as they appear in this example):

TABLE 1
Attention threshold coefficient σ(t): 1.1/(1 + e^(3−t))
Adjustment ratio: 0.1
Initial enhancement rate s_0: 0.1
Maximum enhancement rate s: 0.2
Enhancement rate change value per round Δ: 0.005
Noise feature value U: 0
All samples may first be represented as a feature matrix D_1, which has 1 × 10^5 rows and 25 columns. Then, each sample is input into the t-th round neural network to be trained for processing, obtaining the prediction score corresponding to each sample. According to the prediction scores corresponding to the samples and the feature information of the samples, the attention of the neural network to each attribute is determined through inter-layer correlation propagation, where the attention of the neural network to the v'-th attribute can be represented as F_{v'}, with v' = 1, …, 25.
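The patent determines attention through inter-layer correlation (relevance) propagation, whose exact formulas are not reproduced in this excerpt. The sketch below is a generic epsilon-rule layer-wise relevance propagation for a one-hidden-layer ReLU scorer, shown only to illustrate how a prediction score can be redistributed onto per-attribute contributions; the function names, the epsilon rule, and the use of mean absolute relevance as F_v are all assumptions:

```python
import numpy as np

def _stab(z, eps):
    # Epsilon-stabilized denominator: z + eps * sign(z), never closer to 0 than eps.
    return z + eps * np.where(z >= 0, 1.0, -1.0)

def lrp_attention(X, W1, b1, w2, b2, eps=1e-6):
    """Epsilon-rule relevance propagation for a one-hidden-layer ReLU scorer.
    Returns the mean absolute input relevance per attribute, a stand-in for
    the per-attribute attention F_v."""
    H = np.maximum(X @ W1 + b1, 0.0)                      # (n, h) hidden activations
    out = H @ w2 + b2                                     # (n,) prediction scores
    z2 = H * w2                                           # per-unit contributions to the score
    R_h = z2 * (out / _stab(z2.sum(axis=1), eps))[:, None]
    z1 = X[:, :, None] * W1[None, :, :]                   # (n, m, h) input contributions
    R_x = (z1 * (R_h / _stab(z1.sum(axis=1) + b1, eps))[:, None, :]).sum(axis=2)
    return np.abs(R_x).mean(axis=0)                       # mean |relevance| per attribute
```

Each sample's score is propagated back through the layers in proportion to each unit's contribution, then averaged over samples, matching the "contribution value" structure of claim 7 in spirit.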
According to the formula σ(t) = 1.1/(1 + e^(3−t)), the adjustment ratio 0.1, and the attention of the neural network to each attribute, the enhancement probability of each attribute is determined by the above formula (4). The enhancement probability P_{v'} of the v'-th attribute is:

P_{v'} = 0.1 · F_{v'}, if F_{v'} ≥ σ(t) · max{F_1, …, F_25};
P_{v'} = F_{v'}, otherwise.

Normalization is then performed by formula (5) to obtain the normalized enhancement probability of each attribute, where the normalized enhancement probability of the v'-th attribute can be expressed as P'_{v'}.
Then, with the maximum enhancement rate s = 0.2, the initial enhancement rate s_0 = 0.1, and the enhancement rate change value Δ = 0.005 per round, the first enhancement rate s_t of the feature information of all samples during the t-th round of training is determined by the above equation (1):

s_t = min(s, s_0 + Δ·t) = min(0.2, 0.1 + 0.005t);

The enhancement number G of the feature information of all samples can then be determined:

G = n × m × s_t = 10^5 × 25 × min(0.2, 0.1 + 0.005t);
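The schedule and the enhancement number can be checked numerically. The constants are the ones from this example (s_0 = 0.1, Δ = 0.005, cap s = 0.2, n = 10^5, m = 25); the function names are illustrative:

```python
def enhancement_rate(t, s0=0.1, delta=0.005, s_max=0.2):
    # s_t = min(s, s0 + delta * t): linear growth capped at the maximum rate.
    return min(s_max, s0 + delta * t)

def enhancement_number(t, n=10**5, m=25):
    # G = n * m * s_t, truncated to an integer count of entries.
    return int(n * m * enhancement_rate(t))
```

With these values the rate grows linearly from 0.105 in round 1 until it reaches the 0.2 cap at round 20, after which G stays fixed at 500,000 entries per round.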
According to the enhancement number G, G rows can be randomly selected from the feature matrix D_1; for any one of the selected G rows, one column is randomly selected according to the enhancement probability of each attribute, where the probability that the v'-th column is selected is P'_{v'}. The feature information at the positions of the selected G row-column pairs is the feature information to be updated. The noise feature value U = 0 then replaces the feature information to be updated in the feature matrix D_1, obtaining the updated feature matrix D_1^G. Compared with the initial feature matrix D_1, the feature matrix D_1^G is the feature matrix after feature enhancement.
All samples can be divided into a training set, a test set, and a validation set at a ratio of 8:1:1. Accordingly, the feature matrix D_1^G may be partitioned at the ratio 8:1:1 into a feature matrix corresponding to the training set, a feature matrix corresponding to the test set, and a feature matrix corresponding to the validation set. The feature matrix corresponding to the training set may be used to perform the t-th round of training on the neural network. After the t-th round of training is completed, the feature matrix corresponding to the test set can be used to evaluate the effect of the neural network, and the feature matrix corresponding to the validation set can be used to judge whether the neural network meets a preset training end condition; for example, the training end condition can be that the effect of the neural network on the validation set has decreased for 5 consecutive rounds. When the neural network does not meet the training end condition, the next round of training is performed; when the neural network meets the training end condition, training ends and the trained neural network is obtained.
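The training-end condition from this example (validation effect worsening for 5 consecutive rounds) can be sketched as a simple check on the history of per-round validation RMSE values; the function name and the strictly-increasing interpretation of "reduced effect" are assumptions:

```python
def should_stop(valid_rmse_history, patience=5):
    """Stop when the effect on the validation set has worsened (RMSE has
    increased) for `patience` consecutive rounds."""
    if len(valid_rmse_history) <= patience:
        return False
    recent = valid_rmse_history[-(patience + 1):]
    # True only if every one of the last `patience` steps got worse.
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A single round of improvement resets the streak, so training continues as long as the validation effect keeps recovering.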
In one possible implementation, RMSE (Root Mean Square Error) may be used to evaluate the effect of the neural network; the smaller the RMSE value, the better the neural network. Compared with a neural network trained without the present training method, a neural network trained with it attains a smaller RMSE value, indicating a better effect and higher accuracy of its prediction results.
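For reference, the RMSE metric used above is straightforward to compute:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted scores; lower is better."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```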
According to the embodiment of the disclosure, when applied to neural network training of a recommendation system, the attention threshold and the first enhancement rate can be adjusted according to the current training round, so that feature information corresponding to attributes with low attention is enhanced in the initial stage of training (for example, the first few rounds), and feature information corresponding to attributes with high attention is gradually enhanced as the training rounds increase. This prompts the neural network to learn under conditions where part of the feature information is noisy, avoids over-fitting or over-dependence on part of the feature information, improves the robustness of the neural network, and also improves the accuracy of its predictions.
It should be noted that, although the above embodiments are described as examples of the feature enhancement based recommendation system neural network training method, those skilled in the art can understand that the disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
FIG. 3 shows a block diagram of a feature enhancement based recommendation system neural network training apparatus, according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus includes:
the predicted score determining module 31 is configured to input a plurality of first samples in a preset first training set into a t-th to-be-trained neural network for processing, so as to obtain predicted scores corresponding to the plurality of first samples, where t is a positive integer, and the first samples include feature information representing user attributes and feature information representing object attributes of an object to be recommended;
an attention degree determining module 32, configured to determine attention degrees of the neural network to each attribute according to the feature information of the plurality of first samples and the prediction scores corresponding to the plurality of first samples;
an enhanced probability determining module 33, configured to determine enhanced probabilities of the attributes according to a preset attention threshold and attention of the neural network to the attributes, respectively;
a to-be-updated feature determining module 34, configured to determine, according to a preset first enhancement rate of the feature information of the multiple first samples and the enhancement probability of each attribute, feature information to be updated from the feature information of the multiple first samples;
a training set updating module 35, configured to update a first sample in the first training set according to the feature information to be updated and a preset noise feature value, so as to obtain an updated second training set;
a training module 36, configured to perform the t-th round of training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the scoring of a user on an object to be recommended in the recommendation system.
In one possible implementation, the apparatus further includes:
the first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate;
and the second enhancement rate determining module is used for determining a first enhancement rate of the feature information of the plurality of first samples during the t-th round of training according to the maximum enhancement rate and the second enhancement rate.
According to the embodiment of the disclosure, when applied to neural network training of a recommendation system, the enhancement probability of each attribute can be determined according to the attention of the neural network to be trained in the current training round to each attribute; a feature-enhanced training set for the current round is then determined according to the enhancement probability of each attribute and a preset first enhancement rate, and the neural network is trained with this training set. The attention of the neural network to different attributes is thereby optimized during training, so that the neural network comprehensively utilizes all the feature information during prediction, avoiding over-fitting or over-dependence on part of the feature information; meanwhile, when part of the feature information is noisy, the neural network makes full use of the other feature information for prediction, which improves both the robustness of the neural network and the accuracy of its predictions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A recommendation system neural network training method based on feature enhancement is characterized by comprising the following steps:
inputting a plurality of first samples in a preset first training set into a t-th round of neural network to be trained for processing to obtain prediction scores corresponding to the plurality of first samples, wherein t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of objects to be recommended;
according to the feature information of the first samples and the prediction scores corresponding to the first samples, the attention degree of the neural network to each attribute is respectively determined;
respectively determining the enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute;
updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
performing a t-th round of training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the scoring of a user on an object to be recommended in the recommendation system.
2. The method of claim 1, wherein determining the attention of the neural network to each attribute according to the feature information of the first samples and the prediction scores corresponding to the first samples comprises:
for any first sample in a first training set, respectively determining a first contribution value of each feature information of the first sample to a prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample;
for any attribute in the plurality of attributes, determining a second contribution value of the feature information corresponding to the attribute from the first contribution values of the feature information of the first samples;
and determining the average value of the second contribution values as the attention degree of the neural network to the attribute.
3. The method according to claim 1, wherein determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute comprises:
determining the enhancement quantity of the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples;
randomly selecting a plurality of second samples from a plurality of first samples of the first training set, wherein the number of the second samples is the same as the enhancement number;
and for any second sample, randomly selecting one attribute from a plurality of attributes according to the enhanced probability of each attribute, and determining the feature information corresponding to the randomly selected attribute in the second sample as the feature information to be updated.
4. The method of claim 1, further comprising:
determining a second enhancement rate of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate;
and determining a first enhancement rate of the feature information of the plurality of first samples during the t-th round of training according to the maximum enhancement rate and the second enhancement rate.
5. The method of claim 1, wherein determining the enhanced probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute respectively comprises:
for any attribute, determining the attention degree of the neural network as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is smaller than a preset attention degree threshold value.
6. The method of claim 5, wherein determining the enhanced probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute respectively, further comprises:
for any attribute, determining the product of the attention degree and a preset adjustment proportion as the enhancement probability of the attribute under the condition that the attention degree of the neural network to the attribute is greater than or equal to a preset attention degree threshold value.
7. The method according to claim 2, wherein the neural network comprises an input layer, N levels of intermediate layers, and an output layer; the input layer inputs the feature information of each first sample, the output layer outputs the prediction score corresponding to each first sample, and the N levels of intermediate layers respectively output N levels of intermediate feature information in the process, where N is a positive integer,
determining a first contribution value of each feature information of the first sample to the prediction score according to the feature information of the first sample and the prediction score corresponding to the first sample, respectively, including:
according to the prediction scores corresponding to the first samples, determining the contribution values of the N-th-level intermediate characteristic information to the prediction scores respectively;
determining the contribution value of each N-1 level intermediate characteristic information to the prediction score according to the contribution value of each N-level intermediate characteristic information to the prediction score, the N-level intermediate characteristic information and the N-1 level intermediate characteristic information;
determining the contribution value of each i-1 level intermediate characteristic information to the prediction score according to the contribution value of each i-level intermediate characteristic information to the prediction score, the i-level intermediate characteristic information and the i-1 level intermediate characteristic information, wherein i is an integer and is more than or equal to 2 and less than or equal to N;
and respectively determining a first contribution value of each characteristic information of the first sample to the prediction score according to the contribution value of each level 1 intermediate characteristic information to the prediction score, the level 1 intermediate characteristic information and the characteristic information of the first sample.
8. The method of claim 1, wherein the feature information of the first samples in the first training set is represented by a feature matrix, each row of the feature matrix representing one first sample, and each column of the feature matrix representing one attribute.
9. A recommendation system neural network training apparatus based on feature enhancement, the apparatus comprising:
the system comprises a prediction score determining module, a recommendation score determining module and a recommendation score determining module, wherein the prediction score determining module is used for inputting a plurality of first samples in a preset first training set into a t-th to-be-trained neural network for processing to obtain prediction scores corresponding to the first samples, t is a positive integer, and the first samples comprise characteristic information representing user attributes and characteristic information representing object attributes of objects to be recommended;
the attention degree determining module is used for respectively determining the attention degrees of the neural network to each attribute according to the feature information of the first samples and the prediction scores corresponding to the first samples;
the enhancement probability determination module is used for respectively determining the enhancement probability of each attribute according to a preset attention threshold and the attention of the neural network to each attribute;
the to-be-updated feature determination module is used for determining feature information to be updated from the feature information of the plurality of first samples according to a preset first enhancement rate of the feature information of the plurality of first samples and the enhancement probability of each attribute;
the training set updating module is used for updating a first sample in the first training set according to the feature information to be updated and a preset noise feature value to obtain an updated second training set;
a training module for performing the t-th round of training on the neural network according to the second training set,
the neural network is applied to a recommendation system and used for predicting the scoring of a user on an object to be recommended in the recommendation system.
10. The apparatus of claim 9, further comprising:
the first enhancement rate determining module is used for determining second enhancement rates of the characteristic information of the plurality of first samples during the t-th round of training according to a preset initial enhancement rate, a preset maximum enhancement rate and a preset change value of each round of enhancement rate;
and the second enhancement rate determining module is used for determining a first enhancement rate of the feature information of the plurality of first samples during the t-th round of training according to the maximum enhancement rate and the second enhancement rate.
CN202010197501.9A 2020-03-19 2020-03-19 Recommendation system neural network training method and device based on feature enhancement Active CN111414539B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010197501.9A CN111414539B (en) 2020-03-19 2020-03-19 Recommendation system neural network training method and device based on feature enhancement

Publications (2)

Publication Number Publication Date
CN111414539A true CN111414539A (en) 2020-07-14
CN111414539B CN111414539B (en) 2023-09-01

Family

ID=71493180

Country Status (1)

Country Link
CN (1) CN111414539B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108902B1 (en) * 2017-09-18 2018-10-23 CS Disco, Inc. Methods and apparatus for asynchronous and interactive machine learning using attention selection techniques
CN109509054A (en) * 2018-09-30 2019-03-22 平安科技(深圳)有限公司 Method of Commodity Recommendation, electronic device and storage medium under mass data
CN109902222A (en) * 2018-11-30 2019-06-18 华为技术有限公司 Recommendation method and device
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021152941A (en) * 2020-11-16 2021-09-30 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Object recommendation method, neural network and training method thereof, device and medium
JP7194233B2 (en) 2020-11-16 2022-12-21 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Object recommendation method, neural network and its training method, device and medium

Also Published As

Publication number Publication date
CN111414539B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN109408731B (en) Multi-target recommendation method, multi-target recommendation model generation method and device
CN111563164B (en) Specific target emotion classification method based on graph neural network
CN111460130B (en) Information recommendation method, device, equipment and readable storage medium
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN107633444B (en) Recommendation system noise filtering method based on information entropy and fuzzy C-means clustering
CN114492363B (en) Small sample fine adjustment method, system and related device
CN110688479B (en) Evaluation method and sequencing network for generating abstract
CN112464100B (en) Information recommendation model training method, information recommendation method, device and equipment
CN112861945B (en) Multi-mode fusion lie detection method
CN108763367B (en) Method for recommending academic papers based on deep alignment matrix decomposition model
CN112380451A (en) Favorite content recommendation method based on big data
CN113221019A (en) Personalized recommendation method and system based on instant learning
CN112256965A (en) Neural collaborative filtering model recommendation method based on lambdamat
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN116680363A (en) Emotion analysis method based on multi-mode comment data
CN117216281A (en) Knowledge graph-based user interest diffusion recommendation method and system
CN113516094B (en) System and method for matching and evaluating expert for document
CN114428910A (en) Resource recommendation method and device, electronic equipment, product and medium
CN113627151A (en) Cross-modal data matching method, device, equipment and medium
CN111414539B (en) Recommendation system neural network training method and device based on feature enhancement
CN111523311B (en) Search intention recognition method and device
CN112434512A (en) New word determining method and device in combination with context
CN116452293A (en) Deep learning recommendation method and system integrating audience characteristics of articles
CN115344698A (en) Label processing method, label processing device, computer equipment, storage medium and program product
CN114936890A (en) Counter-fact fairness recommendation method based on inverse tendency weighting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant