WO2021169451A1

WO2021169451A1 - Content recommendation method and apparatus based on adversarial learning, and computer device

Info

Publication number: WO2021169451A1
Application number: PCT/CN2020/132592
Authority: WO
Inventors: 方聪; 张旭; 郑越; 旷雄; 黄宇星
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-09-28
Filing date: 2020-11-30
Publication date: 2021-09-02
Also published as: CN112182384A; CN112182384B

Abstract

Provided is a content recommendation method based on adversarial learning. The method relates to the field of artificial intelligence, and comprises: obtaining, by means of weighted compression of a pre-constructed user feature, a weighted compression vector corresponding to a historical behavior feature of a user (S1); modeling a generator and a discriminator according to the weighted compression vector (S2); combining the generator and the discriminator that have been subjected to modeling, and performing adversarial learning under an adversarial model (S3); determining whether the adversarial learning of the generator and the discriminator reaches a pre-set condition (S4); if so, inputting historical information of the current user into the generator after adversarial learning, and determining interest preference features of the current user in combination with a feedback value of the discriminator after adversarial learning (S5); and according to the interest preference features of the current user, recommending, to the current user, content information matching the interest preference features of the current user (S6). Behavior features are modeled by means of weighted compression, time sequence change features of user behavior features are captured, and a generator can acquire interest preference features on the basis of adversarial learning, so as to accurately recommend content information.

Description

Content recommendation method, apparatus and computer device based on adversarial learning

This application claims priority to the Chinese patent application No. 2020110449667, filed with the Chinese Patent Office on September 28, 2020 and entitled "Methods, devices and computer equipment for content recommendation based on adversarial learning", the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to the field of artificial intelligence, in particular to content recommendation methods, devices and computer equipment based on adversarial learning.

Background art

Existing content recommendation systems generally rely on manual feature extraction, collaborative filtering, and decomposition techniques to achieve automatic recommendation. By collecting user behavior data, system log data, and other information, they model users' preferences and interests, cluster and group users accordingly, and recommend the same kind of content to users with similar preferences and interests. However, the inventor realized that existing content recommendation systems treat the collected user behavior data as statistical features and cannot take into account the temporal logic by which user preferences and interests develop and change, so the recommended content is not automatically updated to keep pace with those changes.

Technical problem

The main purpose of this application is to provide content recommendation based on adversarial learning, aiming to solve the technical problem that the temporal logic by which user preferences and interests develop and change cannot be taken into account, so that the recommended content is not automatically updated over time.

Technical solutions

This application proposes a content recommendation method based on adversarial learning, including:

Obtain the weighted compression vector corresponding to the user's historical behavior feature by weighting and compressing the pre-built user characteristics;

Model the generator and discriminator based on the weighted compression vector;

Combine the modeled generator and discriminator to conduct adversarial learning under the adversarial model;

Determine whether the adversarial learning of the generator and the discriminator meets the preset conditions;

If yes, input the historical information of the current user into the generator after adversarial learning, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

According to the interest preference feature of the current user, content information matching the interest preference feature of the current user is recommended to the current user.

This application also provides a content recommendation device based on adversarial learning, including:

The obtaining module is used to obtain the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the user characteristics constructed in advance;

The modeling module is used to model the generator and the discriminator according to the weighted compression vector;

The adversarial learning module is used to combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

The first judgment module is used to judge whether the adversarial learning of the generator and the discriminator meets the preset condition;

The determining module is used to input the historical information of the current user into the generator after adversarial learning if the preset condition is met, and to determine the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning;

The recommendation module is used to recommend to the current user content information that matches the current user's interest preference characteristics according to the current user's interest preference characteristics.

The present application also provides a computer device, including a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method when the computer program is executed.

The present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above method are realized.

Beneficial effect

This application models the user's historical behavior features through weighted compression, so as to capture how those features change over the time series; based on adversarial learning, the generator can then obtain the interest preference features of online users and recommend content information accurately.

Description of the drawings

FIG. 1 is a schematic flowchart of a content recommendation method based on adversarial learning according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a content recommendation device based on adversarial learning according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.

The best way to implement this application

Referring to FIG. 1, a content recommendation method based on adversarial learning according to an embodiment of the present application includes:

S1: Obtain the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the pre-built user characteristics;

S2: Model the generator and the discriminator according to the weighted compression vector;

S3: Combine the modeled generator with the discriminator, and conduct adversarial learning under the adversarial model;

S4: Determine whether the adversarial learning of the generator and the discriminator meets the preset conditions;

S5: If yes, input the historical information of the current user into the generator after adversarial learning, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

S6: According to the interest preference feature of the current user, recommend to the current user content information that matches the interest preference feature of the current user.

The user features constructed in the embodiments of the present application include user attribute features P, historical click features T, behavior cue features Q, and user click behavior c. User attribute features P include, but are not limited to, user profile information such as age and occupation. Behavior cue features Q include, but are not limited to, the types of promoted information, preferential policies, and the like. Historical click features T include, but are not limited to, the user's historical personal information and the content information the user has clicked in the past. User click behavior c takes a true or false value: if it is true, the click behavior occurred; otherwise it did not occur. This application uses weighted compression to encode the time series features in the above user features into a time series feature matrix, and uses the time series feature matrix and the user attribute features to model the generator and the discriminator and to perform adversarial learning, so that the generator obtained after learning can identify the time series features in the user features and obtain the user's interest preference feature carrying the time series change feature. Content information is then recommended according to that interest preference feature. Compared with existing content recommendation based directly on static historical data, the recommendation of this application better matches the current user's interest preferences and is more accurate and targeted.

Further, the step S1 of obtaining the weighted compression vector corresponding to the user's historical behavior feature by weighting and compressing the pre-built user characteristics includes:

S11: Encode the user features along the time series in the two-dimensional space of the time series dimension and the feature dimension, to obtain the time series feature matrix corresponding to the user features;

S12: Multiply the time series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression;

S13: Correct the first product matrix by the first bias vector to obtain the first correction matrix;

S14: Input the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature;

S15: Splice the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form the first splicing vector;

S16: Multiply the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression;

S17: Correct the second product matrix by the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior feature.

In the embodiment of the present application, the user features are encoded along the time series in the two-dimensional space of the time series dimension and the feature dimension to obtain the time series feature matrix corresponding to the user features. The above-mentioned user historical behavior features are feature representations of the historical data of the user features, that is, a combination of user features and historical time series features. This application processes the time series feature matrix through first-level weighted compression to obtain the embedding vector corresponding to the user's historical behavior feature. Following steps S12 to S14, and writing F for the time series feature matrix, the first-level weighted compression that yields the embedding vector is:

$$\hat{S}_t = h(F) = \mathrm{vec}\left[\sigma(F W + B)\right]$$

Among them, \hat{S}_t represents the embedding vector, h represents the operator applied to the time series feature matrix, vec represents the operator that flattens a matrix into a vector, σ represents the sigmoid function, W represents the feature weight matrix, that is, the aforementioned first compression weight matrix, and B represents the feature bias vector, that is, the aforementioned first bias vector. In the second-level weighted compression, the embedding vector \hat{S}_t is spliced with the time series feature f_t corresponding to the specified time, the spliced vector is multiplied by the compression weight matrix V, and the compression bias vector is added to obtain the weighted compression vector.

This application uses two-level weighted compression to model the user's historical behavior features, so as to capture how those features change over the time series, simulate how interest preferences trend over time, follow deviations in interest preferences promptly, and update the content recommendation strategy accordingly. The terms "first" and "second" in this application are used for distinction and not for limitation. Other similar terms have the same function and will not be repeated.
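The two-level weighted compression of steps S11 to S17 can be sketched in a few lines of Python. This is only an illustrative sketch under assumed shapes: the names `F`, `W`, `B`, `f_t`, `V`, and `b`, the helper `matmul`, and the choice of a feature-by-time layout for `F` are assumptions made here, not details fixed by the application.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def weighted_compression(F, W, B, f_t, V, b):
    """Two-level weighted compression, assuming:
    F   (d x m) time series feature matrix, d features by m time steps
    W   (m x n) first compression weight matrix
    B   (d x n) first bias, same shape as F @ W
    f_t (length d) time series feature at the specified time
    V   (k x (d*n + d)) second compression weight matrix
    b   (length k) second bias vector
    """
    product = matmul(F, W)                                   # S12
    corrected = [[p + q for p, q in zip(prow, brow)]
                 for prow, brow in zip(product, B)]          # S13
    embedding = [sigmoid(x) for row in corrected for x in row]  # S14, flattened (vec)
    spliced = embedding + list(f_t)                          # S15
    return [sum(v * x for v, x in zip(row, spliced)) + bias  # S16 and S17
            for row, bias in zip(V, b)]

# Tiny made-up example: 2 features, 2 time steps, compressed to 1 value.
s = weighted_compression(
    F=[[1.0, 0.0], [0.0, 1.0]],
    W=[[0.0], [0.0]],
    B=[[0.0], [0.0]],
    f_t=[1.0, 2.0],
    V=[[1.0, 1.0, 1.0, 1.0]],
    b=[0.0],
)
```

With all-zero first-level weights the embedding is 0.5 per entry, so the example mainly shows how the shapes fit together rather than any learned behavior.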

Further, the user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics. The step S2 of modeling the generator and the discriminator according to the weighted compression vector includes:

S21: Perform vector splicing on user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain a second splicing vector;

S22: With the model parameters of the discriminator fixed, input the second splicing vector into the model of the generator, and model the generator under the constraint of the first cross-entropy loss function;

S24: Determine whether the first cross-entropy loss function reaches the minimum value;

S25: If yes, get the generator model.

In the embodiment of the present application, the second splicing vector [P; T; Q] is obtained by splicing the vectors of the user attribute features, historical click features, and behavior cue features. When the discriminator is modeled in this application, the sample training data is constructed first. Specifically, the second splicing vector [P; T; Q] is spliced with the cpred output by the generator to form the negative sample feature vector, and the second splicing vector [P; T; Q] is spliced with the user's real click c to form the positive sample feature vector. The model formula of the generator of this application is as follows:

Among them, φ is a strategy model based on a multi-layer convolutional neural network, R(φ) is a regularization term, η is a regularization parameter, and r represents a discriminator with fixed parameters. The output of the generator when the second splicing vector [P; T; Q] is input is expressed as cpred = MultiConv([P; T; Q]), and the above-mentioned first cross-entropy loss function is expressed as loss_g = CrossEntropy(cpred, c), which is the loss metric between cpred and c. The parameters of the multi-layer convolutional neural network of this application are optimized by the Adam algorithm.
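As a rough illustration of how cpred and the first cross-entropy loss relate, the sketch below splices P, T, and Q and evaluates a binary cross-entropy against the real click c. The function `generator_forward` is a hypothetical one-unit logistic stand-in for MultiConv, the binary form of the cross-entropy is an assumption based on c being a true or false value, and all feature values are made up.

```python
import math

def splice(*vectors):
    """Concatenate feature vectors, e.g. [P; T; Q]."""
    out = []
    for v in vectors:
        out.extend(v)
    return out

def cross_entropy(pred, target, eps=1e-12):
    """Binary cross-entropy between a predicted click probability and a 0/1 click."""
    pred = min(max(pred, eps), 1.0 - eps)  # keep log() away from 0
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def generator_forward(x, weights, bias):
    """Hypothetical stand-in for MultiConv: a single logistic unit."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Made-up feature values for one user.
P, T, Q = [0.3, 1.0], [0.0, 1.0, 1.0], [1.0]
c = 1.0                                   # the user really clicked

x = splice(P, T, Q)                       # second splicing vector [P; T; Q]
c_pred = generator_forward(x, [0.0] * len(x), 0.0)
loss_g = cross_entropy(c_pred, c)         # first cross-entropy loss
```

In a fuller implementation the stand-in would be replaced by the convolutional network and its parameters updated by an Adam optimizer, as the text describes.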

Further, before step S21 of performing vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain the second splicing vector, the method includes:

S201: Input the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector;

S202: Multiply the output result of the weighted compression vector by the reward function parameter to obtain the reward value;

S203: Use the calculation method of the reward value as a model of the discriminator.

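The formula of the discriminator model of this application, following steps S201 and S202 and writing s for the weighted compression vector and r for the reward value, is:

$$r = v^{T} \sigma(s)$$

where σ is the sigmoid function and v^T represents the parameter of the reward function.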
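The reward computation of steps S201 to S203 can be sketched as follows. The names `reward`, `s`, and `v` are illustrative assumptions: `s` stands for the weighted compression vector and `v` for the reward function parameter, taken here to be a vector of the same length.

```python
import math

def reward(s, v):
    """Reward value: sigmoid of the weighted compression vector (S201),
    multiplied by the reward function parameter (S202)."""
    activated = [1.0 / (1.0 + math.exp(-si)) for si in s]
    return sum(vi * ai for vi, ai in zip(v, activated))

# Made-up values: a zero compression vector gives sigmoid outputs of 0.5 each.
r = reward(s=[0.0, 0.0], v=[1.0, 1.0])
```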

Further, the step S3 of combining the modeled generator and the discriminator to perform adversarial learning under the adversarial model includes:

S31: splicing the second splicing vector with the modeling result of the generator to form a negative sample feature vector, splicing the second splicing vector and the user click real value corresponding to the second splicing vector into a positive sample feature vector;

S32: Input the negative sample feature vector and the positive sample feature vector into the discriminator, fix the generator parameters, and model the discriminator under the constraints of the second cross-entropy loss function;

S33: Determine whether the second cross-entropy loss function reaches the minimum value;

S34: If yes, determine the parameters of the discriminator;

S35: According to the modeling process of the generator and the discriminator, perform adversarial learning on the generator and the discriminator through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum value.

The second cross-entropy loss function of this application consists of two parts: one part constrains the output for the second splicing vector spliced with the generator output, and the other part constrains the output for the real click action, namely loss_d = loss_1 + loss_2, where loss_1 = CrossEntropy(0, MultiConv([P; T; Q; cpred])) and loss_2 = CrossEntropy(1, MultiConv([P; T; Q; c])). The formula of the adversarial model of this application is expressed as:

Among them, θ represents the parameters of the discriminator optimized in the adversarial learning, and α represents the parameters of the generator in the adversarial learning. In the adversarial learning of this application, the learning goal of the generator is to generate, from the constructed vector of user features, a simulated user click behavior cpred that is as close as possible to the real one, while the learning goal of the discriminator is to distinguish the real user click behavior from the simulated click behavior produced by the generator. In the adversarial learning, the parameters of the discriminator and the generator are fixed alternately. First, fix the parameters of the discriminator and train the generator through loss_g. When loss_g drops, the cpred generated by the generator has successfully deceived the discriminator. Then fix the generator parameters and train the discriminator under the constraint of loss_d. When loss_d drops, the discriminator has successfully distinguished cpred from c. Training alternates in this way until loss_d and loss_g are both smaller than the preset threshold and reach the minimum value. The generator at this point can take the user's historical click information into account and imitate the user's click decisions as closely as possible, while the discriminator can simulate the feedback to the user's click action.
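The alternating schedule described above can be sketched as a toy training loop. This is a sketch under strong assumptions, not the application's implementation: the function `logistic` is a hypothetical one-unit stand-in for both MultiConv networks, plain gradient steps replace Adam, the losses are averaged over samples, and `rounds`, `lr`, and `threshold` are made-up hyperparameters. As in the text, loss_g = CrossEntropy(cpred, c) and loss_d = loss_1 + loss_2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy between a predicted probability and a 0/1 target."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def logistic(params, x):
    """Stand-in model: weights followed by one bias term (replaces MultiConv)."""
    return sigmoid(sum(w * xi for w, xi in zip(params, x)) + params[-1])

def sgd_step(params, x, pred, target, lr):
    """Gradient step on bce(pred, target) for the logistic stand-in."""
    err = pred - target  # d(loss)/d(pre-activation) for sigmoid + cross-entropy
    for i, xi in enumerate(x):
        params[i] -= lr * err * xi
    params[-1] -= lr * err

def train_alternating(samples, dim, rounds=200, lr=0.5, threshold=0.05):
    gen = [0.0] * (dim + 1)    # generator parameters
    disc = [0.0] * (dim + 2)   # discriminator parameters; input is [x; click]
    loss_g = loss_d = float("inf")
    converged = False
    for _ in range(rounds):
        # Discriminator fixed: train the generator on loss_g.
        loss_g = 0.0
        for x, c in samples:
            c_pred = logistic(gen, x)
            loss_g += bce(c_pred, c) / len(samples)
            sgd_step(gen, x, c_pred, c, lr)
        # Generator fixed: train the discriminator on loss_d = loss_1 + loss_2.
        loss_d = 0.0
        for x, c in samples:
            neg = x + [logistic(gen, x)]   # negative sample [x; cpred], label 0
            pos = x + [c]                  # positive sample [x; c], label 1
            d_neg, d_pos = logistic(disc, neg), logistic(disc, pos)
            loss_d += (bce(d_neg, 0.0) + bce(d_pos, 1.0)) / len(samples)
            # Both predictions were taken before updating, so the two steps
            # together equal one gradient step on loss_1 + loss_2.
            sgd_step(disc, neg, d_neg, 0.0, lr)
            sgd_step(disc, pos, d_pos, 1.0, lr)
        if loss_g < threshold and loss_d < threshold:
            converged = True
            break
    return gen, disc, loss_g, loss_d, converged

# Made-up data: x is the spliced feature vector, c the real click (1.0 or 0.0).
samples = [([1.0], 1.0), ([-1.0], 0.0)]
gen, disc, loss_g, loss_d, converged = train_alternating(samples, dim=1)
```

The returned `converged` flag reports whether both losses fell below the threshold. In a toy setting like this one the loop may stop on the round limit instead, so a round limit is a sensible safeguard in any such loop.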

Further, the step S5 of inputting the historical information of the current user into the generator after adversarial learning, and determining the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning, includes:

S51: Input the current user's historical information and designated marketing activity information into the generator after adversarial learning;

S52: Determine whether the feedback value of the discriminator after adversarial learning is equal to 1;

S53: If yes, determine that the specified marketing activity information belongs to the current user's interest preference feature.

The embodiments of this application take the selection of marketing activity information as an example for detailed description. The above-mentioned marketing activity information includes, but is not limited to, red envelopes, discount coupons, rebates, and the like. Vectors corresponding to different marketing activity information are spliced with the feature vector of the current user's historical information and input into the generator. The generator simulates the user's click behavior for each piece of marketing activity information, and the magnitude of the difference value fed back by the discriminator determines the user's interest in and preference for the different marketing activity information.
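Steps S51 to S53 can be sketched as a loop over candidate marketing activities. The names `select_preferred_activities`, `generator`, `discriminator`, and `threshold` are hypothetical; the text only states that the feedback value is compared with 1, so treating a continuous score at or above a threshold as "equal to 1" is an assumption of this sketch, and the stub models below are made up for the example.

```python
def select_preferred_activities(history, activities, generator, discriminator,
                                threshold=0.5):
    """Return the names of activities whose discriminator feedback counts as 1.

    history:     feature vector of the current user's historical information
    activities:  mapping from activity name to its feature vector
    generator:   callable mapping a spliced vector to a simulated click value
    discriminator: callable mapping [x; click] to a feedback score
    """
    preferred = []
    for name, features in activities.items():
        x = list(history) + list(features)          # S51: splice and feed the generator
        c_pred = generator(x)
        feedback = discriminator(x + [c_pred])      # S52: discriminator feedback
        if feedback >= threshold:                   # treated as "feedback equals 1"
            preferred.append(name)                  # S53: belongs to the preference
    return preferred

# Stub models, made up for illustration only.
def stub_generator(x):
    return 1.0 if sum(x) > 1.0 else 0.0

def stub_discriminator(v):
    return v[-1]  # echoes the simulated click value

preferred = select_preferred_activities(
    history=[0.5],
    activities={"red_envelope": [1.0], "discount_coupon": [0.0]},
    generator=stub_generator,
    discriminator=stub_discriminator,
)
```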

Further, after the step S6 of recommending to the current user content information matching the current user's interest preference feature according to the current user's interest preference feature, the method includes:

S61: Acquire a designated feature that affects the user's click action, where the designated feature is any one of all the features that affect the user's click action;

S62: Change the range of feature data when the specified feature is input to the discriminator;

S63: Obtain the change range of the output value following the corresponding change of the characteristic data range;

S64: Determine whether the change range of the output value exceeds the preset range;

S65: If yes, determine that the specified feature is a sensitive feature that affects the user's click action.

In the embodiment of the present application, the user's historical features and real click behavior are input into the discriminator, and a discriminator feedback output value of 1 indicates a real click behavior. For example, suppose the specified feature is time and the feature data range is the time span. If the discriminator output value changes significantly as the time span changes, the user is sensitive to the time feature, and the time feature is determined to be a sensitive feature of the user. Sensitive features are used to form a continuously evolving portrait of the user, so that user classification and clustering can be updated in real time.
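The sensitivity check of steps S61 to S65 amounts to sweeping one input feature and measuring how far the discriminator output moves. In the sketch below, `is_sensitive_feature`, `preset_range`, and the stub discriminator are hypothetical names and values chosen for illustration.

```python
import math

def is_sensitive_feature(discriminator, base_input, feature_index, values,
                         preset_range):
    """Vary one feature over `values` (S62), record the spread of the
    discriminator output (S63), and compare it with the preset range (S64)."""
    outputs = []
    for v in values:
        x = list(base_input)
        x[feature_index] = v
        outputs.append(discriminator(x))
    change = max(outputs) - min(outputs)
    return change > preset_range, change   # S65: sensitive if the spread is too large

# Stub discriminator, made up for illustration: reacts only to feature 0.
def stub_discriminator(x):
    return 1.0 / (1.0 + math.exp(-4.0 * x[0]))

sensitive_0, change_0 = is_sensitive_feature(
    stub_discriminator, [0.0, 0.0], 0, [-1.0, 0.0, 1.0], preset_range=0.2)
sensitive_1, change_1 = is_sensitive_feature(
    stub_discriminator, [0.0, 0.0], 1, [-1.0, 0.0, 1.0], preset_range=0.2)
```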

Referring to FIG. 2, a content recommendation device based on adversarial learning according to an embodiment of the present application includes:

The obtaining module 1 is used to obtain the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the user characteristics constructed in advance;

Modeling module 2, used to model the generator and the discriminator according to the weighted compression vector;

The adversarial learning module 3 is used to combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

The first judging module 4 is used to judge whether the adversarial learning of the generator and the discriminator reaches the preset condition;

The determination module 5 is used to input the historical information of the current user into the generator after adversarial learning if the preset condition is met, and to determine the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning;

The recommendation module 6 is used to recommend to the current user content information that matches the current user's interest preference characteristics according to the current user's interest preference characteristics.

Further, the obtaining module 1 includes:

The coding unit is used to code the user characteristics in a time series in the two-dimensional space of the time series dimension and the feature dimension to obtain a time series feature matrix corresponding to the user characteristics;

The first multiplication unit is configured to multiply the time series feature matrix and the first compression weight matrix to obtain the first product matrix after data compression;

The first correction unit is configured to correct the first product matrix by the first bias vector to obtain the first correction matrix;

The first input unit is configured to input the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature;

The first splicing unit is used to splice the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form the first splicing vector;

The second multiplication unit is configured to multiply the first splicing vector and the second compression weight matrix to obtain a second product matrix after data compression;

The second correction unit is used to correct the second product matrix by the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior feature.

Further, user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics. Modeling module 2 includes:

The second splicing unit is used to perform vector splicing of user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain a second splicing vector;

The first modeling unit is used to input the second stitching vector into the model of the generator under the fixed model parameters of the discriminator, and model the model of the generator through the first cross-entropy loss function constraint;

The first judging unit is used to judge whether the first cross-entropy loss function reaches the minimum value;

The obtaining unit is used to obtain the model of the generator if the minimum value is reached.

In the model of the generator, φ is a strategy model based on a multi-layer convolutional neural network, R(φ) is a regularization term, η is a regularization parameter, and r represents a discriminator with fixed parameters. The output of the generator when the second splicing vector [P; T; Q] is input is expressed as cpred = MultiConv([P; T; Q]), and the above-mentioned first cross-entropy loss function is expressed as loss_g = CrossEntropy(cpred, c), which is the loss metric between cpred and c. The parameters of the multi-layer convolutional neural network of this application are optimized by the Adam algorithm.

Further, the second splicing unit includes:

The input subunit is used to input the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector;

The obtaining subunit is used to multiply the output result of the weighted compression vector by the parameter of the reward function to obtain the reward value;

The designating subunit is used to take the calculation method of the reward value as the model of the discriminator.

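The formula of the discriminator model of this application, writing s for the weighted compression vector and r for the reward value, is:

$$r = v^{T} \sigma(s)$$

where σ is the sigmoid function and v^T represents the parameter of the reward function.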

Further, the adversarial learning module 3 includes:

The third splicing unit is used to splice the second splicing vector with the modeling result of the generator to form a negative sample feature vector, and splice the second splicing vector and the user click real value corresponding to the second splicing vector into a positive sample feature vector;

The second modeling unit is used to input the negative sample feature vector and the positive sample feature vector into the discriminator, fix the generator parameters, and model the discriminator under the constraints of the second cross-entropy loss function;

The second judging unit is used to judge whether the second cross-entropy loss function reaches the minimum value;

The determination unit is used to determine the parameters of the discriminator if the minimum value is reached;

The adversarial learning unit is used to perform adversarial learning on the generator and the discriminator through the adversarial model according to the modeling process of the generator and the discriminator, until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum value.

Further, the determining module 5 includes:

The second input unit is used to input the current user's historical information and designated marketing activity information into the generator after adversarial learning;

The third judgment unit is used to judge whether the feedback value of the discriminator after adversarial learning is equal to 1;

The determining unit is used to determine that the specified marketing activity information belongs to the current user's interest preference feature if the feedback value is equal to 1.

Further, the content recommendation device based on adversarial learning includes:

The first acquisition module is used to acquire the designated feature that affects the user's click action, where the designated feature is any one of all the features that affect the user's click action;

The change module is used to change the range of feature data when the specified feature is input to the discriminator;

The second acquisition module is used to acquire the output value change range that follows the corresponding change of the characteristic data range;

The second judgment module is used to judge whether the change range of the output value exceeds the preset range;

The determination module is used to determine that the specified feature is a sensitive feature that affects the user's click action if the change range of the output value exceeds the preset range.

Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store all the data required for the content recommendation process based on adversarial learning. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the content recommendation method based on adversarial learning.

The content recommendation method based on adversarial learning executed by the processor includes: obtaining a weighted compression vector corresponding to the user's historical behavior feature through weighted compression of pre-built user features; modeling the generator and the discriminator according to the weighted compression vector; combining the modeled generator and discriminator to perform adversarial learning under the adversarial model; judging whether the adversarial learning of the generator and the discriminator meets the preset condition; if so, inputting the current user's historical information into the generator after adversarial learning, and determining the current user's interest preference feature in combination with the feedback value of the discriminator after adversarial learning; and recommending to the current user, according to the current user's interest preference feature, content information matching that interest preference feature.

The above-mentioned computer equipment models the user's historical behavior characteristics through weighted compression to capture the characteristics of the user's historical behavior characteristics that follow time series changes, and based on adversarial learning, the generator can obtain online users' interest and preference characteristics, and accurately recommend content information.

In one embodiment, the step in which the above-mentioned processor obtains the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the pre-built user features includes: encoding the user features along the time series in the two-dimensional space of the time series dimension and the feature dimension to obtain the time series feature matrix corresponding to the user features; multiplying the time series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression; correcting the first product matrix by the first bias vector to obtain the first correction matrix; inputting the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature; splicing the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form the first splicing vector; multiplying the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression; and correcting the second product matrix by the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior feature.

In one embodiment, the user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics, and the step in which the above-mentioned processor models the generator and the discriminator according to the weighted compression vector includes: performing vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain the second splicing vector; with the model parameters of the discriminator fixed, inputting the second splicing vector into the model of the generator and modeling the generator under the constraint of the first cross-entropy loss function; judging whether the first cross-entropy loss function reaches its minimum value; and if so, obtaining the model of the generator.

In one embodiment, before the step of performing vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain the second splicing vector, the processor performs steps including: inputting the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector; multiplying the output result of the weighted compression vector by the reward function parameter to obtain the reward value; and using the calculation method of the reward value as the model of the discriminator.
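As a rough illustration of the reward calculation used as the discriminator model, the sketch below applies the sigmoid element-wise and then multiplies by the reward function parameter. The text does not say whether that parameter is a scalar or a vector; the code assumes a vector of the same length, so the multiplication becomes a dot product that yields a scalar reward. The name `reward_value` is hypothetical.

```python
import math


def reward_value(weighted_compression_vector, reward_params):
    """Sigmoid of the weighted compression vector, then a dot product
    with the reward function parameter (assumed to be a vector)."""
    activated = [1.0 / (1.0 + math.exp(-c)) for c in weighted_compression_vector]
    return sum(a * r for a, r in zip(activated, reward_params))


# Two-dimensional toy input with unit reward parameters.
reward = reward_value([0.0, 0.0], [1.0, 1.0])
```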

In one embodiment, the step in which the above-mentioned processor combines the modeled generator with the discriminator and performs adversarial learning under the adversarial model includes: splicing the second splicing vector with the modeling result of the generator to form a negative sample feature vector, and splicing the second splicing vector with the real user click value corresponding to the second splicing vector to form a positive sample feature vector; inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of the second cross-entropy loss function; judging whether the second cross-entropy loss function reaches its minimum value; if so, determining the parameters of the discriminator; and, following the modeling processes of the generator and the discriminator, performing adversarial learning on the generator and the discriminator through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach their minimum.
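The sample construction and the loss that constrains the discriminator can be sketched as below. This is a minimal sketch, not the applicant's implementation: it assumes the generator's modeling result and the real click value are each a single number appended to the second splicing vector, and it uses the standard binary cross-entropy with label 1 for positive samples and 0 for negative samples. The function names are hypothetical, and the alternating parameter updates themselves are omitted.

```python
import math


def build_samples(second_splicing_vector, generator_output, real_click):
    """Return (positive, negative) sample feature vectors."""
    positive = list(second_splicing_vector) + [real_click]        # real click appended
    negative = list(second_splicing_vector) + [generator_output]  # generated value appended
    return positive, negative


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(predictions)


positive, negative = build_samples([0.1, 0.2], generator_output=0.7, real_click=1.0)
# An undecided discriminator scores both samples at 0.5.
loss = binary_cross_entropy([0.5, 0.5], [1, 0])
```

Training would alternate between minimizing this loss for the discriminator with the generator fixed and minimizing the first cross-entropy loss for the generator with the discriminator fixed.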

In one embodiment, the step in which the above-mentioned processor inputs the historical information of the current user into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determines the interest preference feature of the current user includes: inputting the historical information of the current user and designated marketing activity information into the generator after adversarial learning; judging whether the feedback value of the discriminator after adversarial learning is equal to 1; and if so, determining that the designated marketing activity information belongs to the current user's interest preference feature.
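The decision rule above amounts to a filter over candidate marketing activities. The sketch below assumes the trained generator and discriminator are available as callables and that the discriminator returns a binary feedback value; the callables used here are toy stand-ins, and all names are hypothetical.

```python
def matching_activities(history, activities, generator, discriminator):
    """Keep the activities for which the discriminator's feedback equals 1."""
    matched = []
    for activity in activities:
        generated = generator(history, activity)
        if discriminator(history, activity, generated) == 1:
            matched.append(activity)
    return matched


def toy_generator(history, activity):
    """Stand-in: predicts a click only for activities seen in the history."""
    return 1 if activity in history else 0


def toy_discriminator(history, activity, generated):
    """Stand-in: passes the generator's prediction through as feedback."""
    return generated


preferred = matching_activities(["insurance"], ["insurance", "loan"],
                                toy_generator, toy_discriminator)
```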

In one embodiment, after the step in which the above-mentioned processor recommends to the current user content information that matches the current user's interest preference feature, the processor performs steps including: acquiring a specified feature that affects the user's click action, where the specified feature is any one of all the features that affect the user's click action; changing the feature data range of the specified feature when it is input to the discriminator; obtaining the range over which the output value changes in response to the change in the feature data range; judging whether the change range of the output value exceeds a preset range; and if so, determining that the specified feature is a sensitive feature that affects the user's click action.
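The sensitivity check above can be sketched as a one-feature sweep. The code assumes the discriminator is a callable that maps a feature list to a single output value, and it measures the change range as the difference between the largest and smallest outputs over the swept values. The logistic `toy_discriminator` and its weights are hypothetical stand-ins for a trained model.

```python
import math


def output_change_range(discriminator, base_features, index, values):
    """Sweep one feature over `values` and return max minus min output."""
    outputs = []
    for value in values:
        probe = list(base_features)
        probe[index] = value          # change only the specified feature
        outputs.append(discriminator(probe))
    return max(outputs) - min(outputs)


def is_sensitive(discriminator, base_features, index, values, preset_range):
    """A feature is sensitive if the output change exceeds the preset range."""
    return output_change_range(discriminator, base_features, index, values) > preset_range


def toy_discriminator(features):
    """Stand-in model: logistic score that depends only on feature 0."""
    weights = [2.0, 0.0]
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


first_is_sensitive = is_sensitive(toy_discriminator, [0.0, 0.0], 0, [-1.0, 0.0, 1.0], 0.1)
second_is_sensitive = is_sensitive(toy_discriminator, [0.0, 0.0], 1, [-1.0, 0.0, 1.0], 0.1)
```

With these toy weights only the first feature moves the output, so only it is flagged.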

Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.

An embodiment of the present application also provides a computer-readable storage medium, which may be non-volatile or volatile and on which a computer program is stored. When the computer program is executed by a processor, it implements the content recommendation method based on adversarial learning, which includes: obtaining a weighted compression vector corresponding to the user's historical behavior feature through weighted compression of pre-constructed user characteristics; modeling a generator and a discriminator according to the weighted compression vector; combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model; judging whether the adversarial learning of the generator and the discriminator meets a preset condition; if so, inputting the current user's historical information into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determining the interest preference feature of the current user; and, according to the interest preference feature of the current user, recommending to the current user content information that matches that interest preference feature.

The above-mentioned computer-readable storage medium models the user's historical behavior features through weighted compression so as to capture how those features change over time. Based on adversarial learning, the generator can obtain the interest preference features of online users, so that content information is recommended accurately.

In one embodiment, the step in which the above-mentioned processor obtains the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the pre-constructed user characteristics includes: performing time-series coding on the user characteristics in the two-dimensional space of the time-series dimension and the feature dimension to obtain the time-series feature matrix corresponding to the user characteristics; multiplying the time-series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression; correcting the first product matrix with the first bias vector to obtain the first correction matrix; inputting the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature; splicing the embedding vector corresponding to the user's historical behavior feature with the time-series feature corresponding to the specified time to form the first splicing vector; multiplying the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression; and correcting the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior feature.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, it may include the processes of the above-mentioned method embodiments. Any reference to memory, storage, database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, device, article, or method. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.

The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

A content recommendation method based on adversarial learning, which includes:

Obtain the weighted compression vector corresponding to the user's historical behavior feature by weighting and compressing the pre-built user characteristics;

Modeling the generator and the discriminator according to the weighted compression vector;

Combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

Judging whether the adversarial learning of the generator and the discriminator meets a preset condition;

If yes, input the historical information of the current user into the generator after adversarial learning, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

According to the interest preference feature of the current user, content information matching the interest preference feature of the current user is recommended to the current user.
The content recommendation method based on adversarial learning according to claim 1, wherein the step of obtaining the weighted compression vector corresponding to the user's historical behavior feature by weighted compression of the user features constructed in advance, comprises:

In the two-dimensional space of the time series dimension and the feature dimension, the user characteristics are coded according to the time series to obtain the time series feature matrix corresponding to the user characteristics;

Multiplying the time series feature matrix and the first compression weight matrix to obtain a first product matrix after data compression;

After the first product matrix is corrected by the first bias vector, the first correction matrix is obtained;

Input the first correction matrix into a sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature;

Splicing the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form a first splicing vector;

Multiplying the first splicing vector by the second compression weight matrix to obtain a second product matrix after data compression;

After the second product matrix is corrected by the second bias vector, the weighted compression vector corresponding to the user's historical behavior feature is obtained.
The content recommendation method based on adversarial learning according to claim 1, wherein the user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics, and the step of modeling the generator and the discriminator according to the weighted compression vector includes:

Performing vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain a second splicing vector;

When the model parameters of the discriminator are fixed, the second stitching vector is input into the model of the generator, and the model of the generator is modeled through the first cross-entropy loss function constraint;

Judging whether the first cross-entropy loss function reaches a minimum value;

If so, the model of the generator is obtained.
The content recommendation method based on adversarial learning according to claim 3, wherein before the step of performing vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain a second splicing vector, the method comprises:

Input the weighted compression vector to a sigmoid function to obtain an output result of the weighted compression vector;

Multiply the output result of the weighted compression vector by the reward function parameter to obtain the reward value;

The calculation method of the reward value is used as a model of the discriminator.
The content recommendation method based on adversarial learning according to claim 1, wherein the step of combining the modeled generator and the discriminator to perform adversarial learning under the adversarial model comprises:

The second stitching vector is stitched with the modeling result of the generator to form a negative sample feature vector, and the second stitching vector is stitched with the real user click value corresponding to the second stitching vector to form a positive sample feature vector;

Input the negative sample feature vector and the positive sample feature vector to the discriminator, fix the generator parameters, and model the discriminator under the constraints of a second cross-entropy loss function;

Judging whether the second cross-entropy loss function reaches a minimum value;

If yes, determine the parameters of the discriminator;

According to the modeling process of the generator and the discriminator, adversarial learning is performed on the generator and the discriminator through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum.
The content recommendation method based on adversarial learning according to claim 1, wherein the step of inputting the historical information of the current user into the generator after adversarial learning and combining the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user includes:

Inputting the historical information of the current user and specified marketing activity information into the generator after adversarial learning;

Determine whether the feedback value of the discriminator after adversarial learning is equal to 1;

If yes, it is determined that the specified marketing activity information belongs to the interest preference feature of the current user.
The content recommendation method based on adversarial learning according to claim 1, wherein, after the step of recommending to the current user content information that matches the current user's interest preference feature according to the current user's interest preference feature, the method includes:

Acquiring a specified feature that affects the user's click action, where the specified feature is any one of all the features that affect the user's click action;

Changing the range of feature data when the specified feature is input to the discriminator;

Acquiring a change range of the output value following the corresponding change of the characteristic data range;

Judging whether the change range of the output value exceeds a preset range;

If so, it is determined that the specified feature is a sensitive feature that affects the user's click action.
A content recommendation device based on adversarial learning, which includes:

The obtaining module is used to obtain the weighted compression vector corresponding to the user's historical behavior feature through weighted compression of the user characteristics constructed in advance;

A modeling module for modeling the generator and the discriminator according to the weighted compression vector;

The adversarial learning module is used to combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

The first judgment module is used to judge whether the adversarial learning of the generator and the discriminator reaches a preset condition;

The determination module is used to input the historical information of the current user into the generator after adversarial learning if the preset condition is met, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

The recommendation module is configured to recommend to the current user content information that matches the current user's interest preference characteristics according to the current user's interest preference characteristics.
A computer device, including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a content recommendation method based on adversarial learning, the method including:

Obtain the weighted compression vector corresponding to the user's historical behavior feature by weighting and compressing the pre-built user characteristics;

Modeling the generator and the discriminator according to the weighted compression vector;

Combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

Judging whether the adversarial learning of the generator and the discriminator meets a preset condition;

If yes, input the historical information of the current user into the generator after adversarial learning, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

According to the interest preference feature of the current user, content information matching the interest preference feature of the current user is recommended to the current user.
The computer device according to claim 9, wherein the step of obtaining a weighted compression vector corresponding to the user's historical behavior characteristic by weighting and compressing the user characteristics constructed in advance comprises:

In the two-dimensional space of the time series dimension and the feature dimension, the user characteristics are coded according to the time series to obtain the time series feature matrix corresponding to the user characteristics;

Multiplying the time series feature matrix and the first compression weight matrix to obtain a first product matrix after data compression;

After the first product matrix is corrected by the first bias vector, the first correction matrix is obtained;

Input the first correction matrix into a sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature;

Splicing the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form a first splicing vector;

Multiplying the first splicing vector by the second compression weight matrix to obtain a second product matrix after data compression;

After the second product matrix is corrected by the second bias vector, the weighted compression vector corresponding to the user's historical behavior feature is obtained.
The computer device according to claim 9, wherein the user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics, and the step of modeling the generator and the discriminator according to the weighted compression vector includes:

Performing vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain a second splicing vector;

When the model parameters of the discriminator are fixed, the second stitching vector is input into the model of the generator, and the model of the generator is modeled through the first cross-entropy loss function constraint;

Judging whether the first cross-entropy loss function reaches a minimum value;

If so, the model of the generator is obtained.
The computer device according to claim 11, wherein before the step of performing vector stitching on the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain the second stitching vector, the method comprises:

Input the weighted compression vector to a sigmoid function to obtain an output result of the weighted compression vector;

Multiply the output result of the weighted compression vector by the reward function parameter to obtain the reward value;

The calculation method of the reward value is used as a model of the discriminator.
The computer device according to claim 9, wherein the step of combining the modeled generator and the discriminator to perform adversarial learning under the adversarial model comprises:

The second stitching vector is stitched with the modeling result of the generator to form a negative sample feature vector, and the second stitching vector is stitched with the real user click value corresponding to the second stitching vector to form a positive sample feature vector;

Input the negative sample feature vector and the positive sample feature vector to the discriminator, fix the generator parameters, and model the discriminator under the constraints of a second cross-entropy loss function;

Judging whether the second cross-entropy loss function reaches a minimum value;

If yes, determine the parameters of the discriminator;

According to the modeling process of the generator and the discriminator, adversarial learning is performed on the generator and the discriminator through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum.
The computer device according to claim 9, wherein the step of inputting the historical information of the current user into the generator after adversarial learning and combining the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user includes:

Inputting the historical information of the current user and specified marketing activity information into the generator after adversarial learning;

Determine whether the feedback value of the discriminator after adversarial learning is equal to 1;

If yes, it is determined that the specified marketing activity information belongs to the interest preference feature of the current user.
A computer-readable storage medium with a computer program stored thereon, wherein, when the computer program is executed by a processor, a content recommendation method based on adversarial learning is implemented, the method including:

Obtain the weighted compression vector corresponding to the user's historical behavior feature by weighting and compressing the pre-built user characteristics;

Modeling the generator and the discriminator according to the weighted compression vector;

Combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;

Judging whether the adversarial learning of the generator and the discriminator meets a preset condition;

If yes, input the historical information of the current user into the generator after adversarial learning, and combine the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user;

According to the interest preference feature of the current user, content information matching the interest preference feature of the current user is recommended to the current user.
The computer-readable storage medium according to claim 15, wherein the step of obtaining a weighted compression vector corresponding to the user's historical behavior characteristic by weighting and compressing the user characteristics constructed in advance comprises:

In the two-dimensional space of the time series dimension and the feature dimension, the user characteristics are coded according to the time series to obtain the time series feature matrix corresponding to the user characteristics;

Multiplying the time series feature matrix and the first compression weight matrix to obtain a first product matrix after data compression;

After the first product matrix is corrected by the first bias vector, the first correction matrix is obtained;

Input the first correction matrix into a sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature;

Splicing the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form a first splicing vector;

Multiplying the first splicing vector by the second compression weight matrix to obtain a second product matrix after data compression;

After the second product matrix is corrected by the second bias vector, the weighted compression vector corresponding to the user's historical behavior feature is obtained.
The computer-readable storage medium according to claim 15, wherein the user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics, and the step of modeling the generator and the discriminator according to the weighted compression vector includes:

Performing vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain a second splicing vector;

When the model parameters of the discriminator are fixed, the second stitching vector is input into the model of the generator, and the model of the generator is modeled through the first cross-entropy loss function constraint;

Judging whether the first cross-entropy loss function reaches a minimum value;

If so, the model of the generator is obtained.
The computer-readable storage medium according to claim 17, wherein before the step of performing vector stitching of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain a second stitching vector, the method comprises:

Input the weighted compression vector to a sigmoid function to obtain an output result of the weighted compression vector;

Multiply the output result of the weighted compression vector by the reward function parameter to obtain the reward value;

The calculation method of the reward value is used as a model of the discriminator.
The computer-readable storage medium according to claim 15, wherein the step of combining the modeled generator and the discriminator to perform adversarial learning under the adversarial model comprises:

The second stitching vector is stitched with the modeling result of the generator to form a negative sample feature vector, and the second stitching vector is stitched with the real user click value corresponding to the second stitching vector to form a positive sample feature vector;

Input the negative sample feature vector and the positive sample feature vector to the discriminator, fix the generator parameters, and model the discriminator under the constraints of a second cross-entropy loss function;

Judging whether the second cross-entropy loss function reaches a minimum value;

If yes, determine the parameters of the discriminator;

According to the modeling process of the generator and the discriminator, adversarial learning is performed on the generator and the discriminator through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum.
The computer-readable storage medium according to claim 15, wherein the step of inputting the historical information of the current user into the generator after adversarial learning and combining the feedback value of the discriminator after adversarial learning to determine the interest preference feature of the current user includes:

Inputting the historical information of the current user and specified marketing activity information into the generator after adversarial learning;

Determine whether the feedback value of the discriminator after adversarial learning is equal to 1;

If yes, it is determined that the specified marketing activity information belongs to the interest preference feature of the current user.