WO2021169451A1 - Content recommendation method and apparatus based on adversarial learning, and computer device

Info

Publication number
WO2021169451A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
user
discriminator
generator
feature
Prior art date
Application number
PCT/CN2020/132592
Other languages
French (fr)
Chinese (zh)
Inventor
方聪
张旭
郑越
旷雄
黄宇星
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021169451A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence, in particular to content recommendation methods, devices and computer equipment based on adversarial learning.
  • Existing content recommendation systems are generally based on manual feature extraction, collaborative filtering and decomposition techniques to achieve automatic recommendation.
  • By collecting user behavior data, system log data and other information, these systems model the user's preferences and interests, cluster and group users accordingly, and recommend the same kind of content to users with similar preferences and interests.
  • The existing content recommendation system treats the collected user behavior data as statistical features, but cannot take into account the temporal logic by which user preferences and interests develop and change, and the recommended content is not automatically updated to keep pace with the times.
  • The main purpose of this application is to provide content recommendation based on adversarial learning, which aims to solve the technical problems that the temporal logic of the development and change of user preferences and interests cannot be considered, and that the recommended content is not automatically updated with the times.
  • This application proposes a content recommendation method based on adversarial learning, including:
  • This application also provides a content recommendation device based on adversarial learning, including:
  • The obtaining module is used to obtain the weighted compression vector corresponding to the user's historical behavior characteristics through weighted compression of the user characteristics constructed in advance;
  • The modeling module is used to model the generator and the discriminator according to the weighted compression vector;
  • The adversarial learning module is used to combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;
  • The first judgment module is used to judge whether the adversarial learning of the generator and the discriminator meets the preset condition;
  • The determining module is used to input the historical information of the current user into the generator after adversarial learning if the preset condition is met, and to combine the feedback value of the discriminator after adversarial learning to determine the interest preference characteristics of the current user;
  • The recommendation module is used to recommend to the current user content information that matches the current user's interest preference characteristics.
  • The present application also provides a computer device, including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method when executing the computer program.
  • The present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the above method are realized.
  • This application models the user's historical behavior characteristics through weighted compression in order to capture how those characteristics change over time. Based on adversarial learning, the generator can then obtain the interest preference characteristics of online users and recommend content information accurately.
  • FIG. 1 is a schematic flowchart of a content recommendation method based on adversarial learning according to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a content recommendation device based on adversarial learning according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
  • A content recommendation method based on adversarial learning includes:
  • The user characteristics constructed in the embodiments of the present application include user attribute characteristics P, historical click characteristics T, behavior cue characteristics Q, and user click behavior c.
  • User attribute characteristics P include, but are not limited to, user profile information such as age and occupation; behavior cue characteristics Q include, but are not limited to, promoted information types, preferential policies, etc.; historical click characteristics T include, but are not limited to, the user's historical personal information and the content the user has clicked historically; user click behavior c is a truth value assigned to the click behavior: if it is true, the click behavior occurred, otherwise it did not occur.
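  • As an illustration only, the four groups of user characteristics could be held in a small record type. The class name, field names, and numeric encodings below are assumptions made for this sketch and do not appear in the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserFeatures:
    """One user's constructed characteristics (hypothetical container)."""
    attributes: List[float]      # P: profile information, e.g. encoded age and occupation
    history: List[List[float]]   # T: historical click characteristics, one row per time step
    cues: List[float]            # Q: behavior cue characteristics, e.g. promotion type
    clicked: bool                # c: True if the click behavior occurred


def click_label(features: UserFeatures) -> float:
    """Encode the click behavior c as 1.0 (occurred) or 0.0 (did not occur)."""
    return 1.0 if features.clicked else 0.0


example = UserFeatures(
    attributes=[0.3, 1.0],
    history=[[1.0, 0.0], [0.0, 1.0]],
    cues=[1.0],
    clicked=True,
)
```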
  • This application uses weighted compression to encode the time series features in the above user characteristics into a time series feature matrix, and uses the time series feature matrix and the user attribute characteristics to model the generator and the discriminator and train them adversarially, so that the generator after learning can identify the time series features in the user characteristics, obtain the user's interest preference characteristics carrying the time series change feature, and then recommend content information according to those interest preference characteristics.
  • As a result, the recommendations of this application are more in line with the interest preferences of current users, and the recommended content is more accurate and targeted.
  • Step S1 of obtaining the weighted compression vector corresponding to the user's historical behavior characteristics by weighted compression of the pre-built user characteristics includes:
  • The user characteristics are encoded in time sequence in the two-dimensional space of the time series dimension and the feature dimension to obtain the time series feature matrix corresponding to the user characteristics.
  • The above-mentioned user historical behavior characteristics are feature representations of the historical data of the user characteristics, and are a combination of user characteristics and historical time series characteristics.
  • This application processes the time series feature matrix through first-level weighted compression to obtain the embedding vector corresponding to the user's historical behavior characteristics.
  • The first-level weighted compression applies an operator h to the time series feature matrix to obtain the embedding vector S_t: the matrix is multiplied by the feature weight matrix W (the aforementioned first compression weight matrix), corrected by the feature bias vector B (the aforementioned first bias vector), passed through the sigmoid function σ, and flattened into a vector by the operator vec.
  • The second-level weighted compression splices the embedding vector with the time series feature corresponding to the specified time to form the first splicing vector, multiplies it by the second compression weight matrix, and corrects the product with the second bias vector to obtain the weighted compression vector.
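  • The two compression levels can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the matrix shapes, the column-wise application of the bias vector, and all function and variable names are chosen here for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def first_level_compress(F: Matrix, W: Matrix, B: List[float]) -> List[float]:
    """Embedding vector: flatten(sigmoid(F W + B)).

    Assumed shapes: F is (time steps x features), W is (features x k),
    and B has length k and is added to every row of the product.
    """
    product = matmul(F, W)                                   # first product matrix
    corrected = [[v + B[j] for j, v in enumerate(row)]       # first correction matrix
                 for row in product]
    return [sigmoid(v) for row in corrected for v in row]    # vec(sigmoid(...))


def second_level_compress(embedding: List[float], current: List[float],
                          W2: Matrix, b2: List[float]) -> List[float]:
    """Weighted compression vector: [embedding; current] W2 + b2."""
    spliced = embedding + current                            # first splicing vector
    product = matmul([spliced], W2)[0]                       # second product matrix (one row)
    return [v + b2[j] for j, v in enumerate(product)]


embedding = first_level_compress([[0.0, 0.0], [0.0, 0.0]], [[1.0], [1.0]], [0.0])
compressed = second_level_compress(embedding, [1.0],
                                   [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.1, -0.1])
```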
  • Step S2 of modeling the generator and the discriminator according to the weighted compression vector includes:
  • S21: Perform vector splicing on the user attribute characteristics, historical click characteristics, and behavior cue characteristics to obtain a second splicing vector;
  • The second splicing vector [P; T; Q] is obtained by vector splicing of the user attribute characteristics, historical click characteristics, and behavior cue characteristics.
  • The sample training data is constructed first. Specifically, the second splicing vector [P; T; Q] is spliced with the cpred output by the generator to form the negative sample feature vector, and the second splicing vector [P; T; Q] is spliced with the user's real click c to form the positive sample feature vector.
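  • The sample construction described above can be sketched as simple list concatenation. The function names are hypothetical, and T is assumed here to be already flattened into a single vector.

```python
from typing import List, Tuple


def splice(*parts: List[float]) -> List[float]:
    """Concatenate feature vectors end to end."""
    out: List[float] = []
    for part in parts:
        out.extend(part)
    return out


def build_samples(P: List[float], T: List[float], Q: List[float],
                  c_real: float, c_pred: float) -> Tuple[List[float], List[float]]:
    """Return (positive, negative) sample feature vectors.

    positive = [P; T; Q; c]      (the user's real click)
    negative = [P; T; Q; cpred]  (the click produced by the generator)
    """
    base = splice(P, T, Q)  # second splicing vector [P; T; Q]
    return base + [c_real], base + [c_pred]


positive, negative = build_samples([0.3], [1.0, 0.0], [1.0], c_real=1.0, c_pred=0.2)
```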
  • In the loss function that constrains the generator, the generator is a strategy model based on a multi-layer convolutional neural network, R(·) is a regularization term weighted by a regularization parameter, and r represents a discriminator with fixed parameters.
  • The parameters of the multi-layer convolutional neural network of this application are optimized by the Adam algorithm.
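  • For reference, a single Adam update on a flat parameter list looks like the sketch below. The hyperparameter defaults are the commonly used ones and are an assumption here, since the application does not state the values it uses.

```python
import math
from typing import List, Tuple


def adam_step(params: List[float], grads: List[float],
              m: List[float], v: List[float], t: int,
              lr: float = 0.001, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8
              ) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g          # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g      # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)               # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


updated, m1, v1 = adam_step([1.0], [1.0], [0.0], [0.0], t=1)
```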
  • Before step S21 of performing vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain the second splicing vector, the method includes:
  • v^T represents the parameter of the reward function.
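  • The discriminator model, in which the weighted compression vector is passed through the sigmoid function and the result is multiplied by the reward function parameter, can be sketched as an inner product. Treating v as a vector of the same length as the weighted compression vector is an assumption of this sketch, as are the function names.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weighted_compression: List[float], v: List[float]) -> float:
    """Reward value: v^T sigmoid(s), with sigmoid applied element-wise."""
    activated = [sigmoid(s) for s in weighted_compression]
    return sum(v_i * a_i for v_i, a_i in zip(v, activated))


value = reward([0.0, 0.0], [1.0, 2.0])
```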
  • Step S3 of combining the modeled generator and the discriminator to perform adversarial learning under the adversarial model includes:
  • S31: Splice the second splicing vector with the modeling result of the generator to form a negative sample feature vector, and splice the second splicing vector with the real user click value corresponding to the second splicing vector to form a positive sample feature vector;
  • The adversarial model of this application is expressed by a formula whose symbols denote, respectively, the optimized parameters of the discriminator in adversarial learning and the parameters of the generator in adversarial learning.
  • The learning goal of the generator is to generate, from the constructed vector of user characteristics, a click behavior cpred that is as similar as possible to real user click behavior, while the learning goal of the discriminator is to distinguish the real user click behavior from the similar click behavior produced by the generator.
  • The parameters of the discriminator and the generator are fixed alternately. First, the parameters of the discriminator are fixed and the generator is trained through loss_g. When loss_g drops, the cpred produced by the generator has successfully deceived the discriminator. Then the generator parameters are fixed and the discriminator is trained under the constraint of loss_d. When loss_d drops, the discriminator has successfully distinguished between cpred and c.
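  • The alternating schedule can be illustrated with a toy example. Everything below is an assumption made for the sketch: the generator and discriminator are single-layer logistic models rather than the multi-layer convolutional networks of the application, the losses are the standard binary cross-entropy forms, and gradients are estimated by finite differences to keep the code short.

```python
import math
from typing import Callable, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate(g: List[float], x: List[float]) -> float:
    """Toy generator: predicted click cpred in (0, 1)."""
    return sigmoid(dot(g, x))


def discriminate(d: List[float], x: List[float], c: float) -> float:
    """Toy discriminator: probability that [x; c] is a real sample."""
    return sigmoid(dot(d, x + [c]))


def loss_d(d: List[float], g: List[float], x: List[float], c_real: float) -> float:
    fake = discriminate(d, x, generate(g, x))
    return -math.log(discriminate(d, x, c_real)) - math.log(1.0 - fake)


def loss_g(d: List[float], g: List[float], x: List[float]) -> float:
    return -math.log(discriminate(d, x, generate(g, x)))


def descend(loss: Callable[[List[float]], float], params: List[float],
            lr: float = 0.1, eps: float = 1e-6) -> List[float]:
    """One gradient-descent step using central finite differences."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((loss(up) - loss(down)) / (2.0 * eps))
    return [p - lr * gr for p, gr in zip(params, grads)]


def adversarial_round(d: List[float], g: List[float], x: List[float],
                      c_real: float) -> Tuple[List[float], List[float]]:
    g = descend(lambda p: loss_g(d, p, x), g)           # discriminator fixed
    d = descend(lambda p: loss_d(p, g, x, c_real), d)   # generator fixed
    return d, g


x0, d0, g0 = [1.0, 0.5], [0.1, -0.2, 0.3], [0.0, 0.0]
d1, g1 = adversarial_round(d0, g0, x0, c_real=1.0)
```

  • In practice the rounds would be repeated until both losses stop decreasing, which corresponds to the preset condition checked after adversarial learning.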
  • Step S5 of determining the interest preference characteristics of the current user includes:
  • S51: Input the current user's historical information and designated marketing activity information into the generator after adversarial learning;
  • The embodiments of this application take the selection of marketing activity information as an example for detailed description.
  • The above-mentioned marketing activity information includes, but is not limited to, red envelopes, discount coupons, rebates, etc.
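  • The selection of marketing activities can be sketched as a filter over candidate activities. The generator and discriminator are passed in as callables with hypothetical signatures, and the discriminator's continuous output is turned into a feedback value of 0 or 1 with a 0.5 threshold, which is an assumption of this sketch.

```python
from typing import Callable, Dict, List

Generator = Callable[[List[float], Dict[str, str]], float]
Discriminator = Callable[[List[float], Dict[str, str], float], float]


def preferred_activities(history: List[float],
                         activities: Dict[str, Dict[str, str]],
                         generator: Generator,
                         discriminator: Discriminator,
                         threshold: float = 0.5) -> List[str]:
    """Return the activities whose discriminator feedback value equals 1."""
    preferred = []
    for name, info in activities.items():
        c_pred = generator(history, info)                 # generator after adversarial learning
        score = discriminator(history, info, c_pred)      # discriminator after adversarial learning
        feedback = 1 if score >= threshold else 0
        if feedback == 1:
            preferred.append(name)
    return preferred


# Stub models used only to exercise the function.
def stub_generator(history, info):
    return 1.0 if info["type"] == "coupon" else 0.0


def stub_discriminator(history, info, c_pred):
    return 0.05 + 0.9 * c_pred


candidates = {"coupon": {"type": "coupon"}, "rebate": {"type": "rebate"}}
chosen = preferred_activities([0.2, 0.7], candidates, stub_generator, stub_discriminator)
```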
  • The method includes:
  • S61: Acquire a designated feature that affects the user's click action, where the designated feature is any one of the features that affect the user's click action;
  • The user's historical characteristics and real click behavior are input into the discriminator, and the discriminator's feedback output value is 1, indicating a real click behavior.
  • For example, the designated feature is time.
  • In this case the feature data range is the time span. If the discriminator output value changes significantly as the time span changes, the user is sensitive to the time feature, and the time feature is determined to be a sensitive feature of the user.
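  • The sensitivity check can be sketched as follows: the designated feature is swept over a range of values, the spread of the discriminator output is measured, and that spread is compared with the preset range. The dictionary-based input, the function name, and the stub discriminator are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable


def is_sensitive(discriminator: Callable[[Dict[str, float]], float],
                 base_input: Dict[str, float],
                 feature: str,
                 values: Iterable[float],
                 preset_range: float) -> bool:
    """True if sweeping `feature` changes the discriminator output by more than preset_range."""
    outputs = [discriminator({**base_input, feature: value}) for value in values]
    change_range = max(outputs) - min(outputs)   # output value change range
    return change_range > preset_range


# Stub discriminator whose output depends only on the time span.
def stub_discriminator(inputs: Dict[str, float]) -> float:
    return min(1.0, 0.1 * inputs["time_span"])


base = {"time_span": 1.0, "age": 30.0}
time_sensitive = is_sensitive(stub_discriminator, base, "time_span", [1.0, 5.0, 10.0], 0.2)
age_sensitive = is_sensitive(stub_discriminator, base, "age", [20.0, 40.0, 60.0], 0.2)
```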
  • A content recommendation device based on adversarial learning includes:
  • The obtaining module 1 is used to obtain the weighted compression vector corresponding to the user's historical behavior characteristics through weighted compression of the user characteristics constructed in advance;
  • The modeling module 2 is used to model the generator and the discriminator according to the weighted compression vector;
  • The adversarial learning module 3 is used to combine the modeled generator and the discriminator to conduct adversarial learning under the adversarial model;
  • The first judging module 4 is used to judge whether the adversarial learning of the generator and the discriminator reaches the preset condition;
  • The determination module 5 is used to input the historical information of the current user into the generator after adversarial learning if the preset condition is met, and to combine the feedback value of the discriminator after adversarial learning to determine the interest preference characteristics of the current user;
  • The recommendation module 6 is used to recommend to the current user content information that matches the current user's interest preference characteristics.
  • The obtaining module 1 includes:
  • The coding unit is used to encode the user characteristics in time sequence in the two-dimensional space of the time series dimension and the feature dimension to obtain a time series feature matrix corresponding to the user characteristics;
  • The first multiplication unit is configured to multiply the time series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression;
  • The first correction unit is configured to correct the first product matrix with the first bias vector to obtain the first correction matrix;
  • The first input unit is configured to input the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior characteristics;
  • The first splicing unit is used to splice the embedding vector corresponding to the user's historical behavior characteristics with the time series feature corresponding to the specified time to form the first splicing vector;
  • The second multiplication unit is configured to multiply the first splicing vector by the second compression weight matrix to obtain a second product matrix after data compression;
  • The second correction unit is used to correct the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior characteristics.
  • The modeling module 2 includes:
  • The second splicing unit is used to perform vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain a second splicing vector;
  • The first modeling unit is used to input the second splicing vector into the model of the generator under fixed model parameters of the discriminator, and to model the generator under the constraint of the first cross-entropy loss function;
  • The first judging unit is used to judge whether the first cross-entropy loss function reaches the minimum value;
  • The obtaining unit is used to obtain the model of the generator if the minimum value is reached.
  • The second splicing unit includes:
  • The input subunit is used to input the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector;
  • The adversarial learning module 3 includes:
  • The third splicing unit is used to splice the second splicing vector with the modeling result of the generator to form a negative sample feature vector, and to splice the second splicing vector with the real user click value corresponding to the second splicing vector to form a positive sample feature vector;
  • The second modeling unit is used to input the negative sample feature vector and the positive sample feature vector into the discriminator, fix the generator parameters, and model the discriminator under the constraint of the second cross-entropy loss function;
  • The second judging unit is used to judge whether the second cross-entropy loss function reaches the minimum value;
  • The determination unit is used to determine the parameters of the discriminator if the minimum value is reached;
  • The adversarial learning unit is used to train the generator and the discriminator adversarially through the adversarial model, following the modeling process of the generator and the discriminator, until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum.
  • The determining module 5 includes:
  • The second input unit is used to input the current user's historical information and designated marketing activity information into the generator after adversarial learning;
  • The third judgment unit is used to judge whether the feedback value of the discriminator after adversarial learning is equal to 1;
  • The determining unit is used to determine that the designated marketing activity information belongs to the current user's interest preference characteristics if the feedback value is equal to 1.
  • The content recommendation device based on adversarial learning further includes:
  • The first acquisition module is used to acquire the designated feature that affects the user's click action, where the designated feature is any one of the features that affect the user's click action;
  • The change module is used to change the feature data range when the designated feature is input to the discriminator;
  • The second acquisition module is used to acquire the output value change range that follows the corresponding change of the feature data range;
  • The second judgment module is used to judge whether the output value change range exceeds the preset range;
  • The judging module is used to judge that the designated feature is a sensitive feature that affects the user's click action if the output value change range exceeds the preset range.
  • An embodiment of the present application also provides a computer device.
  • The computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, a computer program, and a database.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • The database of the computer device is used to store all the data required for the content recommendation process based on adversarial learning.
  • The network interface of the computer device is used to communicate with an external terminal through a network connection.
  • The computer program is executed by the processor to realize the content recommendation method based on adversarial learning.
  • The processor executes the content recommendation method based on adversarial learning, including: obtaining a weighted compression vector corresponding to the user's historical behavior characteristics through weighted compression of pre-built user characteristics; modeling the generator and the discriminator according to the weighted compression vector; combining the modeled generator with the discriminator to perform adversarial learning under the adversarial model; judging whether the adversarial learning of the generator and the discriminator meets the preset condition; if so, inputting the current user's historical information into the generator after adversarial learning, and combining the feedback value of the discriminator after adversarial learning to determine the current user's interest preference characteristics; and recommending to the current user content information that matches the current user's interest preference characteristics.
  • The above-mentioned computer device models the user's historical behavior characteristics through weighted compression in order to capture how those characteristics change over time. Based on adversarial learning, the generator can then obtain the interest preference characteristics of online users and recommend content information accurately.
  • The step in which the above-mentioned processor obtains the weighted compression vector corresponding to the user's historical behavior characteristics by weighted compression of the pre-built user characteristics includes: encoding the user characteristics in time sequence in the two-dimensional space of the time series dimension and the feature dimension to obtain the time series feature matrix corresponding to the user characteristics; multiplying the time series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression; correcting the first product matrix with the first bias vector to obtain the first correction matrix; inputting the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior characteristics; splicing the embedding vector corresponding to the user's historical behavior characteristics with the time series feature corresponding to the specified time to form the first splicing vector; multiplying the first splicing vector by the second compression weight matrix to obtain a second product matrix after data compression; and correcting the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior characteristics.
  • The user characteristics include user attribute characteristics, historical click characteristics, and behavior cue characteristics.
  • The step in which the above-mentioned processor models the generator and the discriminator according to the weighted compression vector includes: performing vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain the second splicing vector; inputting the second splicing vector into the model of the generator under fixed model parameters of the discriminator, and modeling the generator under the constraint of the first cross-entropy loss function; judging whether the first cross-entropy loss function reaches the minimum value; and if so, obtaining the model of the generator.
  • Before the step of performing vector splicing of the user attribute characteristics, historical click characteristics and behavior cue characteristics to obtain the second splicing vector, the processor performs the following: inputting the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector; multiplying the output result of the weighted compression vector by the reward function parameter to obtain the reward value; and using the calculation of the reward value as the model of the discriminator.
  • The step in which the above-mentioned processor combines the modeled generator with the discriminator to perform adversarial learning under the adversarial model includes: splicing the second splicing vector with the modeling result of the generator to form a negative sample feature vector, and splicing the second splicing vector with the real user click value corresponding to the second splicing vector to form a positive sample feature vector; inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of the second cross-entropy loss function; judging whether the second cross-entropy loss function reaches the minimum value; if so, determining the parameters of the discriminator; and, following the modeling process of the generator and the discriminator, training the generator and the discriminator adversarially through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach the minimum.
  • the above-mentioned processor inputs the historical information of the current user into the generator after the confrontation learning, and combines the feedback value of the discriminator after the confrontation learning to determine the interest preference characteristics of the current user.
  • the historical information and designated marketing activity information are input into the generator after confrontation learning; it is judged whether the feedback value of the discriminator after confrontation learning is equal to 1; if so, it is determined that the designated marketing activity information belongs to the current user’s interest preference feature.
  • the step includes: acquiring a specified feature that affects the user’s click action, where the specified A feature is any one of all the features that affect the user's click action; change the feature data range when the specified feature is input to the discriminator; obtain the output value change range that follows the corresponding change of the feature data range; determine whether the output value change range exceeds the preset range; if so , It is determined that the specified feature is a sensitive feature that affects the user's click action.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • a computer program is stored thereon.
  • the content recommendation method based on adversarial learning includes: obtaining the weighted compression vector corresponding to the user's historical behavior feature by weighted compression of the pre-constructed user characteristics; modeling the generator and the discriminator according to the weighted compression vector; combining the modeled generator with the discriminator and performing adversarial learning under the adversarial model; determining whether the adversarial learning of the generator and the discriminator meets the preset condition; if so, inputting the current user's historical information into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determining the interest preference characteristics of the current user; recommending to the current user, according to the current user's interest preference characteristics, content information that matches those characteristics.
  • the above-mentioned computer-readable storage medium models the user's historical behavior characteristics through weighted compression to capture how those characteristics change over time and, based on adversarial learning, enables the generator to obtain the interest preference characteristics of online users and accurately recommend content information.
  • the step, performed by the above-mentioned processor, of obtaining the weighted compression vector corresponding to the user's historical behavior feature by weighted compression of the pre-constructed user characteristics includes: performing time-series encoding on the user characteristics in the two-dimensional space of the time series dimension and the feature dimension to obtain the time series feature matrix corresponding to the user characteristics; multiplying the time series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression; correcting the first product matrix with the first bias vector to obtain the first correction matrix; inputting the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior feature; splicing the embedding vector corresponding to the user's historical behavior feature with the time series feature corresponding to the specified time to form the first splicing vector; multiplying the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression; correcting the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior characteristics.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

Provided is a content recommendation method based on adversarial learning. The method relates to the field of artificial intelligence, and comprises: obtaining, by means of weighted compression of a pre-constructed user feature, a weighted compression vector corresponding to a historical behavior feature of a user (S1); modeling a generator and a discriminator according to the weighted compression vector (S2); combining the generator and the discriminator that have been subjected to modeling, and performing adversarial learning under an adversarial model (S3); determining whether the adversarial learning of the generator and the discriminator reaches a pre-set condition (S4); if so, inputting historical information of the current user into the generator after adversarial learning, and determining interest preference features of the current user in combination with a feedback value of the discriminator after adversarial learning (S5); and according to the interest preference features of the current user, recommending, to the current user, content information matching the interest preference features of the current user (S6). Behavior features are modeled by means of weighted compression, time sequence change features of user behavior features are captured, and a generator can acquire interest preference features on the basis of adversarial learning, so as to accurately recommend content information.

Description

Content recommendation method, apparatus, and computer device based on adversarial learning
This application claims priority to the Chinese patent application filed with the China Patent Office on September 28, 2020, with application number 2020110449667 and the invention title "Content recommendation method, apparatus, and computer device based on adversarial learning", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence, and in particular to a content recommendation method, apparatus, and computer device based on adversarial learning.
Background
Existing content recommendation systems generally implement automated recommendation using techniques such as manual feature extraction and collaborative filtering decomposition. They collect information such as user behavior data and system log data, model users' preferences and interests, cluster users into groups by those preferences, and recommend the same content to users with similar preferences. However, the inventors realized that existing content recommendation systems treat the collected user behavior data as statistical features and cannot take into account the temporal logic by which user preferences and interests develop and change, so the recommended content is not automatically updated to keep pace with those changes.
Technical problem
The main purpose of this application is to provide content recommendation based on adversarial learning, aiming to solve the technical problem that existing systems cannot take into account the temporal logic by which user preferences and interests develop and change, so that the recommended content is not automatically updated over time.
Technical solution
This application proposes a content recommendation method based on adversarial learning, including:
obtaining a weighted compression vector corresponding to the user's historical behavior features by weighted compression of pre-constructed user features;
modeling a generator and a discriminator according to the weighted compression vector;
combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model;
determining whether the adversarial learning of the generator and the discriminator reaches a preset condition;
if so, inputting historical information of the current user into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determining the interest preference features of the current user;
recommending to the current user, according to the interest preference features of the current user, content information that matches those features.
This application also provides a content recommendation apparatus based on adversarial learning, including:
an obtaining module, configured to obtain a weighted compression vector corresponding to the user's historical behavior features by weighted compression of pre-constructed user features;
a modeling module, configured to model a generator and a discriminator according to the weighted compression vector;
an adversarial learning module, configured to combine the modeled generator with the discriminator and perform adversarial learning under an adversarial model;
a first judgment module, configured to determine whether the adversarial learning of the generator and the discriminator reaches a preset condition;
a determining module, configured to, if the preset condition is reached, input historical information of the current user into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determine the interest preference features of the current user;
a recommendation module, configured to recommend to the current user, according to the interest preference features of the current user, content information that matches those features.
This application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, where the steps of the above method are implemented when the computer program is executed by a processor.
Beneficial effects
This application models the user's historical behavior features through weighted compression to capture how those features change over time and, based on adversarial learning, enables the generator to obtain the interest preference features of online users and accurately recommend content information.
Description of the drawings
FIG. 1 is a schematic flowchart of a content recommendation method based on adversarial learning according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a content recommendation apparatus based on adversarial learning according to an embodiment of this application;
FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of this application.
Best mode for carrying out the application
Referring to FIG. 1, a content recommendation method based on adversarial learning according to an embodiment of this application includes:
S1: Obtain a weighted compression vector corresponding to the user's historical behavior features by weighted compression of pre-constructed user features;
S2: Model a generator and a discriminator according to the weighted compression vector;
S3: Combine the modeled generator with the discriminator and perform adversarial learning under an adversarial model;
S4: Determine whether the adversarial learning of the generator and the discriminator reaches a preset condition;
S5: If so, input historical information of the current user into the generator after adversarial learning and, in combination with the feedback value of the discriminator after adversarial learning, determine the interest preference features of the current user;
S6: According to the interest preference features of the current user, recommend to the current user content information that matches those features.
The user features constructed in the embodiments of this application include user attribute features P, historical click features T, behavior cue features Q, and user click behavior c. The user attribute features P include, but are not limited to, user profile information such as the user's age and occupation. The behavior cue features Q include, but are not limited to, the type of information promoted and the preferential policies offered. The historical click features T include, but are not limited to, the user's historical personal information and the content the user clicked in the past. The user click behavior c records whether the click behavior is assigned the value true: true means a click occurred, otherwise no click occurred. This application encodes the time-series features in the above user features through weighted compression to form a time-series feature matrix, and uses the time-series feature matrix together with the user attribute features to model the generator and the discriminator and to perform adversarial learning. The trained generator can then recognize the time-series features in the user features and produce interest preference features that carry time-series change information, and content information is recommended according to those interest preference features. Compared with existing approaches that recommend content directly from static historical data, this application fits the current user's interest preferences more closely, and the recommended content is more accurate and better targeted.
Further, step S1 of obtaining the weighted compression vector corresponding to the user's historical behavior features by weighted compression of pre-constructed user features includes:
S11: In the two-dimensional space of the time-series dimension and the feature dimension, perform time-series encoding on the user features to obtain the time-series feature matrix corresponding to the user features;
S12: Multiply the time-series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression;
S13: Correct the first product matrix with the first bias vector to obtain the first correction matrix;
S14: Input the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior features;
S15: Splice the embedding vector corresponding to the user's historical behavior features with the time-series feature corresponding to the specified time to form the first splicing vector;
S16: Multiply the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression;
S17: Correct the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
In the embodiment of this application, time-series encoding is performed on the user features in the two-dimensional space of the time-series dimension and the feature dimension to obtain the time-series feature matrix corresponding to the user features. The user historical behavior features are a feature representation of the historical data of the user features, combining the user features with historical time-series features. This application processes the time-series feature matrix through first-level weighted compression to obtain the embedding vector corresponding to the user's historical behavior features. The first-level weighted compression that yields the embedding vector is computed as follows:
Figure PCTCN2020132592-appb-000001
where S_t denotes the embedding vector, h denotes the operator applied to the time-series feature matrix
Figure PCTCN2020132592-appb-000002
, vec denotes the operator that flattens a matrix into a vector, σ denotes the sigmoid function, W denotes the feature weight matrix, i.e. the first compression weight matrix above, and B denotes the feature bias vector, i.e. the first bias vector above. In the second-level weighted compression, the embedding vector S_t is spliced with the time-series feature f_t^a at the specified time t, the result is multiplied by the compression weight matrix V, and the compression bias vector b is added to obtain the weighted compression vector
Figure PCTCN2020132592-appb-000003
This application models the user's historical behavior features through two-level weighted compression to capture how those features change over time, simulating the trend of interest preferences over time, following shifts in interest preferences promptly, and updating the content recommendation strategy accordingly. Terms such as "first" and "second" in this application are used for distinction only and not for limitation; other similar terms serve the same purpose and are not described again.
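Steps S12 to S17 can be sketched in plain Python as follows. This is a minimal illustration rather than the applicant's implementation: the function names, the row-major list-of-lists layout, and the choice to add the first bias vector B to every row of the product matrix are assumptions, since the original formulas survive only as image placeholders.

```python
import math

def sigmoid(x):
    """Logistic function used in step S14."""
    return 1.0 / (1.0 + math.exp(-x))

def embed_history(F, W, B):
    """S12-S14 (assumed shapes): F is d x m, W is m x n, B has length n."""
    columns = list(zip(*W))
    # S12: first product matrix F * W
    product = [[sum(f * w for f, w in zip(row, col)) for col in columns] for row in F]
    # S13: correct with the first bias vector (added to each row, an assumption)
    corrected = [[p + b for p, b in zip(row, B)] for row in product]
    # S14: element-wise sigmoid, then flatten the matrix into a vector (vec)
    return [sigmoid(x) for row in corrected for x in row]

def weighted_compress(s_t, f_t, V, b):
    """S15-S17: splice s_t with f_t, multiply by V, add the second bias vector b."""
    spliced = list(s_t) + list(f_t)  # S15: first splicing vector
    return [sum(v * x for v, x in zip(row, spliced)) + bias for row, bias in zip(V, b)]

# Toy example with hypothetical values
s_t = embed_history([[1.0, 0.0], [0.0, 1.0]], [[0.5], [-0.5]], [0.0])
z = weighted_compress(s_t, [1.0], [[1.0, 1.0, 0.0]], [0.0])
```

In a real system the weight matrices W and V and the bias vectors B and b would be learned parameters; here they are fixed numbers chosen only to make the shapes visible.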
Further, the user features include user attribute features, historical click features, and behavior cue features, and step S2 of modeling the generator and the discriminator according to the weighted compression vector includes:
S21: Splice the user attribute features, historical click features, and behavior cue features to obtain the second splicing vector;
S22: With the model parameters of the discriminator fixed, input the second splicing vector into the model of the generator and model the generator under the constraint of the first cross-entropy loss function;
S24: Determine whether the first cross-entropy loss function reaches its minimum value;
S25: If so, obtain the model of the generator.
In the embodiment of this application, the user attribute features, historical click features, and behavior cue features are spliced to obtain the second splicing vector [P;T;Q]. When the discriminator is modeled, sample training data is constructed first: the second splicing vector [P;T;Q] spliced with the generator output cpred serves as the negative sample feature vector, and the second splicing vector [P;T;Q] spliced with the user's real click c serves as the positive sample feature vector. The model formula of the generator of this application is as follows:
Figure PCTCN2020132592-appb-000004
where φ is a policy model based on a multi-layer convolutional neural network, R(φ) is the regularization term, η is the regularization parameter, and r denotes the discriminator with fixed parameters. The generator output for the input second splicing vector [P;T;Q] is expressed as cpred = MultiConv([P;T;Q]), and the first cross-entropy loss function is expressed as loss_g = CrossEntropy(cpred, c), i.e. the loss measure between cpred and c. The parameters of the multi-layer convolutional neural network of this application are optimized with the Adam algorithm.
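The splicing in S21 and the first cross-entropy loss can be illustrated with a small sketch. It assumes that cpred is a click probability in (0, 1) and that CrossEntropy is the binary cross-entropy averaged over samples; the text does not state either point, and the MultiConv network itself is not reproduced here, so predicted values are passed in directly. The function names are hypothetical.

```python
import math

def splice(P, T, Q):
    """S21: build the second splicing vector [P;T;Q] by concatenation."""
    return list(P) + list(T) + list(Q)

def cross_entropy(c_pred, c_true, eps=1e-12):
    """Mean binary cross-entropy between predicted click probabilities and 0/1 clicks."""
    total = 0.0
    for p, y in zip(c_pred, c_true):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(c_pred)

# loss_g = CrossEntropy(cpred, c) for two hypothetical samples
ptq = splice([0.3], [1.0, 0.0], [0.7])
loss_g = cross_entropy([0.9, 0.2], [1, 0])
```

A lower loss_g means the generator's simulated clicks are closer to the recorded clicks, which is the quantity checked in S24.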
Further, before step S21 of splicing the user attribute features, historical click features, and behavior cue features to obtain the second splicing vector, the method includes:
S201: Input the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector;
S202: Multiply the output result of the weighted compression vector by the reward function parameter to obtain the reward value;
S203: Use the calculation of the reward value as the model of the discriminator.
The formula of the discriminator model of this application is:
Figure PCTCN2020132592-appb-000005
where v^T denotes the reward function parameter.
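Read literally, steps S201 and S202 apply the sigmoid to the weighted compression vector and then take the inner product with the reward function parameter. A minimal sketch under that reading follows; the name `reward` and the treatment of v as a plain list are assumptions, and the exact published formula is only available as an image.

```python
import math

def reward(z, v):
    """S201-S202: reward value v^T * sigmoid(z) for weighted compression vector z."""
    activated = [1.0 / (1.0 + math.exp(-x)) for x in z]  # S201: sigmoid output
    return sum(vi * ai for vi, ai in zip(v, activated))  # S202: multiply by v^T

# Hypothetical two-dimensional example
r = reward([0.0, 0.0], [1.0, 1.0])
```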
Further, step S3 of combining the modeled generator with the discriminator and performing adversarial learning under the adversarial model includes:
S31: Splice the second splicing vector with the modeling result of the generator to form the negative sample feature vector, and splice the second splicing vector with the corresponding real user click value to form the positive sample feature vector;
S32: Input the negative sample feature vector and the positive sample feature vector into the discriminator and, with the generator parameters fixed, model the discriminator under the constraint of the second cross-entropy loss function;
S33: Determine whether the second cross-entropy loss function reaches its minimum value;
S34: If so, determine the parameters of the discriminator;
S35: Following the modeling process of the generator and the discriminator, train the generator and the discriminator adversarially through the adversarial model until the first cross-entropy loss function and the second cross-entropy loss function both reach their minimum values.
The second cross-entropy loss function of this application consists of two parts: one part corresponds to the constraint on the generator's output for the second splicing vector, and the other corresponds to the constraint on the real click action, i.e. loss_d = loss_1 + loss_2, with loss_1 = CrossEntropy(0, MultiConv([P;T;Q;cpred])) and loss_2 = CrossEntropy(1, MultiConv([P;T;Q;c])). The formula of the adversarial model of this application is expressed as:
Figure PCTCN2020132592-appb-000006
where θ denotes the parameters of the discriminator optimized in adversarial learning and α denotes the parameters of the generator in adversarial learning. In the adversarial learning of this application, the learning goal of the generator is to produce, from the constructed user feature vector, a simulated click behavior cpred that resembles real user clicks as closely as possible, while the learning goal of the discriminator is to distinguish real user click behavior from the click behavior produced by the generator. The parameters of the discriminator and the generator are fixed alternately. First the discriminator parameters are fixed and the generator is trained through loss_g; a decrease in loss_g indicates that the cpred produced by the generator has successfully deceived the discriminator. Then the generator parameters are fixed and the discriminator is trained under the constraint of loss_d; a decrease in loss_d indicates that the discriminator has again successfully distinguished cpred from c. Training alternates in this way until loss_d and loss_g are both below the preset threshold, which is treated as having reached the minimum. At that point the generator takes the user's historical click information into account and imitates the user's click decisions as closely as possible, while the discriminator can simulate the feedback to the user's click actions.
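The sample construction in S31, the two-part loss loss_d = loss_1 + loss_2, and the alternating schedule can be sketched as below. This is an illustration under assumptions: the discriminator is any callable returning a probability in [0, 1], CrossEntropy is taken to be binary cross-entropy, and the two training steps are passed in as hypothetical callbacks that return the current loss, because the text does not specify the update rule beyond the use of Adam.

```python
import math

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy for one prediction and a 0/1 label."""
    p = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def build_samples(ptq, c_pred, c_true):
    """S31: negative sample [P;T;Q;cpred] and positive sample [P;T;Q;c]."""
    return list(ptq) + [c_pred], list(ptq) + [c_true]

def discriminator_loss(discriminate, negative, positive):
    """loss_d = loss_1 + loss_2, with targets 0 (generated) and 1 (real)."""
    return bce(discriminate(negative), 0) + bce(discriminate(positive), 1)

def alternate_training(generator_step, discriminator_step, threshold, max_rounds=1000):
    """Alternate the two updates until both losses fall below the preset threshold."""
    loss_g = loss_d = float("inf")
    for round_index in range(max_rounds):
        loss_g = generator_step()       # discriminator parameters held fixed
        loss_d = discriminator_step()   # generator parameters held fixed
        if loss_g < threshold and loss_d < threshold:
            return round_index + 1, loss_g, loss_d
    return max_rounds, loss_g, loss_d
```

The `max_rounds` cap is an added safeguard, not part of the described method; adversarial training does not always drive both losses below a fixed threshold, so an unbounded loop could run indefinitely.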
Further, step S5 of inputting the historical information of the current user into the generator after adversarial learning and determining the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning includes:
S51: inputting the historical information of the current user and the specified marketing activity information into the generator after adversarial learning;
S52: judging whether the feedback value of the discriminator after adversarial learning is equal to 1;
S53: if so, determining that the specified marketing activity information belongs to the interest preference feature of the current user.
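Steps S51 to S53 can be sketched as a small Python function. This is only a minimal sketch: the callable signatures for the trained generator and discriminator are hypothetical, and rounding a probability-like feedback value to 0 or 1 is an assumption that the application does not state.

```python
from typing import Callable, Sequence


def belongs_to_interest(
    generator: Callable[[Sequence[float], Sequence[float]], float],
    discriminator: Callable[[Sequence[float], Sequence[float], float], float],
    user_history: Sequence[float],
    campaign: Sequence[float],
) -> bool:
    """Decide whether a marketing activity matches the user's interests."""
    # S51: the trained generator simulates the click for this user and activity.
    simulated_click = generator(user_history, campaign)
    # S52: the trained discriminator feeds back a value for the simulated click.
    feedback = discriminator(user_history, campaign, simulated_click)
    # S53: a feedback value of 1 marks the activity as an interest preference.
    # Rounding to the nearest integer is an assumption of this sketch.
    return round(feedback) == 1
```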
This embodiment of the application is described in detail by taking the selection of marketing activity information as an example. The marketing activity information includes, but is not limited to, issuing red envelopes, issuing coupons, and offering rebates. The feature vectors corresponding to the different marketing activity information and the vector corresponding to the current user's historical information are input into the generator, the generator simulates the user's click behavior on each piece of marketing activity information, and the user's interest preference for the different marketing activity information is determined from the magnitude of the discrimination value fed back by the discriminator.
Further, after step S6 of recommending to the current user, according to the interest preference feature of the current user, content information matching that interest preference feature, the method includes:
S61: acquiring a specified feature that affects the user's click action, where the specified feature is any one of all the features that affect the user's click action;
S62: changing the feature data range of the specified feature when it is input into the discriminator;
S63: acquiring the range over which the output value changes as the feature data range changes;
S64: judging whether the output value change range exceeds a preset range;
S65: if so, determining that the specified feature is a sensitive feature that affects the user's click action.
In this embodiment of the application, the user's historical features and the real click behavior are input into the discriminator; a feedback output value of 1 from the discriminator indicates a real click behavior. Take time as an example of the specified feature: the feature data range is then the time span. If the discriminator's output value changes substantially as the time span changes, the user is sensitive to the time feature, and the time feature is determined to be a sensitive feature of the user. Sensitive features can be used to build a continuously evolving portrait of the user, so that user classification and clustering can be updated in real time.
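The sensitivity check in steps S61 to S65 can be sketched as follows. This is a minimal sketch under stated assumptions: the discriminator is modeled as a hypothetical callable over a dictionary of named features, and the list of probe values stands in for the changed feature data range.

```python
from typing import Callable, Dict, Iterable


def is_sensitive_feature(
    discriminator: Callable[[Dict[str, float]], float],
    base_features: Dict[str, float],
    feature_name: str,
    values: Iterable[float],
    preset_range: float,
) -> bool:
    """Return True if varying one feature moves the output beyond preset_range."""
    outputs = []
    for value in values:
        probe = dict(base_features)   # copy so the base sample stays unchanged
        probe[feature_name] = value   # S62: change the specified feature
        outputs.append(discriminator(probe))
    if not outputs:
        raise ValueError("values must contain at least one probe value")
    output_range = max(outputs) - min(outputs)  # S63: output value change range
    return output_range > preset_range          # S64/S65: compare with preset
```

Calling the function once per candidate feature gives the set of sensitive features for a user; how the preset range is chosen is left open here, as it is in the application.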
Referring to FIG. 2, a content recommendation apparatus based on adversarial learning according to an embodiment of the present application includes:
an obtaining module 1, configured to obtain, by weighted compression of pre-constructed user features, the weighted compression vector corresponding to the user's historical behavior features;
a modeling module 2, configured to model the generator and the discriminator according to the weighted compression vector;
an adversarial learning module 3, configured to combine the modeled generator with the discriminator and perform adversarial learning under the adversarial model;
a first judgment module 4, configured to judge whether the adversarial learning of the generator and the discriminator has reached a preset condition;
a determination module 5, configured to, if the preset condition is reached, input the historical information of the current user into the generator after adversarial learning and determine the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning;
a recommendation module 6, configured to recommend to the current user, according to the interest preference feature of the current user, content information that matches that interest preference feature.
The user features constructed in the embodiments of the present application include the user attribute feature P, the historical click feature T, the behavior cue feature Q and the user click behavior c. The user attribute feature P includes, but is not limited to, user portrait information such as the user's age and occupation; the behavior cue feature Q includes, but is not limited to, the type of promoted information and the preferential strategy; the historical click feature T includes, but is not limited to, the user's historical personal information and the content information the user has clicked in the past; the user click behavior c indicates whether the click behavior is assigned the value true, where true means a click occurred and otherwise no click occurred. This application encodes the time-series features among the above user features through weighted compression to form a time-series feature matrix, and uses the time-series feature matrix together with the user attribute features to model the generator and the discriminator and to perform adversarial learning. The generator after learning can therefore recognize the time-series features in the user features and obtain an interest preference feature of the user that carries time-series variation, and content information is then recommended according to that interest preference feature. Compared with existing approaches that recommend content information directly from static historical data, this application matches the current user's interest preferences more closely, and the recommended content is more accurate and better targeted.
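One way to picture the four constructed user features is as a single record. The sketch below is illustrative only: the class name, the flat list encoding of P, T and Q, and the 0/1 encoding of c are assumptions and are not prescribed by the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserSample:
    """One training sample built from the constructed user features."""

    P: List[float]  # user attribute features, e.g. encoded age and occupation
    T: List[float]  # historical click features
    Q: List[float]  # behavior cue features, e.g. promotion type and strategy
    c: int          # user click behavior: 1 if a click occurred, otherwise 0

    def __post_init__(self) -> None:
        # The click behavior is a true/false assignment, encoded here as 1/0.
        if self.c not in (0, 1):
            raise ValueError("c must be 0 (no click) or 1 (click)")
```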
Further, the obtaining module 1 includes:
an encoding unit, configured to perform time-series encoding on the user features in the two-dimensional space formed by the time-series dimension and the feature dimension, to obtain the time-series feature matrix corresponding to the user features;
a first multiplication unit, configured to multiply the time-series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression;
a first correction unit, configured to correct the first product matrix with the first bias vector to obtain the first correction matrix;
a first input unit, configured to input the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior features;
a first splicing unit, configured to splice the embedding vector corresponding to the user's historical behavior features with the time-series feature corresponding to a specified time to form the first splicing vector;
a second multiplication unit, configured to multiply the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression;
a second correction unit, configured to correct the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
In this embodiment of the application, time-series encoding is performed on the user features in the two-dimensional space formed by the time-series dimension and the feature dimension to obtain the time-series feature matrix corresponding to the user features. The user historical behavior feature is a feature representation of the historical data of the user features, combining the user features with the historical time-series features. This application processes the time-series feature matrix through first-level weighted compression to obtain the embedding vector corresponding to the user's historical behavior features. The first-level weighted compression that yields the embedding vector is computed as follows:
Figure PCTCN2020132592-appb-000007
Here, S_t denotes the embedding vector, and h denotes the operator applied to the following time-series feature matrix:
Figure PCTCN2020132592-appb-000008
In addition, vec denotes the operator that flattens its argument into a vector, σ denotes the sigmoid function, W denotes the feature weight matrix, that is, the first compression weight matrix described above, and B denotes the feature bias vector, that is, the first bias vector described above. In the second-level weighted compression, the embedding vector S_t is spliced with the time-series feature f_t^a at the specified time t, the result is multiplied by the compression weight matrix V, and the compression bias vector b is added, giving the weighted compression vector
Figure PCTCN2020132592-appb-000009
This application models the user's historical behavior features through two-level weighted compression in order to capture how those features change over time, simulate the trend of interest preferences as they change with time, follow shifts in interest preferences promptly, and update the recommendation strategy for content information. Terms such as "first" and "second" in this application are used for distinction and not for limitation; other similar terms serve the same purpose and are not described again.
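The two-level weighted compression can be sketched in plain Python. Several points are assumptions made for illustration: the operator h is taken to be the identity because the application does not define it in the text, vec is taken to be row-major flattening, and all function names and shapes are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v, checking that the shapes agree."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("matrix column count must equal vector length")
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def first_level(feature_matrix: Matrix, W: Matrix, B: Vector) -> Vector:
    """Embedding S_t = sigmoid(W * vec(F) + B), with h assumed to be identity."""
    flat = [x for row in feature_matrix for x in row]  # vec(.), row-major
    return [sigmoid(p + b) for p, b in zip(matvec(W, flat), B)]


def second_level(S_t: Vector, f_t: Vector, V: Matrix, b: Vector) -> Vector:
    """Weighted compression vector V * [S_t; f_t] + b."""
    spliced = [*S_t, *f_t]  # first splicing vector
    return [p + bi for p, bi in zip(matvec(V, spliced), b)]
```

For example, with an all-zero feature matrix, weights and bias, `first_level` returns an embedding whose entries are all 0.5, which `second_level` then splices with the time-series feature at the specified time and compresses again.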
Further, the user features include user attribute features, historical click features and behavior cue features, and the modeling module 2 includes:
a second splicing unit, configured to perform vector splicing on the user attribute features, the historical click features and the behavior cue features to obtain the second splicing vector;
a first modeling unit, configured to input the second splicing vector into the model of the generator with the model parameters of the discriminator fixed, and to model the generator under the constraint of the first cross-entropy loss function;
a first judgment unit, configured to judge whether the first cross-entropy loss function has reached its minimum value;
an obtaining unit, configured to obtain the model of the generator if the minimum value is reached.
In this embodiment of the application, the user attribute features, the historical click features and the behavior cue features are spliced into the second splicing vector [P; T; Q]. When the discriminator is modeled in this application, the sample training data is constructed first: the second splicing vector [P; T; Q] is spliced with the cpred output by the generator to form the negative sample feature vector, and the second splicing vector [P; T; Q] is spliced with the user's real click c to form the positive sample feature vector. The model formula of the generator of this application is as follows:
Figure PCTCN2020132592-appb-000010
Here, φ is a policy model based on a multilayer convolutional neural network, R(φ) is the regularization term, η is the regularization parameter, and r denotes the discriminator with fixed parameters. The output of the generator for the input second splicing vector [P; T; Q] is expressed as cpred = MultiConv([P; T; Q]), and the first cross-entropy loss function is expressed as loss_g = CrossEntropy(cpred, c), that is, the loss measure between cpred and c. The parameters of the multilayer convolutional neural network of this application are optimized by the Adam algorithm.
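The sample construction and the first cross-entropy loss described above can be sketched as follows. This is a sketch under assumptions: the features are flat lists of floats, cpred is treated as a click probability with c as the 0/1 label, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def build_samples(
    P: Sequence[float],
    T: Sequence[float],
    Q: Sequence[float],
    c_pred: float,
    c: float,
) -> Tuple[List[float], List[float]]:
    """Return (negative, positive) sample feature vectors for the discriminator."""
    spliced = [*P, *T, *Q]         # second splicing vector [P; T; Q]
    negative = [*spliced, c_pred]  # [P; T; Q; cpred], generator output appended
    positive = [*spliced, c]       # [P; T; Q; c], real click appended
    return negative, positive


def generator_loss(c_pred: float, c: float, eps: float = 1e-12) -> float:
    """loss_g: cross-entropy between predicted click probability and real click."""
    p = min(max(c_pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(c * math.log(p) + (1.0 - c) * math.log(1.0 - p))
```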
Further, the second splicing unit includes:
an input subunit, configured to input the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector;
an obtaining subunit, configured to multiply the output result of the weighted compression vector by the reward function parameter to obtain a reward value;
a designating subunit, configured to take the calculation of the reward value as the model of the discriminator.
The formula of the discriminator model of this application is:
Figure PCTCN2020132592-appb-000011
where v^T denotes the reward function parameter.
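Based on the textual description (sigmoid of the weighted compression vector, multiplied by the reward function parameter), the reward value can be sketched as an inner product. The exact formula is in the figure above and is not reproduced in the text, so the form used here is an assumption, as are the function names.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reward(weighted_vec: Sequence[float], v: Sequence[float]) -> float:
    """Reward value, assumed here to be v^T * sigmoid(weighted_vec)."""
    if len(weighted_vec) != len(v):
        raise ValueError("v and the weighted compression vector must match in length")
    return sum(vi * sigmoid(zi) for vi, zi in zip(v, weighted_vec))
```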
Further, the adversarial learning module 3 includes:
a third splicing unit, configured to splice the second splicing vector with the modeling result of the generator to form the negative sample feature vector, and to splice the second splicing vector with the real user click value corresponding to the second splicing vector to form the positive sample feature vector;
a second modeling unit, configured to input the negative sample feature vector and the positive sample feature vector into the discriminator, fix the generator parameters, and model the discriminator under the constraint of the second cross-entropy loss function;
a second judgment unit, configured to judge whether the second cross-entropy loss function has reached its minimum value;
a determination unit, configured to determine the parameters of the discriminator if the minimum value is reached;
an adversarial learning unit, configured to train the generator and the discriminator against each other through the adversarial model, following the modeling processes of the generator and the discriminator, until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum values.
The second cross-entropy loss function of this application consists of two parts: one part corresponds to the output constraint on the generator's output for the second splicing vector, and the other part corresponds to the output constraint on the real click action, namely loss_d = loss_1 + loss_2, where loss_1 = CrossEntropy(0, MultiConv([P; T; Q; cpred])) and loss_2 = CrossEntropy(1, MultiConv([P; T; Q; c])). The adversarial model of this application is expressed as:
Figure PCTCN2020132592-appb-000012
Here, θ denotes the parameters of the discriminator optimized in adversarial learning, and α denotes the parameters of the generator. In the adversarial learning of this application, the learning goal of the generator is to generate, from the constructed user feature vector, a click behavior cpred that resembles the user's click behavior as closely as possible, while the learning goal of the discriminator is to distinguish the real user click behavior from the click behavior generated by the generator. During adversarial learning, the parameters of the discriminator and the generator are fixed alternately. First, the parameters of the discriminator are fixed and the generator is trained through loss_g; when loss_g decreases, the cpred generated by the generator has successfully deceived the discriminator. Then the parameters of the generator are fixed and the discriminator is trained under the constraint of loss_d; when loss_d decreases, the discriminator has again successfully distinguished cpred from c. Training alternates in this way until loss_d and loss_g are both smaller than the preset threshold, that is, until they reach their minimum values. At this point the generator can take the user's historical click information into account and imitate the user's click decisions as closely as possible, while the discriminator can simulate the feedback on the user's click actions.
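The alternating schedule described above can be sketched independently of any particular network. In this minimal sketch the two step callables are hypothetical: each is assumed to perform one update of its own model while the other model's parameters stay fixed, and to return the resulting loss. The round limit is also an assumption, added so the loop always terminates.

```python
from typing import Callable, List, Tuple


def alternate_training(
    train_generator_step: Callable[[], float],
    train_discriminator_step: Callable[[], float],
    threshold: float,
    max_rounds: int = 1000,
) -> Tuple[int, List[Tuple[float, float]]]:
    """Alternate generator and discriminator updates until both losses are small.

    Returns the number of rounds run and the (loss_g, loss_d) history.
    """
    history: List[Tuple[float, float]] = []
    for _ in range(max_rounds):
        loss_g = train_generator_step()      # discriminator parameters fixed
        loss_d = train_discriminator_step()  # generator parameters fixed
        history.append((loss_g, loss_d))
        # Stop once both losses fall below the preset threshold.
        if loss_g < threshold and loss_d < threshold:
            break
    return len(history), history
```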
Further, the determination module 5 includes:
a second input unit, configured to input the historical information of the current user and the specified marketing activity information into the generator after adversarial learning;
a third judgment unit, configured to judge whether the feedback value of the discriminator after adversarial learning is equal to 1;
a determining unit, configured to determine, if the feedback value is equal to 1, that the specified marketing activity information belongs to the interest preference feature of the current user.
This embodiment of the application is described in detail by taking the selection of marketing activity information as an example. The marketing activity information includes, but is not limited to, issuing red envelopes, issuing coupons, and offering rebates. The feature vectors corresponding to the different marketing activity information and the vector corresponding to the current user's historical information are input into the generator, the generator simulates the user's click behavior on each piece of marketing activity information, and the user's interest preference for the different marketing activity information is determined from the magnitude of the discrimination value fed back by the discriminator.
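Comparing marketing activities by the discriminator's feedback amounts to ranking them by a score. The sketch below assumes a hypothetical `score` callable that wraps the generator and discriminator for one user and returns the discrimination value for a named activity; the activity names are placeholders.

```python
from typing import Callable, List, Tuple


def rank_campaigns(
    score: Callable[[str], float],
    campaigns: List[str],
) -> List[Tuple[str, float]]:
    """Order marketing activities from highest to lowest discrimination value."""
    scored = [(name, score(name)) for name in campaigns]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The first entry of the returned list is the activity the user is estimated to prefer most under these assumptions.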
Further, the content recommendation apparatus based on adversarial learning includes:
a first acquisition module, configured to acquire a specified feature that affects the user's click action, where the specified feature is any one of all the features that affect the user's click action;
a changing module, configured to change the feature data range of the specified feature when it is input into the discriminator;
a second acquisition module, configured to acquire the range over which the output value changes as the feature data range changes;
a second judgment module, configured to judge whether the output value change range exceeds a preset range;
a determining module, configured to determine, if the preset range is exceeded, that the specified feature is a sensitive feature that affects the user's click action.
In this embodiment of the application, the user's historical features and the real click behavior are input into the discriminator; a feedback output value of 1 from the discriminator indicates a real click behavior. Take time as an example of the specified feature: the feature data range is then the time span. If the discriminator's output value changes substantially as the time span changes, the user is sensitive to the time feature, and the time feature is determined to be a sensitive feature of the user. Sensitive features can be used to build a continuously evolving portrait of the user, so that user classification and clustering can be updated in real time.
Referring to FIG. 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all the data required by the content recommendation process based on adversarial learning. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements the content recommendation method based on adversarial learning.
The content recommendation method based on adversarial learning executed by the processor includes: obtaining, by weighted compression of pre-constructed user features, the weighted compression vector corresponding to the user's historical behavior features; modeling the generator and the discriminator according to the weighted compression vector; combining the modeled generator with the discriminator and performing adversarial learning under the adversarial model; judging whether the adversarial learning of the generator and the discriminator has reached a preset condition; if so, inputting the historical information of the current user into the generator after adversarial learning and determining the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning; and recommending to the current user, according to the interest preference feature of the current user, content information that matches that interest preference feature.
The above computer device models the user's historical behavior features through weighted compression to capture how those features change over time and, based on adversarial learning, enables the generator to obtain the interest preference features of online users and recommend content information accurately.
In one embodiment, the step in which the processor obtains, by weighted compression of pre-constructed user features, the weighted compression vector corresponding to the user's historical behavior features includes: performing time-series encoding on the user features in the two-dimensional space formed by the time-series dimension and the feature dimension, to obtain the time-series feature matrix corresponding to the user features; multiplying the time-series feature matrix by the first compression weight matrix to obtain the first product matrix after data compression; correcting the first product matrix with the first bias vector to obtain the first correction matrix; inputting the first correction matrix into the sigmoid function to obtain the embedding vector corresponding to the user's historical behavior features; splicing the embedding vector corresponding to the user's historical behavior features with the time-series feature corresponding to a specified time to form the first splicing vector; multiplying the first splicing vector by the second compression weight matrix to obtain the second product matrix after data compression; and correcting the second product matrix with the second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
In one embodiment, the user features include user attribute features, historical click features and behavior cue features, and the step in which the processor models the generator and the discriminator according to the weighted compression vector includes: performing vector splicing on the user attribute features, the historical click features and the behavior cue features to obtain the second splicing vector; inputting the second splicing vector into the model of the generator with the model parameters of the discriminator fixed, and modeling the generator under the constraint of the first cross-entropy loss function; judging whether the first cross-entropy loss function has reached its minimum value; and if so, obtaining the model of the generator.
In one embodiment, before the step in which the processor performs vector splicing on the user attribute features, the historical click features and the behavior cue features to obtain the second splicing vector, the method includes: inputting the weighted compression vector into the sigmoid function to obtain the output result of the weighted compression vector; multiplying the output result of the weighted compression vector by the reward function parameter to obtain a reward value; and taking the calculation of the reward value as the model of the discriminator.
In one embodiment, the step in which the processor combines the modeled generator with the discriminator and performs adversarial learning under the adversarial model includes: splicing the second splicing vector with the modeling result of the generator to form the negative sample feature vector, and splicing the second splicing vector with the real user click value corresponding to the second splicing vector to form the positive sample feature vector; inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of the second cross-entropy loss function; judging whether the second cross-entropy loss function has reached its minimum value; if so, determining the parameters of the discriminator; and training the generator and the discriminator against each other through the adversarial model, following the modeling processes of the generator and the discriminator, until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum values.
In one embodiment, the step in which the processor inputs the historical information of the current user into the generator after adversarial learning and determines the interest preference feature of the current user in combination with the feedback value of the discriminator after adversarial learning includes: inputting the historical information of the current user and the specified marketing activity information into the generator after adversarial learning; judging whether the feedback value of the discriminator after adversarial learning is equal to 1; and if so, determining that the specified marketing activity information belongs to the interest preference feature of the current user.
In one embodiment, after the step in which the processor recommends to the current user, according to the interest preference feature of the current user, content information matching that interest preference feature, the method includes: acquiring a specified feature that affects the user's click action, where the specified feature is any one of all the features that affect the user's click action; changing the feature data range of the specified feature when it is input into the discriminator; acquiring the range over which the output value changes as the feature data range changes; judging whether the output value change range exceeds a preset range; and if so, determining that the specified feature is a sensitive feature that affects the user's click action.
Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied.
An embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile, storing a computer program that, when executed by a processor, implements a content recommendation method based on adversarial learning, including: obtaining a weighted compression vector corresponding to the user's historical behavior features by applying weighted compression to pre-constructed user features; modeling a generator and a discriminator according to the weighted compression vector; combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model; determining whether the adversarial learning of the generator and the discriminator meets a preset condition; if so, inputting the historical information of the current user into the adversarially trained generator and determining the interest preference features of the current user in combination with the feedback value of the adversarially trained discriminator; and recommending to the current user, according to the current user's interest preference features, content information that matches those features.
The above computer-readable storage medium models the user's historical behavior features through weighted compression so as to capture how those features change over time, and uses adversarial learning so that the generator can obtain the interest preference features of online users and recommend content information accurately.
In one embodiment, the step in which the processor obtains the weighted compression vector corresponding to the user's historical behavior features by applying weighted compression to the pre-constructed user features includes: performing time-series encoding of the user features over the two-dimensional space of the time dimension and the feature dimension to obtain a time-series feature matrix corresponding to the user features; multiplying the time-series feature matrix by a first compression weight matrix to obtain a compressed first product matrix; correcting the first product matrix with a first bias vector to obtain a first corrected matrix; inputting the first corrected matrix into a sigmoid function to obtain an embedding vector corresponding to the user's historical behavior features; concatenating the embedding vector with the time-series features corresponding to a specified time to form a first concatenated vector; multiplying the first concatenated vector by a second compression weight matrix to obtain a compressed second product matrix; and correcting the second product matrix with a second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
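The sequence of operations above can be sketched in plain Python as follows. The application does not state the matrix shapes, so this sketch assumes a T x F time-series feature matrix `X`, a first compression weight of length T that compresses the time axis, and a second compression weight matrix of shape D x 2F. All names (`weighted_compression`, `w1`, `b1`, `W2`, `b2`, `t`) are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def weighted_compression(X, w1, b1, W2, b2, t):
    """Compress a T x F time-series feature matrix into a weighted compression vector.

    X  : T x F matrix, one row per time step (assumed layout)
    w1 : length-T first compression weights (assumed to act on the time axis)
    b1 : length-F first bias vector
    W2 : D x 2F second compression weight matrix
    b2 : length-D second bias vector
    t  : index of the specified time step
    """
    n_steps, n_features = len(X), len(X[0])
    # First product: (1 x T) times (T x F) gives a length-F vector.
    product1 = [sum(w1[i] * X[i][j] for i in range(n_steps)) for j in range(n_features)]
    # Bias correction followed by the sigmoid gives the embedding vector.
    embedding = [sigmoid(p + b) for p, b in zip(product1, b1)]
    # First concatenated vector: embedding plus the features at time t.
    concatenated = embedding + list(X[t])
    # Second product and bias correction give the weighted compression vector.
    product2 = matvec(W2, concatenated)
    return [p + b for p, b in zip(product2, b2)]
```

Under these assumptions the first multiplication weights each time step, which is one plausible reading of "weighted compression"; other shape conventions, such as flattening the matrix first, would work equally well.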
In one embodiment, the user features include user attribute features, historical click features, and behavior cue features, and the step in which the processor models the generator and the discriminator according to the weighted compression vector includes: concatenating the user attribute features, the historical click features, and the behavior cue features into a second concatenated vector; with the model parameters of the discriminator fixed, inputting the second concatenated vector into the generator model and modeling the generator model under the constraint of a first cross-entropy loss function; determining whether the first cross-entropy loss function has reached its minimum; and if so, obtaining the generator model.
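The application does not specify the generator's architecture. As a rough sketch only, the loop below substitutes a logistic model for the generator and trains it by gradient descent on a binary cross-entropy loss against click labels, stopping when the loss no longer improves. Treating "reaches its minimum" as a convergence tolerance is an assumption, as are the names `train_generator`, `lr`, `tol`, and `max_steps`. The discriminator stays fixed here simply because none of its parameters are updated.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(preds, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted click probabilities and labels."""
    total = sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(preds, labels))
    return -total / len(labels)


def train_generator(features, clicks, lr=0.5, tol=1e-6, max_steps=5000):
    """Fit a logistic stand-in generator; returns (weights, bias, last evaluated loss)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    previous_loss = float("inf")
    for _ in range(max_steps):
        preds = [sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
                 for row in features]
        loss = cross_entropy(preds, clicks)
        # Assumed stopping rule: treat a negligible improvement as the minimum.
        if previous_loss - loss < tol:
            break
        previous_loss = loss
        n = len(clicks)
        # Gradient of the mean cross-entropy for a logistic model.
        for j in range(len(weights)):
            grad = sum((p - y) * row[j] for p, y, row in zip(preds, clicks, features)) / n
            weights[j] -= lr * grad
        bias -= lr * sum(p - y for p, y in zip(preds, clicks)) / n
    return weights, bias, loss
```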
In one embodiment, before the step in which the processor concatenates the user attribute features, the historical click features, and the behavior cue features into the second concatenated vector, the method includes: inputting the weighted compression vector into a sigmoid function to obtain an output result for the weighted compression vector; multiplying that output result by a reward function parameter to obtain a reward value; and using the calculation of the reward value as the discriminator model.
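A minimal sketch of this reward calculation follows. The application does not say whether the reward function parameter is a scalar or a vector; this sketch assumes a parameter vector of the same length as the weighted compression vector and combines the two by a dot product to give a scalar reward. The name `reward_value` is hypothetical.

```python
import math


def reward_value(weighted_vector, reward_params):
    """Reward = sigmoid(weighted compression vector) multiplied by the reward parameters.

    reward_params is assumed to be a vector matching weighted_vector in length;
    the product is taken as a dot product so that the reward is a single number.
    """
    activated = [1.0 / (1.0 + math.exp(-v)) for v in weighted_vector]
    return sum(a * r for a, r in zip(activated, reward_params))
```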
In one embodiment, the step in which the processor combines the modeled generator with the discriminator and performs adversarial learning under the adversarial model includes: concatenating the second concatenated vector with the modeling result of the generator to form a negative sample feature vector, and concatenating the second concatenated vector with the true user click value corresponding to the second concatenated vector to form a positive sample feature vector; inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of a second cross-entropy loss function; determining whether the second cross-entropy loss function has reached its minimum; if so, determining the parameters of the discriminator; and, following the modeling processes of the generator and the discriminator, training the generator and the discriminator against each other through the adversarial model until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum.
In one embodiment, the step in which the processor inputs the historical information of the current user into the adversarially trained generator and determines the interest preference features of the current user in combination with the feedback value of the adversarially trained discriminator includes: inputting the historical information of the current user and specified marketing campaign information into the adversarially trained generator; determining whether the feedback value of the adversarially trained discriminator equals 1; and if so, determining that the specified marketing campaign information belongs to the interest preference features of the current user.
In one embodiment, after the step in which the processor recommends to the current user content information matching the current user's interest preference features, the method includes: obtaining a specified feature that affects the user's click action, where the specified feature is any one of the features that affect the user's click action; changing the range of feature data with which the specified feature is input into the discriminator; obtaining the range over which the output value changes in response to the change in the feature data range; determining whether the range of output value change exceeds a preset range; and if so, determining that the specified feature is a sensitive feature affecting the user's click action.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A content recommendation method based on adversarial learning, comprising:
    obtaining a weighted compression vector corresponding to a user's historical behavior features by applying weighted compression to pre-constructed user features;
    modeling a generator and a discriminator according to the weighted compression vector;
    combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model;
    determining whether the adversarial learning of the generator and the discriminator meets a preset condition;
    if so, inputting historical information of a current user into the adversarially trained generator, and determining interest preference features of the current user in combination with a feedback value of the adversarially trained discriminator;
    recommending to the current user, according to the interest preference features of the current user, content information that matches the interest preference features of the current user.
  2. The content recommendation method based on adversarial learning according to claim 1, wherein the step of obtaining the weighted compression vector corresponding to the user's historical behavior features by applying weighted compression to the pre-constructed user features comprises:
    performing time-series encoding of the user features over the two-dimensional space of the time dimension and the feature dimension to obtain a time-series feature matrix corresponding to the user features;
    multiplying the time-series feature matrix by a first compression weight matrix to obtain a compressed first product matrix;
    correcting the first product matrix with a first bias vector to obtain a first corrected matrix;
    inputting the first corrected matrix into a sigmoid function to obtain an embedding vector corresponding to the user's historical behavior features;
    concatenating the embedding vector corresponding to the user's historical behavior features with the time-series features corresponding to a specified time to form a first concatenated vector;
    multiplying the first concatenated vector by a second compression weight matrix to obtain a compressed second product matrix;
    correcting the second product matrix with a second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
  3. The content recommendation method based on adversarial learning according to claim 1, wherein the user features include user attribute features, historical click features, and behavior cue features, and the step of modeling the generator and the discriminator according to the weighted compression vector comprises:
    concatenating the user attribute features, the historical click features, and the behavior cue features into a second concatenated vector;
    with the model parameters of the discriminator fixed, inputting the second concatenated vector into the generator model, and modeling the generator model under the constraint of a first cross-entropy loss function;
    determining whether the first cross-entropy loss function has reached its minimum;
    if so, obtaining the generator model.
  4. The content recommendation method based on adversarial learning according to claim 3, wherein, before the step of concatenating the user attribute features, the historical click features, and the behavior cue features into the second concatenated vector, the method comprises:
    inputting the weighted compression vector into a sigmoid function to obtain an output result for the weighted compression vector;
    multiplying the output result for the weighted compression vector by a reward function parameter to obtain a reward value;
    using the calculation of the reward value as the discriminator model.
  5. The content recommendation method based on adversarial learning according to claim 1, wherein the step of combining the modeled generator with the discriminator and performing adversarial learning under the adversarial model comprises:
    concatenating the second concatenated vector with the modeling result of the generator to form a negative sample feature vector, and concatenating the second concatenated vector with the true user click value corresponding to the second concatenated vector to form a positive sample feature vector;
    inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of a second cross-entropy loss function;
    determining whether the second cross-entropy loss function has reached its minimum;
    if so, determining the parameters of the discriminator;
    following the modeling processes of the generator and the discriminator, training the generator and the discriminator against each other through the adversarial model until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum.
  6. The content recommendation method based on adversarial learning according to claim 1, wherein the step of inputting the historical information of the current user into the adversarially trained generator and determining the interest preference features of the current user in combination with the feedback value of the adversarially trained discriminator comprises:
    inputting the historical information of the current user and specified marketing campaign information into the adversarially trained generator;
    determining whether the feedback value of the adversarially trained discriminator equals 1;
    if so, determining that the specified marketing campaign information belongs to the interest preference features of the current user.
  7. The content recommendation method based on adversarial learning according to claim 1, wherein, after the step of recommending to the current user, according to the interest preference features of the current user, content information that matches the interest preference features of the current user, the method comprises:
    obtaining a specified feature that affects the user's click action, wherein the specified feature is any one of the features that affect the user's click action;
    changing the range of feature data with which the specified feature is input into the discriminator;
    obtaining the range over which the output value changes in response to the change in the feature data range;
    determining whether the range of output value change exceeds a preset range;
    if so, determining that the specified feature is a sensitive feature affecting the user's click action.
  8. A content recommendation apparatus based on adversarial learning, comprising:
    an obtaining module, configured to obtain a weighted compression vector corresponding to a user's historical behavior features by applying weighted compression to pre-constructed user features;
    a modeling module, configured to model a generator and a discriminator according to the weighted compression vector;
    an adversarial learning module, configured to combine the modeled generator with the discriminator and perform adversarial learning under an adversarial model;
    a first judgment module, configured to determine whether the adversarial learning of the generator and the discriminator meets a preset condition;
    a determination module, configured to, if the preset condition is met, input historical information of a current user into the adversarially trained generator and determine interest preference features of the current user in combination with a feedback value of the adversarially trained discriminator;
    a recommendation module, configured to recommend to the current user, according to the interest preference features of the current user, content information that matches the interest preference features of the current user.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a content recommendation method based on adversarial learning, comprising:
    obtaining a weighted compression vector corresponding to a user's historical behavior features by applying weighted compression to pre-constructed user features;
    modeling a generator and a discriminator according to the weighted compression vector;
    combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model;
    determining whether the adversarial learning of the generator and the discriminator meets a preset condition;
    if so, inputting historical information of a current user into the adversarially trained generator, and determining interest preference features of the current user in combination with a feedback value of the adversarially trained discriminator;
    recommending to the current user, according to the interest preference features of the current user, content information that matches the interest preference features of the current user.
  10. The computer device according to claim 9, wherein the step of obtaining the weighted compression vector corresponding to the user's historical behavior features by applying weighted compression to the pre-constructed user features comprises:
    performing time-series encoding of the user features over the two-dimensional space of the time dimension and the feature dimension to obtain a time-series feature matrix corresponding to the user features;
    multiplying the time-series feature matrix by a first compression weight matrix to obtain a compressed first product matrix;
    correcting the first product matrix with a first bias vector to obtain a first corrected matrix;
    inputting the first corrected matrix into a sigmoid function to obtain an embedding vector corresponding to the user's historical behavior features;
    concatenating the embedding vector corresponding to the user's historical behavior features with the time-series features corresponding to a specified time to form a first concatenated vector;
    multiplying the first concatenated vector by a second compression weight matrix to obtain a compressed second product matrix;
    correcting the second product matrix with a second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
  11. The computer device according to claim 9, wherein the user features include user attribute features, historical click features, and behavior cue features, and the step of modeling the generator and the discriminator according to the weighted compression vector comprises:
    concatenating the user attribute features, the historical click features, and the behavior cue features into a second concatenated vector;
    with the model parameters of the discriminator fixed, inputting the second concatenated vector into the generator model, and modeling the generator model under the constraint of a first cross-entropy loss function;
    determining whether the first cross-entropy loss function has reached its minimum;
    if so, obtaining the generator model.
  12. The computer device according to claim 11, wherein, before the step of concatenating the user attribute features, the historical click features, and the behavior cue features into the second concatenated vector, the method comprises:
    inputting the weighted compression vector into a sigmoid function to obtain an output result for the weighted compression vector;
    multiplying the output result for the weighted compression vector by a reward function parameter to obtain a reward value;
    using the calculation of the reward value as the discriminator model.
  13. The computer device according to claim 9, wherein the step of combining the modeled generator with the discriminator and performing adversarial learning under the adversarial model comprises:
    concatenating the second concatenated vector with the modeling result of the generator to form a negative sample feature vector, and concatenating the second concatenated vector with the true user click value corresponding to the second concatenated vector to form a positive sample feature vector;
    inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the generator parameters, and modeling the discriminator under the constraint of a second cross-entropy loss function;
    determining whether the second cross-entropy loss function has reached its minimum;
    if so, determining the parameters of the discriminator;
    following the modeling processes of the generator and the discriminator, training the generator and the discriminator against each other through the adversarial model until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum.
  14. The computer device according to claim 9, wherein the step of inputting the historical information of the current user into the adversarially trained generator and determining the interest preference features of the current user in combination with the feedback value of the adversarially trained discriminator comprises:
    inputting the historical information of the current user and specified marketing campaign information into the adversarially trained generator;
    determining whether the feedback value of the adversarially trained discriminator equals 1;
    if so, determining that the specified marketing campaign information belongs to the interest preference features of the current user.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a content recommendation method based on adversarial learning, comprising:
    obtaining a weighted compression vector corresponding to a user's historical behavior features by applying weighted compression to pre-constructed user features;
    modeling a generator and a discriminator according to the weighted compression vector;
    combining the modeled generator with the discriminator and performing adversarial learning under an adversarial model;
    determining whether the adversarial learning of the generator and the discriminator meets a preset condition;
    if so, inputting historical information of a current user into the adversarially trained generator, and determining interest preference features of the current user in combination with a feedback value of the adversarially trained discriminator;
    recommending to the current user, according to the interest preference features of the current user, content information that matches the interest preference features of the current user.
  16. The computer-readable storage medium according to claim 15, wherein the step of obtaining a weighted compression vector corresponding to the user's historical behavior features by weighting and compressing pre-constructed user features comprises:
    performing time-series encoding on the user features in the two-dimensional space of the time-series dimension and the feature dimension, to obtain a time-series feature matrix corresponding to the user features;
    multiplying the time-series feature matrix by a first compression weight matrix to obtain a first product matrix after data compression;
    correcting the first product matrix with a first bias vector to obtain a first corrected matrix;
    inputting the first corrected matrix into a sigmoid function to obtain an embedding vector corresponding to the user's historical behavior features;
    concatenating the embedding vector corresponding to the user's historical behavior features with the time-series feature corresponding to a specified time, to form a first concatenated vector;
    multiplying the first concatenated vector by a second compression weight matrix to obtain a second product matrix after data compression;
    correcting the second product matrix with a second bias vector to obtain the weighted compression vector corresponding to the user's historical behavior features.
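As a non-limiting illustration, the compression steps of claim 16 might be sketched as follows. The claim does not fix any matrix shapes, so the shapes here are assumptions: the time-series feature matrix `x` is taken as T rows (time steps) by F columns (features), the first compression weight `w1` holds one weight per time step so that the first product collapses the time axis, and `w2` maps the concatenated vector of length 2F to the output size. All function and parameter names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def weighted_compression(
    x: Matrix, w1: Vector, b1: Vector, t: int, w2: Matrix, b2: Vector
) -> Vector:
    """Return a weighted compression vector for one user (illustrative only)."""
    num_features = len(x[0])
    # First product: weight each time step and sum over the time axis (assumed shape).
    first_product = [
        sum(w1[s] * x[s][f] for s in range(len(x))) for f in range(num_features)
    ]
    # Correct with the first bias (offset) vector.
    corrected = [p + b for p, b in zip(first_product, b1)]
    # Sigmoid yields the embedding of the historical behavior features.
    embedding = [sigmoid(c) for c in corrected]
    # Concatenate the embedding with the feature row at the specified time t.
    concat = embedding + list(x[t])
    # Second product: row vector (length 2F) times w2 (2F x K).
    second_product = [
        sum(c * row[j] for c, row in zip(concat, w2)) for j in range(len(w2[0]))
    ]
    # Correct with the second bias (offset) vector.
    return [p + b for p, b in zip(second_product, b2)]
```

Under these assumptions the output length equals the number of columns of `w2`, which sets the size of the compressed representation passed on to the generator and the discriminator.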
  17. The computer-readable storage medium according to claim 15, wherein the user features include user attribute features, historical click features, and behavior cue features, and the step of modeling the generator and the discriminator according to the weighted compression vector comprises:
    concatenating the user attribute features, the historical click features, and the behavior cue features to obtain a second concatenated vector;
    with the model parameters of the discriminator fixed, inputting the second concatenated vector into the model of the generator, and modeling the generator under the constraint of a first cross-entropy loss function;
    determining whether the first cross-entropy loss function reaches a minimum value;
    if so, obtaining the model of the generator.
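By way of a hedged example, the generator modeling of claim 17 could look like the sketch below. The claim does not specify the generator architecture, so a single-layer logistic model is assumed here, trained by gradient descent on a binary cross-entropy loss against click labels. The discriminator takes no part in the update, which is one simple reading of keeping its parameters fixed. "Reaching a minimum" is approximated by stopping once the loss improves by less than `tol`. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def concat_features(
    attrs: Sequence[float], clicks: Sequence[float], cues: Sequence[float]
) -> List[float]:
    """Second concatenated vector: attributes, click history, behavior cues."""
    return list(attrs) + list(clicks) + list(cues)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Click probability from the assumed logistic generator."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(p: float, y: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def train_generator(
    samples: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.5,
    tol: float = 1e-6,
    max_epochs: int = 5000,
) -> Tuple[List[float], float, float]:
    """Fit the generator; return weights, bias, and the last evaluated loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    prev_loss = loss = float("inf")
    for _ in range(max_epochs):
        preds = [predict(w, b, x) for x in samples]
        loss = sum(cross_entropy(p, y) for p, y in zip(preds, labels)) / n
        if prev_loss - loss < tol:  # loss has stopped decreasing
            break
        prev_loss = loss
        # Gradient of the mean cross-entropy for a logistic model is (p - y) * x.
        errors = [p - y for p, y in zip(preds, labels)]
        for j in range(dim):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, samples)) / n
        b -= lr * sum(errors) / n
    return w, b, loss
```

A deeper generator would replace `predict` and the hand-written gradient, while the stopping test on the first cross-entropy loss would stay the same.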
  18. The computer-readable storage medium according to claim 17, wherein, before the step of concatenating the user attribute features, the historical click features, and the behavior cue features to obtain a second concatenated vector, the following steps are included:
    inputting the weighted compression vector into a sigmoid function to obtain an output result of the weighted compression vector;
    multiplying the output result of the weighted compression vector by a reward function parameter to obtain a reward value;
    using the calculation of the reward value as the model of the discriminator.
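For illustration only, the reward calculation of claim 18 may be sketched as below. The claim does not say how a vector-valued sigmoid output becomes a single reward value, so this sketch assumes the reward function parameter is a vector of the same length and that the reward is their dot product. The names `reward_value` and `reward_params` are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def reward_value(compressed: Sequence[float], reward_params: Sequence[float]) -> float:
    """Discriminator reward: sigmoid of the compression vector times the reward parameter."""
    if len(compressed) != len(reward_params):
        raise ValueError("compressed vector and reward parameters must have equal length")
    # Element-wise sigmoid of the weighted compression vector.
    activated = [sigmoid(v) for v in compressed]
    # Assumed reduction to a scalar: dot product with the reward parameters.
    return sum(a * r for a, r in zip(activated, reward_params))
```

For example, `reward_value([0.0, 0.0], [1.0, 3.0])` evaluates to 2.0, since each sigmoid output is 0.5.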
  19. The computer-readable storage medium according to claim 15, wherein the step of combining the modeled generator and discriminator and performing adversarial learning under the adversarial model comprises:
    concatenating the second concatenated vector with the modeling result of the generator to form a negative sample feature vector, and concatenating the second concatenated vector with the true user click value corresponding to the second concatenated vector to form a positive sample feature vector;
    inputting the negative sample feature vector and the positive sample feature vector into the discriminator, fixing the parameters of the generator, and modeling the discriminator under the constraint of a second cross-entropy loss function;
    determining whether the second cross-entropy loss function reaches a minimum value;
    if so, determining the parameters of the discriminator;
    according to the modeling processes of the generator and the discriminator, performing adversarial learning between the generator and the discriminator through the adversarial model until both the first cross-entropy loss function and the second cross-entropy loss function reach their minimum values.
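As a non-limiting sketch, the sample construction and discriminator step of claim 19 might be written as follows. It covers only the discriminator update with the generator fixed, so the generator outputs are passed in as plain numbers. A logistic discriminator is assumed, and convergence of the second cross-entropy loss is approximated by a tolerance on the loss improvement. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector, label: 1 = real, 0 = generated)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def build_samples(
    features: Sequence[Sequence[float]],
    generator_outputs: Sequence[float],
    true_clicks: Sequence[float],
) -> List[Sample]:
    """Positive samples append the true click; negative samples append the generator output."""
    positives = [(list(x) + [y], 1.0) for x, y in zip(features, true_clicks)]
    negatives = [(list(x) + [g], 0.0) for x, g in zip(features, generator_outputs)]
    return positives + negatives


def train_discriminator(
    samples: Sequence[Sample], lr: float = 0.5, tol: float = 1e-6, max_epochs: int = 5000
) -> Tuple[List[float], float, float]:
    """Fit a logistic discriminator; return weights, bias, and the last evaluated loss."""
    n, dim = len(samples), len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    prev_loss = loss = float("inf")
    for _ in range(max_epochs):
        preds = [sigmoid(sum(wi * si for wi, si in zip(w, s)) + b) for s, _ in samples]
        # Second cross-entropy loss over positive and negative samples.
        loss = -sum(
            y * math.log(max(p, 1e-12)) + (1.0 - y) * math.log(max(1.0 - p, 1e-12))
            for p, (_, y) in zip(preds, samples)
        ) / n
        if prev_loss - loss < tol:  # loss has stopped decreasing
            break
        prev_loss = loss
        errors = [p - y for p, (_, y) in zip(preds, samples)]
        for j in range(dim):
            w[j] -= lr * sum(e * s[j] for e, (s, _) in zip(errors, samples)) / n
        b -= lr * sum(errors) / n
    return w, b, loss
```

In a full adversarial loop this step would alternate with a generator update in which the discriminator is held fixed, repeating until both losses stop improving.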
  20. The computer-readable storage medium according to claim 15, wherein the step of inputting historical information of a current user into the generator after adversarial learning, and determining an interest preference feature of the current user in combination with a feedback value of the discriminator after adversarial learning, comprises:
    inputting the historical information of the current user and information on a specified marketing activity into the generator after adversarial learning;
    determining whether the feedback value of the discriminator after adversarial learning is equal to 1;
    if so, determining that the specified marketing activity information belongs to the interest preference features of the current user.
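The decision step of claim 20 can be illustrated with the hedged sketch below. The trained generator and discriminator are passed in as plain callables. Because a continuous discriminator score is rarely exactly 1, the sketch assumes the feedback value is obtained by thresholding the score at 0.5. The threshold, the callable signatures, and every name here are assumptions rather than details from the claims.

```python
from typing import Callable, Dict, List, Sequence

Generator = Callable[[List[float]], float]
Discriminator = Callable[[List[float], float], float]


def matches_interest(
    history: Sequence[float],
    campaign: Sequence[float],
    generator: Generator,
    discriminator: Discriminator,
    threshold: float = 0.5,
) -> bool:
    """True when the discriminator feedback for this campaign equals 1."""
    # Feed the user's history together with the campaign information to the generator.
    inputs = list(history) + list(campaign)
    generated = generator(inputs)
    # Binarize the discriminator score into a feedback value of 0 or 1 (assumed rule).
    feedback = 1 if discriminator(inputs, generated) >= threshold else 0
    return feedback == 1


def recommend(
    history: Sequence[float],
    campaigns: Dict[str, Sequence[float]],
    generator: Generator,
    discriminator: Discriminator,
) -> List[str]:
    """Names of the campaigns judged to match the user's interest preferences."""
    return [
        name
        for name, vector in campaigns.items()
        if matches_interest(history, vector, generator, discriminator)
    ]
```

Campaigns whose feedback value is 0 are simply left out of the recommendation list.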
PCT/CN2020/132592 2020-09-28 2020-11-30 Content recommendation method and apparatus based on adversarial learning, and computer device WO2021169451A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011044966.7A CN112182384B (en) 2020-09-28 2020-09-28 Content recommendation method and device based on countermeasure learning and computer equipment
CN202011044966.7 2020-09-28

Publications (1)

Publication Number Publication Date
WO2021169451A1 true WO2021169451A1 (en) 2021-09-02

Family

ID=73945688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/132592 WO2021169451A1 (en) 2020-09-28 2020-11-30 Content recommendation method and apparatus based on adversarial learning, and computer device

Country Status (2)

Country Link
CN (1) CN112182384B (en)
WO (1) WO2021169451A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837805A (en) * 2021-09-24 2021-12-24 深圳闪回科技有限公司 Second-hand mobile phone price prediction algorithm for xDeleFM
CN114168845A (en) * 2021-11-24 2022-03-11 电子科技大学 Serialization recommendation method based on multi-task learning

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434761B (en) * 2021-06-25 2024-02-02 平安科技(深圳)有限公司 Recommendation model training method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360069A (en) * 2018-10-29 2019-02-19 郑州大学 A kind of recommended models based on pairs of dual training
CN110442804A (en) * 2019-08-13 2019-11-12 北京市商汤科技开发有限公司 A kind of training method, device, equipment and the storage medium of object recommendation network
CN110727868A (en) * 2019-10-12 2020-01-24 腾讯音乐娱乐科技(深圳)有限公司 Object recommendation method, device and computer-readable storage medium
CN111259264A (en) * 2020-01-15 2020-06-09 电子科技大学 Time sequence scoring prediction method based on generation countermeasure network
CN111460130A (en) * 2020-03-27 2020-07-28 咪咕数字传媒有限公司 Information recommendation method, device, equipment and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11721090B2 (en) * 2017-07-21 2023-08-08 Samsung Electronics Co., Ltd. Adversarial method and system for generating user preferred contents
KR102629474B1 (en) * 2018-05-09 2024-01-26 삼성전자주식회사 Electronic apparatus for compression and decompression data and method thereof
KR102203252B1 (en) * 2018-10-19 2021-01-14 네이버 주식회사 Method and system for collaborative filtering based on generative adversarial networks
US11568260B2 (en) * 2018-10-29 2023-01-31 Google Llc Exponential modeling with deep learning features
US10715176B1 (en) * 2019-04-15 2020-07-14 EMC IP Holding Company LLC Recommending data compression scheme using machine learning and statistical attributes of the data
CN110162703A (en) * 2019-05-13 2019-08-23 腾讯科技(深圳)有限公司 Content recommendation method, training method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KANDO NORIKO, SAKAI TETSUYA, JOHO HIDEO, LI HANG, DE VRIES ARJEN P., WHITE RYEN W., WANG JUN, YU LANTAO, ZHANG WEINAN, GONG YU, XU: "IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models", RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, USA, 7 August 2017 (2017-08-07), pages 515-524, XP055840571, ISBN: 978-1-4503-5022-8, DOI: 10.1145/3077136.3080786 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837805A (en) * 2021-09-24 2021-12-24 深圳闪回科技有限公司 Second-hand mobile phone price prediction algorithm for xDeleFM
CN114168845A (en) * 2021-11-24 2022-03-11 电子科技大学 Serialization recommendation method based on multi-task learning
CN114168845B (en) * 2021-11-24 2023-08-15 电子科技大学 Serialized recommendation method based on multitask learning

Also Published As

Publication number Publication date
CN112182384A (en) 2021-01-05
CN112182384B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
WO2021169451A1 (en) Content recommendation method and apparatus based on adversarial learning, and computer device
US20240046106A1 (en) Multi-task neural networks with task-specific paths
Krishnan et al. On the challenges of learning with inference networks on sparse, high-dimensional data
CN109902753B (en) User recommendation model training method and device, computer equipment and storage medium
CN110472060B (en) Topic pushing method and device, computer equipment and storage medium
CN111506820B (en) Recommendation model, recommendation method, recommendation device, recommendation equipment and recommendation storage medium
CN113901327A (en) Target recommendation model training method, recommendation device and electronic equipment
CN110705688B (en) Neural network system, method and device for performing risk assessment on operation event
JP2012518834A (en) Method and system for calculating website visitor ratings
CN110796261A (en) Feature extraction method and device based on reinforcement learning and computer equipment
CN114780831A (en) Sequence recommendation method and system based on Transformer
CN111598213A (en) Network training method, data identification method, device, equipment and medium
CN112905876A (en) Information pushing method and device based on deep learning and computer equipment
CN111695084A (en) Model generation method, credit score generation method, device, equipment and storage medium
CN113536105A (en) Recommendation model training method and device
US20220215255A1 (en) Learning content recommendation system for predicting probability of correct answer of user using collaborative filtering based on latent factor and operation method thereof
CN110807693A (en) Album recommendation method, device, equipment and storage medium
Gong et al. Deep exercise recommendation model
CN113051468B (en) Movie recommendation method and system based on knowledge graph and reinforcement learning
CN112817563B (en) Target attribute configuration information determining method, computer device, and storage medium
Ignatenko et al. On preference learning based on sequential bayesian optimization with pairwise comparison
CN115525782A (en) Video abstract generation method of self-adaptive graph structure
CN110929163B (en) Course recommendation method and device, computer equipment and storage medium
Alhendawi Predicting the effectiveness of web information systems using neural networks modeling: framework & empirical testing
CN115565051B (en) Lightweight face attribute recognition model training method, recognition method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20921604; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20921604; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1