CN113988291A - Training method and device for user representation network - Google Patents

Publication number
CN113988291A
Authority
CN
China
Prior art keywords
user
network
prediction
sample
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111250535.0A
Other languages
Chinese (zh)
Other versions
CN113988291B (en)
Inventor
陈炫颖
刘致宁
俞力
顾立宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111250535.0A
Publication of CN113988291A
Application granted
Publication of CN113988291B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/387 Payment using discounts or coupons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0213 Consumer transaction fees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0222 During e-commerce, i.e. online transactions

Abstract

An embodiment of the present specification provides a training method for a user characterization network, including: inputting the user features of an unbiased sample into a pre-trained first user characterization network to obtain a first user characterization vector, and inputting the user features of a biased sample into a second user characterization network to obtain a second user characterization vector, wherein the unbiased sample and the biased sample are collected by issuing to users equity shares determined by a random policy and by a non-random policy, respectively; inputting the two user characterization vectors into a discriminator to obtain two corresponding discrimination results; training the discriminator with the goal of minimizing the function value of an objective function, wherein the function value is positively correlated with a first loss and a second loss, the first loss is determined based on the discrimination result for the unbiased sample and an unbiased label, and the second loss is determined based on the discrimination result for the biased sample and a biased label; and training the second user characterization network with the goal of maximizing the function value of the objective function.

Description

Training method and device for user representation network
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning technologies, and in particular, to a method and an apparatus for training a user representation network, a method and an apparatus for training a user behavior prediction system, and a method and an apparatus for predicting a user behavior.
Background
With economic and social development, more and more service platforms have emerged to provide users with services that meet their various needs in life and work. To help users find services that meet their needs, a service platform may issue equities (benefits such as coupons) to attract users to try one or more services that the platform promotes. Generally, the total amount of equities that a service platform can issue is limited; so that as many users as possible actually enjoy them, the equity share issued to each individual user needs to be determined reasonably and effectively.
However, the current way of determining the equity shares is difficult to meet the requirements of practical applications. Therefore, a scheme is needed to accurately determine the equity shares issued to the users, so as to meet the expectations of most users as much as possible and effectively improve the user experience.
Disclosure of Invention
One or more embodiments of the present specification describe a method and an apparatus for training a user characterization network, a method and an apparatus for training a user behavior prediction system, and a method and an apparatus for predicting user behavior. These introduce the idea of adversarial learning, use a small amount of unbiased data to correct a biased network trained on biased data, and finally obtain an unbiased user behavior prediction system, thereby enabling accurate prediction of user behavior in order to determine whether to issue an equity to a user and what equity share to issue.
According to a first aspect, there is provided a training method for a user characterization network, comprising: inputting the user features of a first unbiased sample in an unbiased sample set into a pre-trained first user characterization network to obtain a first user characterization vector, the unbiased sample set being collected by issuing to users equity shares determined by a random policy; inputting the user features of a first biased sample in a biased sample set into a second user characterization network to obtain a second user characterization vector, the biased sample set being collected by issuing to users equity shares determined by a non-random policy; inputting the first user characterization vector and the second user characterization vector into a discriminator respectively to obtain a corresponding first discrimination result and second discrimination result; training the discriminator with the goal of minimizing the function value of an objective function, wherein the function value is positively correlated with a first loss and a second loss, the first loss is determined based on the first discrimination result and an unbiased label corresponding to unbiased samples, and the second loss is determined based on the second discrimination result and a biased label corresponding to biased samples; and training the second user characterization network with the goal of maximizing the function value of the objective function.
In one embodiment, the number of samples of the biased sample set is greater than the number of samples of the unbiased sample set.
In one embodiment, each unbiased sample in the unbiased sample set has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; the first user characterization network is pre-trained by: inputting the user features of a second unbiased sample in the unbiased sample set into the first user characterization network to obtain a third user characterization vector; inputting the third user characterization vector and the equity share in the second unbiased sample into a first behavior prediction network to obtain a first prediction result; and training the first user characterization network and the first behavior prediction network based on the first prediction result and the behavior label of the second unbiased sample.
In a specific embodiment, the parameters of the first behavior prediction network include a first parameter matrix and a second parameter matrix; inputting the third user characterization vector and the equity share in the second unbiased sample into the first behavior prediction network to obtain the first prediction result comprises: linearly transforming the third user characterization vector with the first parameter matrix and with the second parameter matrix to obtain a first transformation value and a second transformation value, respectively; and processing the sum of a product and the second transformation value with an activation function to obtain the first prediction result, wherein the factors of the product include the equity share and the result of applying a Softplus function to the first transformation value.
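The computation in this embodiment can be illustrated with a minimal Python sketch. It assumes that the activation function is a sigmoid and that each parameter matrix reduces to a single weight row, so that both transformation values are scalars; the function and variable names are illustrative and do not come from the specification. Because Softplus is always positive, the prediction in this sketch does not decrease as the equity share grows.

```python
import math

def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); the result is always positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z: float) -> float:
    # Numerically stable logistic function (assumed activation).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_redeem_prob(h, w1, w2, share):
    """h: user characterization vector; w1, w2: weight rows standing in for
    the first and second parameter matrices (assumption)."""
    a = sum(wi * hi for wi, hi in zip(w1, h))  # first transformation value
    b = sum(wi * hi for wi, hi in zip(w2, h))  # second transformation value
    # activation(Softplus(a) * share + b)
    return sigmoid(softplus(a) * share + b)
```

For a fixed user, calling `predict_redeem_prob` with a larger `share` yields a probability at least as high as with a smaller one, which matches the intuition that a larger equity share should not reduce the willingness to redeem.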
In another specific embodiment, the first behavior prediction network includes a first equity embedding layer, a first fusion layer, and a first prediction layer; inputting the third user characterization vector and the equity share in the second unbiased sample into the first behavior prediction network to obtain the first prediction result comprises: in the first equity embedding layer, embedding the equity share to obtain an equity embedding vector; in the first fusion layer, fusing the third user characterization vector and the equity embedding vector to obtain a fusion vector; and in the first prediction layer, applying a linear transformation and/or a nonlinear transformation to the fusion vector to obtain the first prediction result.
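The three layers of this embodiment can be sketched as follows. The sketch assumes that the selectable equity shares are indexed by level, that fusion is plain concatenation, and that the prediction layer is one linear transformation followed by a sigmoid; the text fixes none of these choices, and all names are hypothetical.

```python
import math
import random

def make_equity_embedding(num_levels, dim, seed=0):
    """Equity embedding layer: one randomly initialised vector per share level."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_levels)]

def fuse(user_vec, equity_vec):
    """Fusion layer: concatenation of the two vectors (assumption)."""
    return list(user_vec) + list(equity_vec)

def predict(fused, weights, bias):
    """Prediction layer: linear transformation, then sigmoid (assumption)."""
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Example: a 3-dimensional user characterization vector and share level 1.
embedding = make_equity_embedding(num_levels=3, dim=2)
user_vec = [0.2, -0.1, 0.4]
fused = fuse(user_vec, embedding[1])
prob = predict(fused, weights=[0.1] * len(fused), bias=0.0)
```

Other fusion choices, such as element-wise addition or multiplication of equally sized vectors, would fit the same description.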
In one embodiment, each biased sample in the biased sample set has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; before inputting the user features of the first biased sample in the biased sample set into the second user characterization network, the method further comprises: pre-training the second user characterization network based on the behavior labels.
In a specific embodiment, pre-training the second user characterization network based on the behavior labels includes: inputting the user features of a second biased sample in the biased sample set into the second user characterization network to obtain a fourth user characterization vector; inputting the fourth user characterization vector and the equity share in the second biased sample into a second behavior prediction network to obtain a second prediction result; and training the second user characterization network and the second behavior prediction network based on the second prediction result and the behavior label of the second biased sample.
According to a second aspect, there is provided a method of training a user behavior prediction system comprising a second user characterization network and a second behavior prediction network, the method comprising: obtaining a second user characterization network trained according to the method provided in the first aspect; inputting the user features in a third biased sample into the second user characterization network to obtain a fifth user characterization vector, the third biased sample being collected by issuing to a user an equity share determined by a non-random policy and having a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain a third prediction result; and training the second behavior prediction network based on the third prediction result and the behavior label.
In one embodiment, the parameters of the second behavior prediction network include a third parameter matrix and a fourth parameter matrix; inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain the third prediction result comprises: linearly transforming the fifth user characterization vector with the third parameter matrix and with the fourth parameter matrix to obtain a third transformation value and a fourth transformation value, respectively; and processing the sum of a product and the fourth transformation value with an activation function to obtain the third prediction result, wherein the factors of the product include the equity share and the result of applying a Softplus function to the third transformation value.
In another embodiment, the second behavior prediction network includes a second equity embedding layer, a second fusion layer, and a second prediction layer; inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain the third prediction result comprises: in the second equity embedding layer, embedding the equity share to obtain an equity embedding vector; in the second fusion layer, fusing the fifth user characterization vector and the equity embedding vector to obtain a fusion vector; and in the second prediction layer, applying a linear transformation and/or a nonlinear transformation to the fusion vector to obtain the third prediction result.
According to a third aspect, there is provided a method for predicting user behavior, comprising: acquiring a target sample to be predicted, wherein the target sample comprises the user features of a corresponding user and an equity share; and inputting the target sample into a user behavior prediction system trained according to the method provided by the second aspect to obtain a target prediction probability, wherein the target prediction probability indicates the probability that the user redeems an equity having the equity share.
In one embodiment, after obtaining the target prediction probability, the method further comprises: issuing an equity having the equity share to the user if the target prediction probability is greater than a preset probability threshold.
In another embodiment, obtaining a target sample to be predicted comprises: obtaining M×T samples to be predicted, where M denotes the number of users and T denotes the total number of selectable equity share levels; and taking each of the M×T samples as the target sample. After the target prediction probabilities are obtained, the method further comprises: determining, by linear programming, the equity share of the equity issued to each of the M users, based on the M×T prediction probabilities corresponding to the M×T samples and on the total budget for issuing equities to the M users.
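As an illustration only, the linear programming step could take the following form. The text above does not state the objective or the constraints, so the objective (maximizing the expected number of redemptions), the at-most-one-share-per-user constraint, and the symbols $p_{ij}$, $c_j$, $x_{ij}$, and $B$ are all assumptions:

\[
\max_{x}\ \sum_{i=1}^{M}\sum_{j=1}^{T} p_{ij}\,x_{ij}
\quad\text{s.t.}\quad
\sum_{j=1}^{T} x_{ij} \le 1 \ \ (i = 1,\dots,M),\qquad
\sum_{i=1}^{M}\sum_{j=1}^{T} c_j\,x_{ij} \le B,\qquad
0 \le x_{ij} \le 1 .
\]

Here $p_{ij}$ is the predicted probability that user $i$ redeems an equity of the $j$-th selectable share, $c_j$ is the value of that share, $B$ is the total budget, and $x_{ij}$ is a relaxed indicator of whether user $i$ is issued share $j$. A variant that charges only the expected cost would replace $c_j$ with $p_{ij}\,c_j$ in the budget constraint.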
According to a fourth aspect, there is provided a training apparatus for a user characterization network, comprising: a first characterization unit configured to input the user features of a first unbiased sample in an unbiased sample set into a pre-trained first user characterization network to obtain a first user characterization vector, the unbiased sample set being collected by issuing to users equity shares determined by a random policy; a second characterization unit configured to input the user features of a first biased sample in a biased sample set into a second user characterization network to obtain a second user characterization vector, the biased sample set being collected by issuing to users equity shares determined by a non-random policy; a discrimination unit configured to input the first user characterization vector and the second user characterization vector into a discriminator respectively to obtain a corresponding first discrimination result and second discrimination result; a first training unit configured to train the discriminator with the goal of minimizing the function value of an objective function, the function value being positively correlated with a first loss and a second loss, the first loss being determined based on the first discrimination result and an unbiased label corresponding to unbiased samples, and the second loss being determined based on the second discrimination result and a biased label corresponding to biased samples; and a second training unit configured to train the second user characterization network with the goal of maximizing the function value of the objective function.
According to a fifth aspect, there is provided a training apparatus for a user behavior prediction system comprising a second user characterization network and a second behavior prediction network, the apparatus comprising: an obtaining unit configured to obtain a second user characterization network trained with the apparatus of the fourth aspect; a characterization unit configured to input the user features in a third biased sample into the second user characterization network to obtain a fifth user characterization vector, the third biased sample being collected by issuing to a user an equity share determined by a non-random policy and having a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; a prediction unit configured to input the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain a third prediction result; and a training unit configured to train the second behavior prediction network based on the third prediction result and the behavior label.
According to a sixth aspect, there is provided an apparatus for predicting user behavior, comprising: an acquisition unit configured to acquire a target sample to be predicted, wherein the target sample comprises the user features of a corresponding user and an equity share; and a prediction unit configured to input the target sample into a user behavior prediction system trained by the apparatus provided in the fifth aspect to obtain a target prediction probability, where the target prediction probability indicates the probability that the user redeems an equity having the equity share.
According to a seventh aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second or third aspect.
According to an eighth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor which, when executing the executable code, implements the method of the first or second or third aspect.
With the methods and apparatuses provided by the embodiments of the present specification, a small number of unbiased samples are collected by issuing equity shares to users under a random policy; these unbiased samples are then used for adversarial learning between the discriminator and the second user characterization network that processes biased samples, which corrects the second user characterization network. An unbiased user behavior prediction system is finally obtained, enabling accurate prediction of user behavior in order to determine whether to issue an equity to a user and what equity share to issue.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1A illustrates curves, plotted from training samples, relating equity shares to the statistical probability that users redeem the equity;
FIG. 1B illustrates curves, plotted from prediction models, relating equity shares to the predicted probability that users redeem the equity;
FIG. 2 is a schematic diagram illustrating an implementation architecture for training a user characterization network disclosed in an embodiment of the present specification;
FIG. 3 is a flow chart illustrating a method for training a user characterization network disclosed in an embodiment of the present disclosure;
FIG. 4 illustrates a user interface diagram for order payment according to one example;
FIG. 5 illustrates a flow diagram of a method for pre-training a user characterization network, according to one embodiment;
FIG. 6 is a schematic diagram illustrating an implementation architecture of pre-training of a user characterization network disclosed in an embodiment of the present specification;
FIG. 7 is a schematic diagram illustrating data flow in a first behavior prediction network disclosed in an embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating a method for training a user behavior prediction system disclosed in an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a training implementation architecture of a user behavior prediction system disclosed in an embodiment of the present disclosure;
FIG. 10 is a schematic diagram illustrating data flow in a second behavior prediction network disclosed in an embodiment of the present disclosure;
FIG. 11 is a flow chart illustrating a method for predicting user behavior disclosed in an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a training apparatus for a user characterization network according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of a training device of a user behavior prediction system according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a user behavior prediction apparatus disclosed in an embodiment of the present disclosure.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
As previously described, it is necessary to determine the equity share issued to each individual user. Meanwhile, machine learning has become a research focus, so it has been proposed to apply machine learning to the determination of equity shares. In a typical embodiment, the personal features of a user and an equity share are input into a trained machine learning model to obtain the predicted probability that the user redeems an equity of that share, and whether to issue the equity share to the user is determined based on the predicted probability. The accuracy of the predicted probability depends on the quantity and quality of the training data: generally, the more training samples there are and the higher their quality, the better the model trains and the more accurate its predictions.
However, in the user equity field, the training data that can actually be collected is often heavily biased, because a service platform usually does not determine equity shares at random when issuing equities to users but follows a predetermined marketing strategy, for example issuing high-share equities to a certain target user group. The biased data collected in this way yields the curve marked with triangles in FIG. 1A, which shows the redemption probability decreasing as the equity share increases; this does not match users' real psychology. In contrast, by issuing equities with random shares to a batch of users and collecting unbiased data, one can plot the curve marked with squares in FIG. 1A, which reflects users' real psychology: the larger the equity share, the stronger the user's willingness to redeem the equity and the higher the redemption probability.
Further, as the curves in FIG. 1B show, the statistical trend in the relationship between equity share and users' willingness to redeem, whether derived from biased data or from unbiased data, carries over to the trend that a prediction model trained on that data predicts for the relationship.
Based on the above, one option would be to issue random shares to a large number of users and collect a large number of unbiased samples for training the prediction model. However, issuing random shares on a large scale is impractical because the budget for issuing equities is limited in practical application scenarios. Therefore, the inventors propose issuing random shares on a small scale to collect a small number of unbiased samples, and then using the idea of adversarial learning to correct the biased network trained on a large amount of biased data, finally obtaining a highly usable unbiased network.
FIG. 2 shows a schematic diagram of an implementation architecture for training a user characterization network disclosed in an embodiment of the present specification. As shown in FIG. 2, the second user characterization network $h_b$ acts as the generator in adversarial learning and competes against a discriminator, so that the biased distribution that $h_b$ produces from biased samples gradually approaches the unbiased distribution that the first user characterization network $h_u$ produces from unbiased samples.
Specifically, an unbiased sample $x_u$ is input into the pre-trained first user characterization network $h_u$ to obtain a first user characterization vector $h_u(x)$, and $h_u(x)$ is input into the discriminator to obtain a first discrimination result $d(h_u(x))$; a biased sample $x_b$ is input into the second user characterization network $h_b$ to obtain a second user characterization vector $h_b(x)$, and $h_b(x)$ is input into the discriminator to obtain a second discrimination result $d(h_b(x))$. An objective function is then determined from the first discrimination result $d(h_u(x))$ and the second discrimination result $d(h_b(x))$, and the discriminator and the second user characterization network are optimized in two opposite directions to implement adversarial learning, thereby correcting the biased distribution.
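The opposing updates described above can be sketched in plain Python. This is only an illustration under stated assumptions: both characterization networks are single linear maps, the discriminator is a logistic unit, the two losses are binary cross-entropies with label 1 for unbiased and label 0 for biased, the objective is their plain sum, and gradients are estimated by finite differences. All function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, label):
    """Binary cross-entropy between a probability p and a 0/1 label."""
    eps = 1e-12
    return -(label * math.log(p + eps)
             + (1.0 - label) * math.log(1.0 - p + eps))

def characterize(flat_w, x):
    """Linear characterization network; flat_w is a row-major weight matrix."""
    d = len(x)
    return [sum(flat_w[r * d + c] * x[c] for c in range(d))
            for r in range(len(flat_w) // d)]

def discriminate(d_params, v):
    """Discriminator: probability that v comes from an unbiased sample."""
    return sigmoid(sum(w * vi for w, vi in zip(d_params, v)))

def objective(d_params, hb_params, hu_params, x_u, x_b):
    # First loss: unbiased vector against the unbiased label (assumed 1).
    first = bce(discriminate(d_params, characterize(hu_params, x_u)), 1.0)
    # Second loss: biased vector against the biased label (assumed 0).
    second = bce(discriminate(d_params, characterize(hb_params, x_b)), 0.0)
    return first + second

def numeric_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of f at params."""
    grads = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grads.append((f(up) - f(down)) / (2.0 * eps))
    return grads

def adversarial_step(d_params, hb_params, hu_params, x_u, x_b, lr=0.1):
    # Discriminator: gradient descent, i.e. minimize the objective.
    g_d = numeric_grad(
        lambda p: objective(p, hb_params, hu_params, x_u, x_b), d_params)
    new_d = [p - lr * g for p, g in zip(d_params, g_d)]
    # Second characterization network: gradient ascent, i.e. maximize the
    # objective; the pre-trained first network h_u stays frozen.
    g_h = numeric_grad(
        lambda p: objective(d_params, p, hu_params, x_u, x_b), hb_params)
    new_hb = [p + lr * g for p, g in zip(hb_params, g_h)]
    return new_d, new_hb
```

In this sketch only the second loss depends on the second characterization network, so maximizing the objective pushes its output toward vectors that the discriminator labels as unbiased.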
The following describes implementations of the above inventive concept in conjunction with specific embodiments. FIG. 3 is a schematic flow chart of a training method for a user characterization network disclosed in an embodiment of the present specification. The method may be executed by any server, apparatus, or device cluster having computing and processing capabilities. As shown in FIG. 3, the method comprises the following steps:
First, in step S310, the user features of a first unbiased sample in the unbiased sample set are input into the pre-trained first user characterization network to obtain a first user characterization vector.
The unbiased sample set is collected by issuing to users equity shares determined by a random policy. The embodiments of the present specification do not limit the form of the equity. For example, it may be a coupon, such as a 5-yuan or 10-yuan coupon; as another example, it may be a bonus of 5% or 10% of a top-up amount, 10 GB or 20 GB of cloud disk space, or the like.
To determine equity shares under a random policy, in one implementation, staff preset a plurality of selectable equity shares, and a share is then randomly selected from them for each user, yielding a plurality of random shares to be issued; in another implementation, staff preset a value interval for the equity share, and an equity share falling within that interval is then randomly generated for each user as the random share to be issued.
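Both random policies can be sketched with the Python standard library. The function names, the two-decimal rounding, and the use of a uniform distribution over the interval are assumptions made for illustration.

```python
import random

def shares_from_options(options, num_users, rng):
    """Policy 1: pick one of the preset selectable shares for each user."""
    return [rng.choice(options) for _ in range(num_users)]

def shares_from_interval(low, high, num_users, rng):
    """Policy 2: draw a share uniformly from the preset interval per user."""
    return [round(rng.uniform(low, high), 2) for _ in range(num_users)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
option_shares = shares_from_options([1, 2, 5, 10], num_users=5, rng=rng)
interval_shares = shares_from_interval(1.0, 10.0, num_users=5, rng=rng)
```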
Further, after the random shares are issued to users, unbiased samples can be constructed from the user features of the users who received them and from those users' verification and cancellation of the random shares. Specifically, on the one hand, the user features are included in the sample features. In one embodiment, the user features include basic attribute features, such as gender, age, occupation, and usual place of residence; in another embodiment, the user features include portrait features, such as belonging to a high-spending or low-spending group, or being an electronics enthusiast or a fitness expert; in yet another embodiment, the user features include network behavior features, such as login frequency, preferred operating period, and commonly used IP address. In addition, the sample features may include the issued equity share.
On the other hand, the verification and cancellation status is used to determine the behavior category label (or simply, the behavior label) of the unbiased sample, where the behavior label indicates whether the corresponding user has verified and cancelled the issued equity share. Verifying and cancelling an equity means using it, that is, the issued equity is consumed (redeemed). For example, suppose the equity issued to user A is a 1-yuan discount coupon for payment channel D; for intuitive understanding, see the order payment interface shown in fig. 4. If the collected status shows that user A did not pay through payment channel D, for example paid through payment channel A instead, the behavior label is determined as not verified and cancelled (e.g., label value 0); if the collected status shows that user A paid through payment channel D and enjoyed the 1-yuan discount, the behavior label is determined as verified and cancelled (e.g., label value 1).
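The construction of one sample from the user features, the issued share, and the observed verification and cancellation status might look like the sketch below. The `Sample` class, the `build_sample` helper, and the feature names are hypothetical and serve only to illustrate the feature items and the label item.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Sample:
    user_features: Dict[str, float]  # encoded attribute, portrait, and behavior features
    equity_share: float              # share issued to the user
    behavior_label: int              # 1 = verified and cancelled, 0 = not

def build_sample(user_features, issued_share, redeemed):
    """Turn one issuing record into a training sample."""
    return Sample(dict(user_features), float(issued_share), 1 if redeemed else 0)

# User A received a 1-yuan coupon for channel D but paid through another channel.
sample_a = build_sample({"age": 30.0, "login_freq": 0.8}, issued_share=1.0, redeemed=False)
```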
The unbiased sample set is described above. The first user characterization network is used to produce an unbiased characterization of a user from unbiased samples, and may be implemented as a convolutional neural network (CNN), a deep neural network (DNN), or the like. Before step S310, the first user characterization network may be trained in advance (pre-trained) with the unbiased sample set. The pre-training may be implemented by steps S51, S52, and S53 shown in fig. 5; see also the pre-training architecture of the user characterization network shown in fig. 6.
Specifically, in step S51, the user features $x_i$ of a second unbiased sample in the unbiased sample set are input into the first user characterization network $h_u$ to obtain a third user characterization vector $h_u(x_i)$. It should be understood that the second unbiased sample may be any sample in the unbiased sample set, and may be obtained using an existing sampling method for training samples.
Next, in step S52, the third user characterization vector $h_u(x_i)$ and the equity share $t_i$ of the second unbiased sample are input into a first behavior prediction network to obtain a first prediction result $y_i$.
In one embodiment, the first behavior prediction network includes a first parameter matrix $w_g$ and a second parameter matrix $w_p$; it will be appreciated that both parameter matrices are trainable. On this basis, in the first behavior prediction network, the matrices $w_g$ and $w_p$ are first used to apply linear transformations to the third user characterization vector $h_u(x_i)$, yielding a first transformation value $w_g \cdot h_u(x_i)$ and a second transformation value $w_p \cdot h_u(x_i)$. The first transformation value is then processed with the Softplus function, and the result is multiplied by the equity share $t_i$ of the second unbiased sample. Finally, the sum of this product and the second transformation value is processed with an activation function $\sigma(\cdot)$ to obtain the first prediction result $y_i$, where $\sigma(\cdot)$ may be a sigmoid function or the like. For example, see the following formula:
$$y_i = \sigma\left(\mathrm{Softplus}\left(w_g \cdot h_u(x_i)\right) \cdot t_i + w_p \cdot h_u(x_i)\right) \tag{1}$$
where $\mathrm{Softplus}(x) = \ln(1 + e^{x})$. It should be noted that, because an equity is a positive incentive for the user, its influence on user behavior should be positive: the higher the equity share, the stronger the user's willingness to verify and cancel it. The Softplus function is therefore used to guarantee that the influence coefficient of the share $t_i$, namely $\mathrm{Softplus}(w_g \cdot h_u(x_i))$ in the above formula, is non-negative, so that the prediction result $y_i$ better satisfies the monotonicity constraint.
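Formula (1) and its monotonicity property can be checked numerically with the sketch below. It assumes that $w_g$ and $w_p$ act as row vectors so that both transformation values are scalars; the numeric values of `h`, `w_g`, and `w_p` are hypothetical.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x); the result is always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))

def predict(h, t, w_g, w_p):
    """Formula (1): y = sigmoid(Softplus(w_g . h) * t + w_p . h)."""
    return sigmoid(softplus(dot(w_g, h)) * t + dot(w_p, h))

h = [0.2, -1.0, 0.5]       # hypothetical user characterization vector
w_g = [-0.3, 0.8, 0.1]     # hypothetical parameters, treated here as row vectors
w_p = [0.4, 0.1, -0.2]
probs = [predict(h, t, w_g, w_p) for t in (1.0, 5.0, 10.0)]
```

Although `dot(w_g, h)` is negative for these values, the Softplus output is still positive, so the predicted probability increases with the share `t`.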
In another embodiment, fig. 7 shows a data flow diagram of the first behavior prediction network disclosed in an embodiment of the present specification. As shown in fig. 7, the first behavior prediction network 700 includes a first equity embedding layer 701, a first fusion layer 702, and a first prediction layer 703. At the first equity embedding layer 701, the equity share $t_i$ is first embedded to obtain an equity embedding vector $e(t_i)$. Next, at the first fusion layer 702, the third user characterization vector $h_u(x_i)$ and the equity embedding vector $e(t_i)$ are fused to obtain a fusion vector $r_i$; illustratively, the fusion may be concatenation, element-wise multiplication, dot multiplication, addition, or the like. Then, at the first prediction layer 703, the fusion vector $r_i$ undergoes linear and/or nonlinear transformation to obtain the first prediction result $y_i$. It is to be understood that the linear transformation may be implemented with a parameter matrix, bias terms, and the like, and the nonlinear transformation may be implemented with an activation function.
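The embedding, fusion, and prediction layers of fig. 7 can be sketched as below. The lookup-table embedding, the sigmoid output, and all numeric values are assumptions made for illustration; the embodiments do not fix these choices.

```python
import math

def embed_share(t, table):
    """Equity embedding layer: look up the embedding vector of share t."""
    return table[t]

def fuse(h, e, mode="concat"):
    """Fusion layer: concatenation, element-wise multiplication, or addition."""
    if mode == "concat":
        return list(h) + list(e)
    if len(h) != len(e):
        raise ValueError("element-wise fusion needs vectors of equal length")
    if mode == "multiply":
        return [a * b for a, b in zip(h, e)]
    if mode == "add":
        return [a + b for a, b in zip(h, e)]
    raise ValueError(f"unknown fusion mode: {mode}")

def prediction_layer(r, weights, bias):
    """Prediction layer: linear transformation followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, r)) + bias
    return 1.0 / (1.0 + math.exp(-z))

table = {5.0: [0.1, 0.3], 10.0: [0.4, 0.9]}   # hypothetical share embeddings
h = [0.5, -0.2]                               # hypothetical user characterization vector
r = fuse(h, embed_share(10.0, table), mode="concat")
y = prediction_layer(r, weights=[0.3, -0.1, 0.7, 0.2], bias=0.0)
```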
After the first prediction result $y_i$ is obtained, in step S53 the first user characterization network and the first behavior prediction network are trained based on $y_i$ and the behavior label of the second unbiased sample. Specifically, a training loss may be calculated from the behavior prediction result $y_i$ and the behavior label of the second unbiased sample, and the two networks are then trained by back propagation based on the training loss, that is, the trainable parameters of both networks are adjusted.
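The text does not name a specific training loss. A common choice for a 0/1 behavior label is binary cross-entropy, sketched below as an assumption; the helper names are hypothetical.

```python
import math

def binary_cross_entropy(y_pred, label, eps=1e-12):
    """Cross-entropy between a predicted probability and a 0/1 behavior label."""
    y = min(max(y_pred, eps), 1.0 - eps)   # clamp to avoid log(0)
    return -(label * math.log(y) + (1 - label) * math.log(1.0 - y))

def batch_loss(preds, labels):
    """Mean loss over a batch of samples."""
    return sum(binary_cross_entropy(p, l) for p, l in zip(preds, labels)) / len(preds)

loss_good = batch_loss([0.9, 0.1], [1, 0])   # predictions agree with the labels
loss_bad = batch_loss([0.1, 0.9], [1, 0])    # predictions contradict the labels
```

Back propagation then adjusts the parameters of both networks in the direction that lowers this loss.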
The pre-training of the first user characterization network may be implemented by performing steps S51, S52, and S53 in advance, so that in this step, the user characteristics of the first unbiased sample are input into the pre-trained first user characterization network to obtain the first user characterization vector. It should be understood that the first unbiased sample may be any one of the samples in the unbiased sample set, which may be obtained by sampling the unbiased sample set in the existing sample sampling manner; the first unbiased sample may or may not be the same as the second unbiased sample described above.
Thus, by performing step S310, the first user characterization vector corresponding to the first unbiased sample is obtained. Before, simultaneously with, or after step S310, step S320 may be performed, in which the user features of a first biased sample in the biased sample set are input into the second user characterization network to obtain a second user characterization vector.
In contrast to the unbiased sample set described above, the biased sample set is collected by issuing to users equity shares determined by a non-random policy. It is to be understood that the unbiased sample set is obtained by issuing random shares on a small scale. In actual marketing, equity shares are determined by marketing strategies, which are usually predetermined and non-random; for example, a higher equity share is issued to a user whose payment orders involve large amounts, and a lower equity share is issued to a user whose payment orders involve small amounts. Because marketing is usually carried out on a large scale, or has been carried out many times in the past, a large amount of historical data on users' verification and cancellation of equities can be collected, from which abundant biased samples are constructed to form the biased sample set. Thus, the number of samples in the biased sample set is much greater than the number of samples in the unbiased sample set. In addition, unbiased and biased samples have consistent data items: they contain the same feature items (such as user features and equity shares) and label items (such as behavior labels).
The biased sample set is introduced above. For the second user characterization network, which is used for characterizing users according to the biased samples, the network structure is preferably designed to be the same as that of the first user characterization network.
In an embodiment, to improve the training effect, the second user characterization network may be pre-trained with the biased sample set before step S320. The pre-training of the second user characterization network is similar to that of the first user characterization network; the main difference is that the former is performed on the biased sample set and the latter on the unbiased sample set. The pre-training of the second user characterization network is therefore described only briefly here; for details, refer to the foregoing description of the pre-training of the first user characterization network.
In a particular embodiment, pre-training the second user characterization network comprises: inputting the user features of a second biased sample in the biased sample set into the second user characterization network to obtain a fourth user characterization vector; inputting the fourth user characterization vector and the equity share in the second biased sample into a second behavior prediction network to obtain a second prediction result; and training the second user characterization network and the second behavior prediction network based on the second prediction result and the behavior label of the second biased sample. It is to be understood that the second behavior prediction network is preferably designed to have the same network structure as the first behavior prediction network. In this manner, a pre-trained second user characterization network is obtained.
Further, in this step, the user features of the first biased sample may be input into the pre-trained second user characterization network to obtain the second user characterization vector, where the first biased sample may be any sample in the biased sample set.
After the first user characterization vector corresponding to the first unbiased sample and the second user characterization vector corresponding to the first biased sample are obtained, in step S330 the first user characterization vector and the second user characterization vector are respectively input into the discriminator to obtain a corresponding first discrimination result and a corresponding second discrimination result. It is to be understood that the discriminator is used to discriminate whether a user characterization vector input to it corresponds to an unbiased sample or a biased sample, or, equivalently, whether it was output by the first user characterization network or the second user characterization network. Illustratively, the discrimination result output by the discriminator indicates the probability that the input vector corresponds to an unbiased sample or to a biased sample.
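The structure of the discriminator is not specified in the text. As a minimal sketch, it can be assumed to be a logistic model that maps a user characterization vector to the probability that the vector comes from an unbiased sample; the parameters and input vectors below are hypothetical.

```python
import math

def discriminator(h, weights, bias):
    """Return d(h): estimated probability that vector h comes from an unbiased sample."""
    z = sum(w * x for w, x in zip(weights, h)) + bias
    return 1.0 / (1.0 + math.exp(-z))

weights, bias = [1.5, -0.5], 0.1                  # hypothetical discriminator parameters
d_u = discriminator([0.8, 0.1], weights, bias)    # first discrimination result
d_b = discriminator([-0.6, 0.9], weights, bias)   # second discrimination result
```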
After the first and second discrimination results are obtained, in step S340 the discriminator is trained with the goal of minimizing the function value of an objective function; in step S350, the second user characterization network is trained with the goal of maximizing the function value of the objective function.
Specifically, the first loss may be determined based on the first discrimination result and an unbiased flag corresponding to the unbiased sample, and the second loss may be determined based on the second discrimination result and a biased flag corresponding to the biased sample, so that the function value positively correlated to the first loss and the second loss may be determined based on the previously designed objective function. It is to be understood that the unbiased flag indicates that the corresponding sample belongs to an unbiased sample, and the biased flag indicates that the corresponding sample belongs to a biased sample. Illustratively, the objective function is designed in the form:
[Formula (2): objective function; formula image BDA0003322305680000101 is not reproduced in this text.]
In the above formula, $x$ represents the user features in a sample; $D_u$ represents the unbiased sample set and $D_b$ represents the biased sample set; $h_u(x)$ represents the first user characterization vector; $d(h_u(x))$ represents the first discrimination result, indicating the probability that $h_u(x)$ originates from an unbiased sample; $\log d(h_u(x))$ represents the first loss; $h_b(x)$ represents the second user characterization vector; $d(h_b(x))$ represents the second discrimination result, indicating the probability that $h_b(x)$ originates from an unbiased sample; and $\log(1 - d(h_b(x)))$ represents the second loss.
In this way, adversarial training between the discriminator and the second user characterization network is realized. Eventually, the distribution of the user characterization vectors generated by the second user characterization network becomes similar to the distribution of the vectors output by the first user characterization network, so that the discriminator has difficulty telling which of the two user characterization networks an input vector comes from.
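The opposing goals of steps S340 and S350 can be illustrated with the sketch below. The exact objective function is given only as a formula image, so its sign convention is not recoverable here; this sketch assumes the two losses are the usual cross-entropy terms $-\log d(h_u(x))$ and $-\log(1 - d(h_b(x)))$, which is consistent with the discriminator minimizing the function value. The function name `objective` is hypothetical.

```python
import math

def objective(d_unbiased, d_biased, eps=1e-12):
    """Assumed objective: mean of -log d(h_u(x)) plus mean of -log(1 - d(h_b(x)))."""
    first = sum(-math.log(max(d, eps)) for d in d_unbiased) / len(d_unbiased)
    second = sum(-math.log(max(1.0 - d, eps)) for d in d_biased) / len(d_biased)
    return first + second

sharp = objective([0.95, 0.9], [0.05, 0.1])    # discriminator separates the two sources
confused = objective([0.5, 0.5], [0.5, 0.5])   # discriminator cannot tell them apart
```

Under this assumption, updating the discriminator drives the value down toward the `sharp` case, while updating the second user characterization network drives it up by making its outputs look like those of the first network.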
In summary, with the training method for the user characterization network disclosed in this specification, a small number of unbiased samples are collected by issuing equity shares to users under a random policy. These unbiased samples are then used for adversarial learning between the discriminator and the second user characterization network, which processes biased samples, so that the second user characterization network is corrected and the distribution of the user characterization vectors it outputs approximates the output distribution of the first user characterization network.
According to another embodiment, the second behavior prediction network may be trained with the biased sample set on the basis of the adversarially trained second user characterization network, so as to obtain a trained user behavior prediction system for accurately predicting user behavior.
Fig. 8 is a flowchart illustrating a training method of the user behavior prediction system disclosed in an embodiment of the present specification, where the user behavior prediction system includes the second user characterization network and the second behavior prediction network. The execution subject of the method may be any device, server, or equipment cluster with computing and processing capability. As shown in fig. 8, the training method includes the following steps:
In step S810, a second user characterization network that has been adversarially trained by the method shown in fig. 3 is obtained. In step S820, the user features of a third biased sample in the biased sample set are input into the second user characterization network to obtain a fifth user characterization vector. In step S830, the fifth user characterization vector and the equity share in the third biased sample are input into the second behavior prediction network to obtain a third prediction result. In step S840, the second behavior prediction network is trained based on the third prediction result and the behavior label of the third biased sample.
For illustrative purposes, the above steps are described below with reference to the training implementation architecture diagram shown in fig. 9:
First, in step S810, the second user characterization network $h_b$ that has been adversarially trained by the method shown in fig. 3 is obtained. It is to be understood that the obtained second user characterization network $h_b$ may result from one or more rounds of iterative training based on the method shown in fig. 3.
Next, in step S820, the user features $x_j$ of a third biased sample in the biased sample set are input into the second user characterization network $h_b$ to obtain a fifth user characterization vector $h_b(x_j)$.
Then, in step S830, the fifth user characterization vector $h_b(x_j)$ and the equity share $t_j$ of the third biased sample are input into the second behavior prediction network to obtain a third prediction result $y_j$.
In one embodiment, the second behavior prediction network includes a third parameter matrix $w_g$ and a fourth parameter matrix $w_p$; it will be appreciated that both parameter matrices are trainable. On this basis, in the second behavior prediction network, the matrices $w_g$ and $w_p$ are first used to apply linear transformations to the fifth user characterization vector $h_b(x_j)$, yielding a third transformation value $w_g \cdot h_b(x_j)$ and a fourth transformation value $w_p \cdot h_b(x_j)$. The third transformation value is then processed with the Softplus function, and the result is multiplied by the equity share $t_j$ of the third biased sample. Finally, the sum of this product and the fourth transformation value is processed with an activation function $\sigma(\cdot)$ to obtain the third prediction result $y_j$, where $\sigma(\cdot)$ may be a sigmoid function or the like. For example, see the following formula:
$$y_j = \sigma\left(\mathrm{Softplus}\left(w_g \cdot h_b(x_j)\right) \cdot t_j + w_p \cdot h_b(x_j)\right) \tag{3}$$
where $\mathrm{Softplus}(x) = \ln(1 + e^{x})$.
In another embodiment, fig. 10 shows a data flow diagram of the second behavior prediction network disclosed in an embodiment of the present specification. As shown in fig. 10, the second behavior prediction network 1000 includes a second equity embedding layer 1001, a second fusion layer 1002, and a second prediction layer 1003. At the second equity embedding layer 1001, the equity share $t_j$ is first embedded to obtain an equity embedding vector $e(t_j)$. Next, at the second fusion layer 1002, the fifth user characterization vector $h_b(x_j)$ and the equity embedding vector $e(t_j)$ are fused to obtain a fusion vector $r_j$; illustratively, the fusion may be concatenation, element-wise multiplication, dot multiplication, addition, or the like. Then, at the second prediction layer 1003, the fusion vector $r_j$ undergoes linear and/or nonlinear transformation to obtain the third prediction result $y_j$.
After the third prediction result $y_j$ is obtained, in step S840 the second behavior prediction network is trained based on $y_j$ and the behavior label of the third biased sample. Specifically, a training loss may be calculated from the behavior prediction result $y_j$ and the behavior label of the third biased sample, and the second behavior prediction network is then trained by back propagation based on the training loss. It should be noted that, in a typical scenario, the training loss is used to adjust only the parameters of the second behavior prediction network; the parameters of the second user characterization network are not adjusted, because this network has already been corrected with unbiased data and is therefore kept unchanged.
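The following sketch illustrates training only the prediction head of formula (3) while the user characterization network stays frozen. The tiny `encode` function stands in for the corrected second user characterization network, and the cross-entropy loss, learning rate, and sample data are all assumptions made for illustration.

```python
import math

def softplus(x):
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(x, enc_w):
    """Stand-in for the corrected second user characterization network (kept frozen)."""
    return [math.tanh(dot(row, x)) for row in enc_w]

def train_head(samples, enc_w, w_g, w_p, lr=0.1, epochs=200):
    """Gradient descent on cross-entropy loss; only w_g and w_p are updated."""
    for _ in range(epochs):
        for x, t, label in samples:
            h = encode(x, enc_w)
            a = dot(w_g, h)
            y = sigmoid(softplus(a) * t + dot(w_p, h))
            dz = y - label   # gradient of the loss w.r.t. the pre-activation
            # d Softplus(a)/da = sigmoid(a); the share t scales this branch
            w_g = [wg - lr * dz * sigmoid(a) * t * hi for wg, hi in zip(w_g, h)]
            w_p = [wp - lr * dz * hi for wp, hi in zip(w_p, h)]
    return w_g, w_p

def mean_loss(samples, enc_w, w_g, w_p):
    total = 0.0
    for x, t, label in samples:
        h = encode(x, enc_w)
        y = sigmoid(softplus(dot(w_g, h)) * t + dot(w_p, h))
        y = min(max(y, 1e-12), 1.0 - 1e-12)
        total -= label * math.log(y) + (1 - label) * math.log(1.0 - y)
    return total / len(samples)

enc_w = [[0.5, -0.3], [0.2, 0.8]]                        # hypothetical frozen weights
samples = [([1.0, 0.0], 1.0, 0), ([1.0, 0.0], 5.0, 1),   # (features, share, label)
           ([0.0, 1.0], 1.0, 0), ([0.0, 1.0], 5.0, 1)]
w_g0, w_p0 = [0.0, 0.0], [0.0, 0.0]
before = mean_loss(samples, enc_w, w_g0, w_p0)
w_g1, w_p1 = train_head(samples, enc_w, w_g0, w_p0)
after = mean_loss(samples, enc_w, w_g1, w_p1)
```

Because `train_head` never writes to `enc_w`, the correction obtained through adversarial training is preserved while the head learns from the larger biased sample set.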
In summary, with the training method of the user behavior prediction system disclosed in the embodiments of the present specification, the second behavior prediction network is further trained using the corrected second user characterization network and the biased sample set. The trained user behavior prediction system is then constructed from the corrected second user characterization network and the further trained second behavior prediction network, thereby realizing accurate prediction of user behavior.
According to another aspect of the embodiment, based on the trained user behavior prediction system, the present specification further discloses a user behavior prediction method. Fig. 11 is a flowchart illustrating a method for predicting user behavior disclosed in an embodiment of the present disclosure, where an execution subject of the method may be any device, server, or equipment cluster having computing and processing capabilities. As shown in fig. 11, the method comprises the steps of:
Step S1110: obtain a target sample to be predicted, which includes the user features of a corresponding target user and a target equity share. Step S1120: input the target sample into the user behavior prediction system trained by the method shown in fig. 8 to obtain a target prediction probability, which indicates the probability that the target user verifies and cancels an equity with the target equity share.
With respect to the above steps, in an embodiment, after step S1120 the method further includes: judging whether the target prediction probability is greater than a preset probability threshold (such as 0.7 or 0.8); if so, the equity with the target equity share is issued to the target user; otherwise, it is not issued.
In another embodiment, step S1110 includes: obtaining M × T samples to be predicted, where M represents the number of users and T represents the total number of categories of selectable equity shares, and taking each of the M × T samples as the target sample. On this basis, after step S1120 the method further includes: constructing a linear programming problem from the M × T prediction probabilities corresponding to the M × T samples and the total budget for issuing equities to the M users, and solving it so as to maximize the sum of the verification and cancellation probabilities of the equities issued to the M users. It should be understood that the linear programming problem can be constructed and solved by conventional mathematical means; one such approach is described below by way of example. Specifically, the linear programming problem can be constructed in the following form:
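[Formula (4): objective and constraints of the linear programming problem; formula images BDA0003322305680000121 and BDA0003322305680000122 are not reproduced in this text.]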
In the above formula, $M$ represents the total number of users; $k_i$ represents the number of equities issued to the $i$-th user and can be preset manually; $T$ represents the total number of categories of selectable equity shares; $f_{ij}$ represents the probability that the $i$-th user verifies and cancels the $j$-th category of equity share, and is output by the user behavior prediction system; $a_{ij}$ indicates whether the $j$-th category of equity share is issued to the $i$-th user, taking the value 1 when it is issued and 0 when it is not; $t_j$ represents the $j$-th category of equity share; and $B$ represents the total equity share budget.
Solving formula (4) yields the following optimal solution:
[Optimal solution of formula (4); formula image BDA0003322305680000131 is not reproduced in this text.]
In the above formula, $\lambda$ is obtained by using the KKT (Karush-Kuhn-Tucker) conditions to convert the problem into its dual problem and solving the dual problem.
In this manner, the equity shares issued to each of the M users may be determined.
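Because the formula images are not reproduced, the following sketch shows only one common way to solve such a budgeted allocation, and it may differ from the exact solution in the embodiments. It assumes that each user receives exactly one share ($k_i = 1$), that total spending must not exceed the budget, and that for a multiplier $\lambda \ge 0$ each user picks the share maximizing $f_{ij} - \lambda t_j$, with $\lambda$ found by bisection. The function name `allocate` and all data are hypothetical.

```python
def allocate(probs, shares, budget, iters=50):
    """probs[i][j]: predicted probability for user i and share j; returns chosen j per user."""
    def choose(lam):
        # Each user independently maximizes f_ij - lam * t_j.
        return [max(range(len(shares)), key=lambda j: row[j] - lam * shares[j])
                for row in probs]

    def cost(choice):
        return sum(shares[j] for j in choice)

    choice = choose(0.0)
    if cost(choice) <= budget:
        return choice                     # the budget is not binding
    lo, hi = 0.0, 1.0
    while cost(choose(hi)) > budget:      # grow hi until spending fits the budget
        hi *= 2.0
        if hi > 1e9:
            raise ValueError("budget too small even for the cheapest shares")
    for _ in range(iters):                # bisection on the multiplier
        mid = (lo + hi) / 2.0
        if cost(choose(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return choose(hi)

shares = [1.0, 5.0, 10.0]                                       # T = 3 candidate shares
probs = [[0.2, 0.5, 0.9], [0.6, 0.7, 0.75], [0.1, 0.4, 0.5]]    # M = 3 users
plan_rich = allocate(probs, shares, budget=30.0)
plan_tight = allocate(probs, shares, budget=12.0)
```

With a generous budget every user gets the share with the highest predicted probability; with a tight budget the multiplier penalizes expensive shares. Since the choices are discrete, the result is a feasible but possibly approximate solution.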
It should be noted that, reference may also be made to the description of steps S1110 and S1120 in the foregoing embodiments.
In summary, in the prediction method for user behavior disclosed in the embodiments of the present specification, based on a trained user behavior prediction system, the user behavior is accurately predicted, so as to determine whether to issue a right and a specific share to the user, so as to meet psychological expectations of most users as much as possible, and effectively improve user experience.
Corresponding to the training method and the prediction method, the embodiment of the specification also discloses a training device and a prediction device.
Fig. 12 is a schematic structural diagram of a training apparatus for a user characterization network according to an embodiment of the present disclosure, where the apparatus may be implemented as any server, platform, or device cluster with computing and processing capabilities. As shown in fig. 12, the apparatus 1200 includes the following units:
A first characterization unit 1210, configured to input the user features of a first unbiased sample in the unbiased sample set into a pre-trained first user characterization network to obtain a first user characterization vector; the unbiased sample set is collected by issuing to users equity shares determined by a random policy. A second characterization unit 1220, configured to input the user features of a first biased sample in the biased sample set into a second user characterization network to obtain a second user characterization vector; the biased sample set is collected by issuing to users equity shares determined by a non-random policy. A discrimination unit 1230, configured to input the first user characterization vector and the second user characterization vector into a discriminator, respectively, to obtain a corresponding first discrimination result and a corresponding second discrimination result. A first training unit 1240, configured to train the discriminator with the goal of minimizing the function value of an objective function, the function value being positively correlated with a first loss and a second loss, where the first loss is determined based on the first discrimination result and an unbiased flag corresponding to unbiased samples, and the second loss is determined based on the second discrimination result and a biased flag corresponding to biased samples. A second training unit 1250, configured to train the second user characterization network with the goal of maximizing the function value of the objective function.
In one embodiment, the number of samples of the biased sample set is greater than the number of samples of the unbiased sample set.
In one embodiment, each unbiased sample in the unbiased sample set has a behavior label indicating whether the corresponding user has verified and cancelled the equity of the corresponding share; the first user characterization network is pre-trained by a first pre-training unit 1260, and the first pre-training unit 1260 comprises: a characterization subunit 1261, configured to input the user features of a second unbiased sample in the unbiased sample set into the first user characterization network to obtain a third user characterization vector; a prediction subunit 1262, configured to input the third user characterization vector and the equity share in the second unbiased sample into the first behavior prediction network to obtain a first prediction result; and a training subunit 1263, configured to train the first user characterization network and the first behavior prediction network based on the first prediction result and the behavior label of the second unbiased sample.
In a specific embodiment, the parameters in the first behavior prediction network include a first parameter matrix and a second parameter matrix; the prediction subunit 1262 is specifically configured to: apply linear transformations to the third user characterization vector using the first parameter matrix and the second parameter matrix, respectively, to obtain a first transformation value and a second transformation value; and process the sum of a product and the second transformation value with an activation function to obtain the first prediction result, where the factors of the product include the equity share and the result of processing the first transformation value with a Softplus function.
In another specific embodiment, the first behavior prediction network includes a first equity embedding layer, a first fusion layer, and a first prediction layer; the prediction subunit 1262 is specifically configured to: embed the equity share at the first equity embedding layer to obtain an equity embedding vector; fuse the third user characterization vector and the equity embedding vector at the first fusion layer to obtain a fusion vector; and apply linear and/or nonlinear transformation to the fusion vector at the first prediction layer to obtain the first prediction result.
In one embodiment, each biased sample of the set of biased samples has a behavior label indicating whether a corresponding user has honored a corresponding share; the device further comprises: a second pre-training unit 1270 configured to pre-train said second user profile network based on said behavior labels.
In a specific embodiment, the second pre-training unit 1270 is specifically configured to: input the user features of a second biased sample in the biased sample set into the second user characterization network to obtain a fourth user characterization vector; input the fourth user characterization vector and the equity share in the second biased sample into a second behavior prediction network to obtain a second prediction result; and train the second user characterization network and the second behavior prediction network based on the second prediction result and the behavior label of the second biased sample.
Further, in one example, the parameters in the second behavior prediction network include a third parameter matrix and a fourth parameter matrix; the second pre-training unit 1270 is further configured to: apply linear transformations to the fourth user characterization vector using the third parameter matrix and the fourth parameter matrix, respectively, to obtain a fifth transformation value and a sixth transformation value; and process the sum of a product and the sixth transformation value with an activation function to obtain the second prediction result, where the factors of the product include the equity share of the second biased sample and the result of processing the fifth transformation value with a Softplus function.
In another example, the second behavior prediction network includes a second equity embedding layer, a second fusion layer, and a second prediction layer; the second pre-training unit 1270 is further configured to: embed the equity share at the second equity embedding layer to obtain an equity embedding vector; fuse the fourth user characterization vector and the equity embedding vector at the second fusion layer to obtain a fusion vector; and apply linear and/or nonlinear transformation to the fusion vector at the second prediction layer to obtain the second prediction result.
Fig. 13 is a schematic structural diagram of an exercise apparatus of a user behavior prediction system disclosed in an embodiment of the present specification, where the user behavior prediction system includes a second user characterization network and a second behavior prediction network, and the apparatus may be implemented as any server, platform, or device cluster having computing and processing capabilities. As shown in fig. 13, the apparatus 1300 includes the following units:
an obtaining unit 1310 configured to obtain a second user characterization network trained with the apparatus shown in fig. 12; a characterization unit 1320 configured to input the user characteristics in a third biased sample into the second user characterization network to obtain a fifth user characterization vector, where the third biased sample is collected by issuing to the user an equity share determined by a non-random policy, and has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; a prediction unit 1330 configured to input the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain a third prediction result; and a training unit 1340 configured to train the second behavior prediction network based on the third prediction result and the behavior label.
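The training unit compares the third prediction result with the behavior label, but the loss is not named here. As a minimal sketch, assuming the prediction is a redemption probability and the label is 0 or 1, binary cross-entropy is one common choice. The function name `binary_cross_entropy` and the clamping constant `eps` are illustrative assumptions, not taken from the specification.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted redemption
    probabilities and 0/1 behavior labels (assumed loss, see lead-in)."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

Under this sketch, only the parameters of the second behavior prediction network would be updated from the loss, since that is the only network the training unit is described as training.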
In one embodiment, the parameters of the second behavior prediction network include a third parameter matrix and a fourth parameter matrix, and the prediction unit 1330 is specifically configured to: linearly transform the fifth user characterization vector using the third parameter matrix and the fourth parameter matrix, respectively, to obtain a third transformation value and a fourth transformation value; and process, using an activation function, the sum of a multiplication result and the fourth transformation value to obtain the third prediction result, wherein the factors of the multiplication result include the result of applying a Softplus function to the third transformation value and the equity share.
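The computation in this embodiment can be sketched as follows. The sketch assumes that each parameter matrix reduces the characterization vector to a single scalar (so it is written as a weight vector), that the activation function is a sigmoid, and that the multiplication result is exactly the product of the Softplus output and the share. The names `predict_redemption`, `w3`, and `w4` are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function (assumed activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(w, h):
    return sum(wi * hi for wi, hi in zip(w, h))


def predict_redemption(h, share, w3, w4):
    """h: user characterization vector; share: equity share (a number)."""
    third = dot(w3, h)   # third transformation value
    fourth = dot(w4, h)  # fourth transformation value
    return sigmoid(softplus(third) * share + fourth)
```

Because the Softplus output is strictly positive and the sigmoid is increasing, the predicted probability in this sketch never decreases as the share grows for a fixed user.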
In another embodiment, the second behavior prediction network includes a second equity embedding layer, a second fusion layer, and a second prediction layer, and the prediction unit 1330 is specifically configured to: embed the equity share in the second equity embedding layer to obtain an equity embedding vector; fuse, in the second fusion layer, the fifth user characterization vector and the equity embedding vector to obtain a fusion vector; and apply, in the second prediction layer, a linear and/or nonlinear transformation to the fusion vector to obtain the third prediction result.
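A minimal sketch of the three layers is given below. It assumes the selectable shares are discretized into indices that address a lookup table, that fusion is plain concatenation, and that the prediction layer is one linear map followed by a sigmoid. All names and numeric values are hypothetical and serve only as illustration.

```python
import math


def embed_share(share_index, embedding_table):
    """Equity embedding layer: look up the vector for a discretized share."""
    return embedding_table[share_index]


def fuse(user_vector, share_vector):
    """Fusion layer: concatenation (one possible fusion choice)."""
    return list(user_vector) + list(share_vector)


def predict(fusion_vector, weights, bias):
    """Prediction layer: linear transformation followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, fusion_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only.
embedding_table = [[0.0, 0.1], [0.3, -0.2], [0.5, 0.4]]  # three selectable shares
user_vector = [0.2, -0.7, 1.1]                           # user characterization vector
fusion_vector = fuse(user_vector, embed_share(1, embedding_table))
probability = predict(fusion_vector, [0.4, 0.1, -0.3, 0.8, 0.5], bias=0.05)
```

Other fusion choices, such as element-wise products or additional nonlinear layers, would fit the same layer structure.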
Fig. 14 is a schematic structural diagram illustrating a device for predicting user behavior according to an embodiment of the present disclosure, where the device may be implemented as any server, platform, or device cluster having computing and processing capabilities. As shown in fig. 14, the apparatus 1400 includes the following units:
an obtaining unit 1410 configured to obtain a target sample to be predicted, where the target sample includes user characteristics and an equity share of a corresponding user; and a prediction unit 1420 configured to input the target sample into a user behavior prediction system trained with the apparatus shown in fig. 13, to obtain a target prediction probability indicating the probability that the user redeems the equity having the equity share.
In one embodiment, the apparatus further includes a processing unit 1430 configured to issue the equity with the equity share to the user if the target prediction probability is greater than a preset probability threshold.
In one embodiment, the obtaining unit 1410 is specifically configured to: obtain M × T samples to be predicted, where M represents the number of users and T represents the total number of selectable equity share types; and take each of the M × T samples as the target sample. The apparatus further includes a linear programming unit 1440 configured to determine, by linear programming, the equity share to be issued to each of the M users, based on the M × T prediction probabilities obtained for the M × T samples and the total budget for issuing equity to the M users.
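One way to write such a linear program is sketched below. The symbols are assumptions introduced for illustration: p_{ij} is the predicted probability that user i redeems share type j, c_j is the amount of share type j, B is the total budget, and x_{ij} is the relaxed decision of issuing share type j to user i. The objective (expected number of redemptions) and the per-issue cost accounting are also assumptions; the text only states that the prediction probabilities and the total budget are the inputs.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{M}\sum_{j=1}^{T} p_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{T} x_{ij} = 1, \qquad i = 1,\dots,M, \\
& \sum_{i=1}^{M}\sum_{j=1}^{T} c_{j}\,x_{ij} \le B, \\
& 0 \le x_{ij} \le 1 .
\end{aligned}
```

If the budget is charged only on redeemed equity, c_j in the budget constraint would be replaced by p_{ij} c_j. A fractional solution still needs a rounding step, for example picking the share type with the largest x_{ij} for each user, which is a further assumption.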
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3, 5, 8 or 11.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 3, 5, 8 or 11.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The specific embodiments above further describe the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (19)

1. A method of training a user characterization network, comprising:
inputting user characteristics of a first unbiased sample in an unbiased sample set into a pre-trained first user characterization network to obtain a first user characterization vector, wherein the unbiased sample set is collected by issuing to users equity shares determined by a random policy;
inputting user characteristics of a first biased sample in a biased sample set into a second user characterization network to obtain a second user characterization vector, wherein the biased sample set is collected by issuing to users equity shares determined by a non-random policy;
inputting the first user characterization vector and the second user characterization vector into a discriminator, respectively, to obtain a corresponding first discrimination result and second discrimination result;
training the discriminator with a goal of minimizing a function value of an objective function, wherein the function value is positively correlated with a first loss and a second loss, the first loss is determined based on the first discrimination result and an unbiased flag corresponding to unbiased samples, and the second loss is determined based on the second discrimination result and a biased flag corresponding to biased samples;
training the second user characterization network with a goal of maximizing a function value of the objective function.
2. The method of claim 1, wherein the number of samples of the biased sample set is greater than the number of samples of the unbiased sample set.
3. The method of claim 1, wherein each unbiased sample in the unbiased sample set has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; the first user characterization network is pre-trained by:
inputting the user characteristics of a second unbiased sample in the unbiased sample set into the first user characterization network to obtain a third user characterization vector;
inputting the third user characterization vector and the equity share in the second unbiased sample into a first behavior prediction network to obtain a first prediction result;
training the first user characterization network and the first behavior prediction network based on the first prediction result and the behavior label of the second unbiased sample.
4. The method of claim 3, wherein the parameters of the first behavior prediction network comprise a first parameter matrix and a second parameter matrix; inputting the third user characterization vector and the equity share in the second unbiased sample into the first behavior prediction network to obtain the first prediction result comprises:
linearly transforming the third user characterization vector using the first parameter matrix and the second parameter matrix, respectively, to obtain a first transformation value and a second transformation value;
processing, using an activation function, the sum of a multiplication result and the second transformation value to obtain the first prediction result, wherein the factors of the multiplication result comprise the result of applying a Softplus function to the first transformation value and the equity share.
5. The method of claim 3, wherein the first behavior prediction network comprises a first equity embedding layer, a first fusion layer, and a first prediction layer; inputting the third user characterization vector and the equity share in the second unbiased sample into the first behavior prediction network to obtain the first prediction result comprises:
embedding the equity share in the first equity embedding layer to obtain an equity embedding vector;
fusing, in the first fusion layer, the third user characterization vector and the equity embedding vector to obtain a fusion vector;
applying, in the first prediction layer, a linear and/or nonlinear transformation to the fusion vector to obtain the first prediction result.
6. The method of claim 1, wherein each biased sample in the biased sample set has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share; before inputting the user characteristics of the first biased sample in the biased sample set into the second user characterization network, the method further comprises:
pre-training the second user characterization network based on the behavior label.
7. The method of claim 6, wherein pre-training the second user characterization network based on the behavior label comprises:
inputting the user characteristics of a second biased sample in the biased sample set into the second user characterization network to obtain a fourth user characterization vector;
inputting the fourth user characterization vector and the equity share in the second biased sample into a second behavior prediction network to obtain a second prediction result;
training the second user characterization network and the second behavior prediction network based on the second prediction result and the behavior label of the second biased sample.
8. A method of training a user behavior prediction system comprising a second user characterization network and a second behavior prediction network, the method comprising:
obtaining a second user characterization network trained in accordance with the method of claim 1;
inputting the user characteristics in a third biased sample into the second user characterization network to obtain a fifth user characterization vector, wherein the third biased sample is collected by issuing to the user an equity share determined by a non-random policy and has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share;
inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain a third prediction result;
training the second behavior prediction network based on the third prediction result and the behavior label.
9. The method of claim 8, wherein the parameters of the second behavior prediction network comprise a third parameter matrix and a fourth parameter matrix; inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain the third prediction result comprises:
linearly transforming the fifth user characterization vector using the third parameter matrix and the fourth parameter matrix, respectively, to obtain a third transformation value and a fourth transformation value;
processing, using an activation function, the sum of a multiplication result and the fourth transformation value to obtain the third prediction result, wherein the factors of the multiplication result comprise the result of applying a Softplus function to the third transformation value and the equity share.
10. The method of claim 8, wherein the second behavior prediction network comprises a second equity embedding layer, a second fusion layer, and a second prediction layer; inputting the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain the third prediction result comprises:
embedding the equity share in the second equity embedding layer to obtain an equity embedding vector;
fusing, in the second fusion layer, the fifth user characterization vector and the equity embedding vector to obtain a fusion vector;
applying, in the second prediction layer, a linear and/or nonlinear transformation to the fusion vector to obtain the third prediction result.
11. A method of predicting user behavior, comprising:
acquiring a target sample to be predicted, wherein the target sample comprises user characteristics and an equity share of a corresponding user;
inputting the target sample into a user behavior prediction system trained according to the method of claim 8 to obtain a target prediction probability indicating the probability that the user redeems the equity having the equity share.
12. The method of claim 11, wherein after deriving the target prediction probability, the method further comprises:
issuing the equity with the equity share to the user if the target prediction probability is greater than a preset probability threshold.
13. The method of claim 11, wherein obtaining a target sample to be predicted comprises:
obtaining M × T samples to be predicted, wherein M represents the number of users, and T represents the total number of selectable equity share types;
taking each of the M × T samples as the target sample;
after obtaining the target prediction probability, the method further comprises:
determining, by linear programming, the equity share to be issued to each of the M users, based on the M × T prediction probabilities obtained for the M × T samples and the total budget for issuing equity to the M users.
14. A training apparatus for a user characterization network, comprising:
a first characterization unit configured to input user characteristics of a first unbiased sample in an unbiased sample set into a pre-trained first user characterization network to obtain a first user characterization vector, wherein the unbiased sample set is collected by issuing to users equity shares determined by a random policy;
a second characterization unit configured to input user characteristics of a first biased sample in a biased sample set into a second user characterization network to obtain a second user characterization vector, wherein the biased sample set is collected by issuing to users equity shares determined by a non-random policy;
a discrimination unit configured to input the first user characterization vector and the second user characterization vector into a discriminator, respectively, to obtain a corresponding first discrimination result and second discrimination result;
a first training unit configured to train the discriminator with a goal of minimizing a function value of an objective function, the function value being positively correlated with a first loss determined based on the first discrimination result and an unbiased flag corresponding to an unbiased sample and a second loss determined based on the second discrimination result and a biased flag corresponding to a biased sample;
a second training unit configured to train the second user characterization network with a goal of maximizing the function value of the objective function.
15. A training apparatus for a user behavior prediction system comprising a second user characterization network and a second behavior prediction network, the apparatus comprising:
an obtaining unit configured to obtain a second user characterization network trained with the apparatus of claim 14;
a characterization unit configured to input the user characteristics in a third biased sample into the second user characterization network to obtain a fifth user characterization vector, wherein the third biased sample is collected by issuing to the user an equity share determined by a non-random policy and has a behavior label indicating whether the corresponding user has redeemed the equity of the corresponding share;
a prediction unit configured to input the fifth user characterization vector and the equity share in the third biased sample into the second behavior prediction network to obtain a third prediction result;
a training unit configured to train the second behavior prediction network based on the third prediction result and the behavior label.
16. An apparatus for predicting user behavior, comprising:
an obtaining unit configured to obtain a target sample to be predicted, wherein the target sample comprises user characteristics and an equity share of a corresponding user;
a prediction unit configured to input the target sample into a user behavior prediction system trained with the apparatus of claim 15 to obtain a target prediction probability indicating the probability that the user redeems the equity having the equity share.
17. The apparatus according to claim 16, wherein the obtaining unit is specifically configured to:
obtain M × T samples to be predicted, wherein M represents the number of users, and T represents the total number of selectable equity share types;
take each of the M × T samples as the target sample;
the apparatus further comprises:
a linear programming unit configured to determine, by linear programming, the equity share to be issued to each of the M users, based on the M × T prediction probabilities obtained for the M × T samples and the total budget for issuing equity to the M users.
18. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any one of claims 1-13.
19. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-13.
CN202111250535.0A 2021-10-26 2021-10-26 Training method and device for user characterization network Active CN113988291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111250535.0A CN113988291B (en) 2021-10-26 2021-10-26 Training method and device for user characterization network

Publications (2)

Publication Number Publication Date
CN113988291A true CN113988291A (en) 2022-01-28
CN113988291B CN113988291B (en) 2024-06-04

Family

ID=79741898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111250535.0A Active CN113988291B (en) 2021-10-26 2021-10-26 Training method and device for user characterization network

Country Status (1)

Country Link
CN (1) CN113988291B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909878A (en) * 2019-12-02 2020-03-24 支付宝(杭州)信息技术有限公司 Training method and device of neural network model for estimating resource usage share
CN111445007A (en) * 2020-03-03 2020-07-24 平安科技(深圳)有限公司 Training method and system for resisting generation of neural network
CN111523314A (en) * 2020-07-03 2020-08-11 支付宝(杭州)信息技术有限公司 Model confrontation training and named entity recognition method and device
US10796104B1 (en) * 2019-07-03 2020-10-06 Clinc, Inc. Systems and methods for constructing an artificially diverse corpus of training data samples for training a contextually-biased model for a machine learning-based dialogue system
US20200372406A1 (en) * 2019-05-22 2020-11-26 Oracle International Corporation Enforcing Fairness on Unlabeled Data to Improve Modeling Performance
CN112115963A (en) * 2020-07-30 2020-12-22 浙江工业大学 Method for generating unbiased deep learning model based on transfer learning
EP3767536A1 (en) * 2019-07-17 2021-01-20 Naver Corporation Latent code for unsupervised domain adaptation
CN113508378A (en) * 2019-10-31 2021-10-15 华为技术有限公司 Recommendation model training method, recommendation device and computer readable medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
韩艺: "Reject inference via semi-supervised logistic regression based on a tilted-entropy regularization term", 万方, 25 May 2021 (2021-05-25) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant