CN116611731A - Scoring model training method, user pushing method and device - Google Patents
- Publication number
- CN116611731A (application CN202310572391.3A)
- Authority
- CN
- China
- Prior art keywords
- node
- nodes
- characterization
- loss
- continuous distribution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The embodiments of this specification provide a scoring model training method, a user pushing method, and corresponding devices. In the scoring model training method, a relationship network of interactions between users and push objects is constructed, and a variational graph inference network estimates a node characterization continuous distribution for the users and push objects. This distribution is sampled to obtain a first sampled characterization and a second sampled characterization serving as comparison targets, from which a node-level contrastive loss function is determined. Meanwhile, the node characterization continuous distribution is used to reconstruct the relationship network between users and push objects, yielding a reconstruction loss function. The users and push objects are also clustered using the node characterization continuous distribution, and a cluster-aware contrastive loss function is computed from the clustering result. The loss functions are combined for multi-task learning, and the model is updated until convergence. The trained scoring model computes each user's scores for the push objects, and pushing is performed based on the scores.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular to a scoring model training method, a user pushing method, and a device.
Background
With the development of society and the advancement of technology, more and more service platforms have emerged to provide various services to users. Many service platforms provide convenient online services through servers and clients. For example, an e-commerce platform can provide commodity information for users to browse, select, and purchase; a content platform can provide users with electronic books, articles, music, videos, and the like. To offer richer services, a service platform can, with the user's authorization and on the premise of guaranteeing the privacy and security of user data, push suitable push objects to the user according to the user's historical behavior records, so that the user can select among them, which provides a degree of convenience.
At present, an improved scheme is desired that can provide pushing for users more accurately and reasonably.
Disclosure of Invention
One or more embodiments of the present disclosure describe a scoring model training method, a user pushing method and a device, which can provide pushing for a user more accurately and more reasonably. The specific technical scheme is as follows.
In a first aspect, an embodiment provides a training method for a scoring model, where the scoring model is used for predicting the scores of a plurality of users on a plurality of objects to be pushed. The scoring model is trained based on a relationship network, which comprises a plurality of nodes, including nodes representing users and nodes representing push objects, and edges representing connection relationships between the nodes. The parameters to be learned of the scoring model include the node characterizations. The method comprises the following steps:
Determining, by the graph neural network, an aggregate representation of the nodes based on the node representations of the plurality of nodes and an original adjacency matrix of the relationship network; wherein, the aggregation characterization aggregates neighbor user features and/or neighbor push object features;
constructing a node characterization continuous distribution based on the aggregate characterization of the nodes and a random parameter, and sampling the node characterization continuous distribution by determining the value of the random parameter, to obtain a first sampled characterization and a second sampled characterization as comparison targets; wherein the node characterization continuous distribution comprises: a node characterization continuous distribution containing user features and a node characterization continuous distribution containing push object features;
determining a first loss based on a difference between the first and second sampled representations of the same node and a difference between the first and second sampled representations of different nodes, determining a predicted loss based on the first loss;
updating the node representations of the plurality of nodes in a direction to reduce the predictive loss.
In one embodiment, the graph neural network is implemented using a graph convolutional network; the graph convolutional network comprises a plurality of convolution layers; the step of determining an aggregate characterization of a node comprises:
Any convolution layer outputs an intermediate characterization of any node in the following manner: determining the intermediate characterization of the node at this convolution layer based on the intermediate characterizations, output by the previous convolution layer, of the node's neighbor nodes;
for any one node, determining the aggregate characterization of the node based on the intermediate characterization of the node output by a plurality of convolution layers.
In one embodiment, the node representation continuous distribution conforms to a gaussian distribution;
the step of constructing the node representation continuous distribution comprises the following steps:
taking the aggregate characterization as the mean, and determining a corresponding variance based on the mean;
and constructing a node characterization continuous distribution conforming to Gaussian distribution based on the mean value, the variance and the random parameter.
In one embodiment, the step of constructing a node characterization continuous distribution conforming to a Gaussian distribution comprises:
constructing a first term of node characterization continuous distribution by taking the variance as a coefficient of the random parameter;
and constructing the node characterization continuous distribution based on the sum of the first term and the mean value.
In one embodiment, the random parameter is random gaussian noise with a mean of 0 and a variance of 1.
In one embodiment, the step of determining a predicted loss based on the first loss comprises:
determining a second loss based on differences between the mean and the variance and a preset mean and variance of a gaussian distribution, respectively;
a predicted loss is determined based on the first loss and the second loss.
In one embodiment, after constructing the node characterization continuous distribution, the method further comprises:
predicting the probability of interconnection between a plurality of nodes based on the node characterization continuous distribution;
determining a reconstructed adjacency matrix of the relational network based on the probabilities;
the step of determining a predicted loss based on the first loss comprises:
determining a third loss based on a difference between the reconstructed adjacency matrix and the original adjacency matrix;
a predicted loss is determined based on the first loss and the third loss.
In one embodiment, the step of predicting the probability of interconnection between the plurality of nodes includes:
and aiming at any first node and second node, taking the similarity between the node representation continuous distribution of the first node and the node representation continuous distribution of the second node as an input parameter of an activation function, and taking the obtained output value as the probability of interconnection between the first node and the second node.
In one embodiment, after constructing the node characterization continuous distribution, the method further comprises:
based on the node representation continuous distribution, clustering a plurality of nodes to obtain class clusters to which the nodes belong respectively;
the step of determining a predicted loss based on the first loss comprises:
determining a fourth loss based on a difference between the first and second sample characterizations of homogeneous cluster nodes and a difference between the first and second sample characterizations of heterogeneous cluster nodes;
a predicted loss is determined based on the first loss and the fourth loss.
In one embodiment, the step of determining a predicted loss based on the first loss comprises:
based on the first loss, the predicted loss is determined using a contrast loss function.
In a second aspect, an embodiment provides a method for pushing a user by using a scoring model, where the scoring model is trained by using the method provided in the first aspect; the method comprises the following steps:
determining the similarity between the corresponding user and the pushing object based on the node representation continuous distribution of the user node and the node representation continuous distribution of the pushing object node through the scoring model;
Inputting the similarity as an input parameter into an activation function through the scoring model, and taking the obtained output value as a score of the user on the pushing object;
and selecting a plurality of pushing objects based on the scores of the users on the plurality of pushing objects, and pushing the users based on the selection results.
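To make the second-aspect flow concrete, here is a minimal Python sketch (not part of the claimed method) of scoring and top-k selection. It assumes the sampled characterization vectors (or distribution means) of user nodes and push object nodes are available as matrices; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def push_top_k(z_users, z_items, k=3):
    """Score s(a, i) as the activation of the similarity z_a^T z_i,
    then pick each user's k highest-scoring push objects."""
    scores = 1.0 / (1.0 + np.exp(-(z_users @ z_items.T)))  # sigmoid of similarity
    return np.argsort(-scores, axis=1)[:, :k]              # top-k object indices per user

# Usage: top = push_top_k(Z_user, Z_item); row a lists the objects to push to user a.
```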
In a third aspect, an embodiment provides a training device for a scoring model, where the scoring model is used for predicting the scores of a plurality of users on a plurality of objects to be pushed. The scoring model is trained based on a relationship network, which comprises a plurality of nodes, including nodes representing users and nodes representing push objects, and edges representing connection relationships between the nodes. The parameters to be learned of the scoring model include the node characterizations. The device comprises:
an aggregation module configured to determine, via a graph neural network, an aggregate representation of a node based on the node representations of a plurality of nodes and an original adjacency matrix of the relationship network; wherein, the aggregation characterization aggregates neighbor user features and/or neighbor push object features;
a construction module configured to construct a node characterization continuous distribution based on the aggregate characterization of the nodes and a random parameter, and to sample the node characterization continuous distribution by determining the value of the random parameter, obtaining a first sampled characterization and a second sampled characterization as comparison targets; wherein the node characterization continuous distribution comprises: a node characterization continuous distribution containing user features and a node characterization continuous distribution containing push object features;
A penalty module configured to determine a first penalty based on a difference between the first and second sampled representations of the same node and a difference between the first and second sampled representations of different nodes, a predicted penalty based on the first penalty;
an updating module configured to update the node characterizations of the plurality of nodes in a direction that reduces the predictive loss.
In a fourth aspect, an embodiment provides a device for pushing a user by using a scoring model, where the scoring model is trained by using the method provided in the first aspect; the device comprises:
the determining module is configured to determine the similarity between the corresponding user and the pushing object based on the node representation continuous distribution of the user nodes and the node representation continuous distribution of the pushing object nodes through the scoring model;
the scoring module is configured to input the similarity as an input parameter into an activation function through the scoring model, and the obtained output value is used as the score of the user on the pushing object;
and the pushing module is configured to select a plurality of pushing objects based on scores of the users on the plurality of pushing objects and push the users based on selection results.
In a fifth aspect, embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of the first to second aspects.
In a sixth aspect, an embodiment provides a computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any one of the first to second aspects.
In the method and device provided by the embodiments of this specification, a scoring model is trained using a relationship network. During training, a node characterization continuous distribution is constructed based on the aggregate characterizations of the nodes and a random parameter, and is sampled to obtain a first sampled characterization and a second sampled characterization serving as comparison targets; the prediction loss is determined based on these comparison targets. In this way, comparison targets can be constructed from the sparse connection relationships in the relationship network, realizing unsupervised learning over users and push objects, so that the scoring model can extract deep, valuable features of users and push objects and provide push services more accurately and reasonably.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a schematic illustration of an implementation scenario of an embodiment disclosed herein;
FIG. 2 is a flowchart of a training method of a scoring model according to an embodiment;
FIG. 3 is a flowchart of a method for pushing a user by using a scoring model according to an embodiment;
FIG. 4 is a schematic block diagram of a training device for scoring model provided in an embodiment;
fig. 5 is a schematic block diagram of an apparatus for pushing a user using a scoring model according to an embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in this specification, and also a schematic diagram of the training process of the scoring model. The relationship network comprises a plurality of nodes and edges between them. The nodes include user nodes representing users and push object nodes representing push objects. In the example relationship network shown in Fig. 1, circles represent user nodes and boxes represent push object nodes. The node characterizations of the nodes and the connection relationships between nodes are input into a graph neural network, through which the aggregate characterizations of the nodes can be determined. A node characterization continuous distribution is constructed from the aggregate characterizations and random parameters, and sampled to obtain a first sampled characterization and a second sampled characterization. Based on these, a prediction loss can be obtained by contrastive learning, and the node characterizations are updated with the prediction loss. Model training iterates until convergence. The relationship network shown in Fig. 1 includes connection relationships between user nodes and between user nodes and push object nodes; the diagram is merely an example, and in practice the relationship network may contain only connections between user nodes and push object nodes.
The scoring model is used for predicting the scores of a plurality of users on a plurality of objects to be pushed. A push object is an object waiting or ready to be pushed to a user, and may include content such as commodities, articles, videos, or music. A user's score for a push object reflects the user's degree of interest in it. The service platform can provide push services in a server-client mode, and the user can view the push objects pushed by the service platform through the client. A "user" in the embodiments of this disclosure refers to a user device, or a user device logged in with a user account, that is, the device on which a client logged in with that account runs.
The relationship network includes a plurality of nodes and a plurality of edges. The nodes include user nodes and push object nodes; the edges between nodes represent connection relationships, including edges between user nodes, edges between user nodes and push object nodes, and the like. The connection relationships reflect the association relationships among different kinds of nodes. For example, associations between user nodes may include friend relationships, like relationships, borrowing relationships, and the like. Associations between a user node and a push object node may include clicking, purchasing, viewing, scoring, affiliation, and the like. For example, when user a has a click action on commodity v, an edge may be formed between the user node corresponding to user a and the commodity node corresponding to commodity v. User-related data such as clicks, purchases, and views of push objects is used only after the user's authorization is obtained, and the privacy of user-related data is protected from disclosure during use.
Node characterizations may also be referred to as node features, and contain user features or push object features. The node characterization of a user node contains user features, and the node characterization of a push object node contains push object features. User features may be understood as abstract features extracted from user-related data, and push object features as abstract features extracted from push-object-related data. One node characterization may be represented by a vector, and multiple node characterizations by a matrix of such vectors.
The users in the relationship network are users served by the service platform, and the push objects may be push objects provided by the platform. The numbers of users and push objects of a service platform are typically very large, and the relationship network contains the association relationships between some users and some push objects. To predict the scores of more users on more push objects, a scoring model can be trained based on the nodes with association relationships in the relationship network, and then used to predict scores between all users and all push objects in it. Data such as the node characterizations serve as parameters to be learned in the scoring model, which learns them from the relationship network through multiple rounds of iterative training.
To learn deeper node characterizations in order to predict more reasonable scores, one embodiment of the present specification provides a training method of a scoring model, comprising the steps of: step S210, determining the aggregate characterization of the nodes based on the node characterization of the plurality of nodes and the original adjacency matrix of the relational network through the graph neural network; step S220, constructing node characterization continuous distribution based on the aggregation characterization and the random parameters of the nodes, and sampling the node characterization continuous distribution by determining the value of the random parameters to obtain a first sampling characterization and a second sampling characterization which are taken as comparison targets; step S230, determining a first loss based on a difference between the first and second sampled representations of the same node and a difference between the first and second sampled representations of different nodes, determining a predicted loss based on the first loss; step S240, updating node characterization of the plurality of nodes in a direction of reducing the prediction loss.
In this embodiment, the scoring model constructs a node characterization continuous distribution, obtains comparison targets by sampling, and updates the node characterizations through contrastive learning, so that deep, valuable features of users and push objects can be extracted and the scoring model can provide push services more accurately and reasonably.
The present embodiment is described in detail below with reference to fig. 2.
Fig. 2 is a flowchart of a training method of a scoring model according to an embodiment. The method may be performed by a service platform, which may be implemented by any apparatus, device, platform, or device cluster having computing and processing capabilities. The scoring model is trained based on a relationship network, which comprises nodes representing users, nodes representing push objects, and edges representing the connection relationships between them. That is, the nodes include user nodes and push object nodes; a reference to a node covers a user node and/or a push object node. The parameters to be learned of the scoring model include the node characterizations.
In one embodiment, the scoring model may include a graph neural network and a variational encoder. The graph neural network is used for determining the aggregate characterizations of the nodes based on the node characterizations of the plurality of nodes and the original adjacency matrix of the relationship network. The variational encoder is used for constructing the node characterization continuous distribution based on the aggregate characterizations and a random parameter, and for sampling that distribution by determining the value of the random parameter, obtaining a first sampled characterization and a second sampled characterization as comparison targets. In this case, the parameters to be learned of the scoring model may also include parameters to be learned in the graph neural network and/or in the variational encoder. The graph neural network and the variational encoder may together be referred to as a variational graph inference network.
In one embodiment, the scoring model may include the variational encoder without including the graph neural network; in that case, the graph neural network may be trained jointly with the scoring model.
The scoring model may be trained through several model iteration processes, any one of which may include steps S210-S240 shown in fig. 2.
In step S210, the aggregate characterization μ of the nodes is determined by the graph neural network based on the node characterizations of the plurality of nodes and the original adjacency matrix A of the relationship network G. The aggregate characterization aggregates neighbor user features and/or neighbor push object features: the aggregate characterization of a user node contains user features, and that of a push object node contains push object features. Determining the aggregate characterization may be understood as determining an aggregate characterization vector for each node, or an aggregate characterization matrix for all nodes.
The following uses commodities as an example of push objects. Let $U$ denote the user set, $U = \{u_1, \dots, u_a, \dots, u_b, \dots, u_M\}$, where $u_a$ is the $a$-th user, $u_b$ is the $b$-th user, $M$ is the total number of users, and $1 \le a, b \le M$. Let $V$ denote the commodity set, $V = \{v_1, \dots, v_i, \dots, v_j, \dots, v_N\}$, where $v_i$ is the $i$-th commodity, $v_j$ is the $j$-th commodity, $N$ is the total number of commodities, and $1 \le i, j \le N$. Let $r_{ai}$ denote user $u_a$'s score for commodity $v_i$; the users' scoring matrix for commodities is $R = \{r_{ai}\}_{M \times N}$. If user $u_a$ has behavior data (viewing, purchasing, clicking, etc.) for commodity $v_i$, then $r_{ai} = 1$; otherwise $r_{ai} = 0$.
From the above raw data, a user-commodity interaction graph $G$, that is, the relationship network $G$, can be constructed according to equation (1):

$$G = (U \cup V, A) \tag{1}$$

where $A$ is the original adjacency matrix of the relationship network $G$, and $U \cup V$ is its node set.
Initially, the node characterization matrix $E$ of the nodes in the relationship network $G$ may be determined randomly. When the relationship network contains $M+N$ nodes, $E$ can be represented by a matrix of dimension $(M+N) \times d$, i.e., $E = \{e_1, \dots, e_a, \dots, e_M, \dots, e_i, \dots, e_{M+N}\}$, where $e_a$ is the $d$-dimensional node characterization vector of the $a$-th user node. The aggregate characterization $\mu$ determined below can likewise be represented by a matrix of dimension $(M+N) \times d$, i.e., $\mu = \{\mu_i\}$, where $\mu_i$ is the aggregate characterization vector of the $i$-th node; the $i$-th node here may be a user node or a commodity node.
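As a concrete illustration (a toy setup with assumed sizes, not the patent's implementation), the following Python sketch builds a binary behavior matrix $R$, the original adjacency matrix $A$ of the bipartite relationship network of equation (1), and a randomly initialized node characterization matrix $E$:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 4, 5, 8                              # users, commodities, characterization size (toy)
R = (rng.random((M, N)) < 0.3).astype(float)   # binary behavior matrix, r_ai in {0, 1}

# Original adjacency matrix A of G = (U ∪ V, A): users occupy the first M
# indices, commodities the remaining N; edges link users to commodities.
A = np.zeros((M + N, M + N))
A[:M, M:] = R
A[M:, :M] = R.T

E = rng.normal(0.0, 0.1, size=(M + N, d))      # random initial node characterizations
```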
The node characterizations and the original adjacency matrix $A$ are input into the graph neural network, which outputs the aggregate characterizations $\mu$ of the nodes. The graph neural network may be implemented with a variety of network models, such as graph convolutional networks, graph recurrent networks, or graph attention networks. The following describes an implementation using a graph convolutional network as an example; determining the aggregate characterization of a node with a graph convolutional network may involve a number of embodiments, detailed below.
The graph convolutional network includes $L$ convolution layers. Each convolution layer may determine an intermediate characterization of each node based on the output of the previous convolution layer and the original adjacency matrix $A$. Based on the intermediate characterizations output by the multiple convolution layers, the aggregate characterization of a node can be determined.
For any convolution layer $l+1$, the intermediate characterization of any node (e.g., the $i$-th node) may be output as follows: based on the intermediate characterizations, output by the previous convolution layer $l$, of the neighbor nodes of the $i$-th node, determine the intermediate characterization of the $i$-th node at layer $l+1$. The neighbor nodes of the $i$-th node may include the nodes having a connection relationship with it. Specifically, a weighted average of the intermediate characterizations of the neighbor nodes may be used as the intermediate characterization of the $i$-th node. For example, the intermediate characterization $\mu_i^{l+1}$ output by the $(l+1)$-th convolution layer for the $i$-th node may be determined using equation (2):

$$\mu_i^{l+1} = \sum_{j \in S_i} \frac{1}{\sqrt{|S_i|}\,\sqrt{|S_j|}}\, \mu_j^{l} \tag{2}$$

where $S_i$ is the set of nodes having a connection relationship with the $i$-th node and $|S_i|$ is its size; $j$ ranges over $S_i$, and $|S_j|$ is the number of nodes having a connection relationship with the $j$-th node. $S_i$ and $S_j$ may be determined from the original adjacency matrix $A$, and $\mu_j^{l}$ is the intermediate characterization of the $j$-th node output by the $l$-th convolution layer. Equation (2) is only one embodiment; modifications, such as removing or changing the weights, are possible.
For any node, such as the $i$-th node, its aggregate characterization is determined based on the intermediate characterizations output for it by the multiple convolution layers; the average or weighted average of these intermediate characterizations may be taken as the aggregate characterization. For example, the aggregate characterization of the $i$-th node output by the graph convolutional network may be obtained using equation (3):

$$\mu_i = \frac{1}{L} \sum_{l=1}^{L} \mu_i^{l} \tag{3}$$

where $\mu_i$ is the aggregate characterization of the $i$-th node, a $d$-dimensional vector. The aggregate characterization matrix $\mu$ composed of the aggregate characterizations of all nodes has dimension $(M+N) \times d$.
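Continuing the sketch above, a minimal implementation of the propagation in equation (2) and the layer averaging in equation (3); assigning isolated nodes zero weight is an assumption of this sketch:

```python
def aggregate(E, A, L=3):
    """Equations (2)-(3): propagate over L convolution layers, then average."""
    deg = A.sum(axis=1)                                 # |S_i|: neighbor counts
    inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)      # isolated nodes get weight 0
    A_norm = inv_sqrt[:, None] * A * inv_sqrt[None, :]  # 1 / (sqrt|S_i| sqrt|S_j|)
    h, layers = E, []
    for _ in range(L):
        h = A_norm @ h                                  # eq. (2): weighted neighbor average
        layers.append(h)
    return np.mean(layers, axis=0)                      # eq. (3): mean over layer outputs

mu = aggregate(E, A)                                    # aggregate characterizations μ
```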
Through step S210, based on the association relationships between users and between users and push objects in the relationship network, the aggregate characterization of a user node aggregates the features of its neighbor nodes, including neighbor user features and/or neighbor push object features; likewise for the aggregate characterization of a push object node. The aggregate characterizations thus gather useful information from the relationship network.
In step S220, a node characterization continuous distribution is constructed based on the aggregate characterization $\mu$ of the nodes and a random parameter $\epsilon$, and the distribution is sampled by determining values of $\epsilon$, obtaining a first sampled characterization $Z'$ and a second sampled characterization $Z''$ as comparison targets.
Wherein the node characterization succession of distributions comprises: the nodes comprising user features characterize a continuous distribution and the nodes comprising push object features characterize a continuous distribution. The node characterization continuous distribution can be a vector of each node or a matrix formed by vectors of all nodes. The first sample representation Z' and the second sample representation z″ obtained by sampling also comprise user features and/or push object features.
Step S220 uses the re-parameterization trick to construct the node characterization continuous distribution. Re-parameterization reconstructs the input aggregate characterization and, by exploiting the randomness of the random parameter, generates meaningful data, providing more samples for contrastive learning. The continuous distribution may be Gaussian, or other distributions, such as a uniform distribution, may be used. The following describes step S220 with the node characterization continuous distribution taken as a Gaussian distribution.
When the Gaussian distribution of the node characterization is constructed, the aggregate characterization can be taken as a mean value, and the corresponding variance is determined based on the mean value; then, based on the mean, variance and random parameters, a continuous distribution of node characterization conforming to the Gaussian distribution is constructed. Various embodiments may be included in constructing a node representation continuous distribution based on the mean, variance, and random parameters. For example, the variance may be used as a coefficient of the random parameter to construct a first term that characterizes the continuous distribution, and the node is constructed to characterize the continuous distribution based on a sum of the first term and the mean.
When constructing the Gaussian distribution, a node characterization continuous distribution can be built for each node's characterization. In the description below, the mean, variance, and random parameter used in a calculation are those of one node and correspond to one another; a pair of mean and variance defines one Gaussian distribution.
When determining the corresponding variance based on the mean, the determination may be made with reference to a conversion formula between the existing mean and variance. For example, the variance may be determined using the following equation (4):
$$\sigma = \exp(\mu W + b) \tag{4}$$
where $W$ and $b$ are parameters to be learned, and $\exp$ is the exponential function with base the natural constant $e$; $\mu$ is the mean and $\sigma$ the variance. Given the $\exp$ in the Gaussian density function, the usual mean-variance conversion formula is adapted here so that the $\exp$ cancels in the subsequent calculation. In equation (4), when $\mu$ is the mean vector of a node, $\sigma$ is that node's variance vector; when $\mu$ is the mean matrix of all nodes, $\sigma$ is the corresponding variance matrix.
The node characterization continuous distribution can be constructed according to equation (5):

$$Z = \mu + \sigma \odot \epsilon \tag{5}$$

where $Z$ is the node characterization continuous distribution, $\sigma \odot \epsilon$ is the first term, and $\epsilon$ is the random parameter. When $\mu$ is the mean vector of a node, $\sigma$ is that node's variance vector and $Z$ is that node's node characterization continuous distribution vector; when $\mu$ is the mean matrix of all nodes, $\sigma$ is the variance matrix and $Z$ is the matrix composed of all nodes' node characterization continuous distribution vectors.
To keep the calculation concise, the node characterization continuous distribution can be assumed to be a Gaussian distribution with mean 0 and variance 1. Correspondingly, the random parameter $\epsilon$ can also be random Gaussian noise with mean 0 and variance 1.
When determining the value of the random parameter $\epsilon$, it can be generated according to a corresponding rule; when two sampled characterizations are needed, two random parameters can be drawn. For example, with two random parameters $\epsilon'$ and $\epsilon''$, the comparison targets $Z'$ and $Z''$ in equation (6) follow from equation (5):

$$Z' = \mu + \sigma \odot \epsilon', \qquad Z'' = \mu + \sigma \odot \epsilon'' \tag{6}$$

where $Z'$ may be the first sampled characterization and $Z''$ the second sampled characterization.
When constructing the comparison targets, two sampled characterizations may be acquired, but this is not limiting; more sampled characterizations may be acquired.
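Continuing the sketch, equations (4) through (6) in code: the variance is derived from the mean, and two draws of the random parameter $\epsilon$ yield the two comparison targets. $W$ and $b$ are randomly initialized here only for illustration:

```python
W = rng.normal(0.0, 0.1, size=(d, d))     # parameters to be learned in eq. (4)
b = np.zeros(d)

sigma = np.exp(mu @ W + b)                # eq. (4): variance from the mean
eps1 = rng.standard_normal(mu.shape)      # random parameter ε' ~ N(0, 1)
eps2 = rng.standard_normal(mu.shape)      # random parameter ε''
Z1 = mu + sigma * eps1                    # eq. (6): first sampled characterization Z'
Z2 = mu + sigma * eps2                    # eq. (6): second sampled characterization Z''
```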
When the node characterization continuous distribution is Gaussian, the node characterizations may be randomly initialized from a Gaussian distribution before step S210, which can accelerate the iteration of the scoring model.
The above describes embodiments that use the re-parameterization trick to construct the node characterization continuous distribution. Modifying these embodiments yields further ones; for example, combining the variance with the random parameter is not limited to multiplying them to obtain the first term, and may take a variety of forms.
In the constructed node characterization continuous distribution, the random parameter $\epsilon$ is randomly generated, and the variance serves as its coefficient, amplifying the randomness and thereby controlling the randomness of the distribution. The random parameter $\epsilon$ contributes the randomness, while the variance determines its magnitude, making the randomness controllable. Moreover, as model iteration proceeds the variance is progressively learned, so the amplitude of the noise is learned as well. The variances of different nodes differ: the variances of different user nodes' characterizations differ, as do those of different push object nodes', and the variances carry user features and push object features. With different variances, the effective magnitude of the random parameter $\epsilon$ in the corresponding node characterization continuous distribution also differs; $\epsilon$ is shaped by the node's variance and becomes node-specific, that is, specific to the user or push object. A large variance means large random noise, a large difference value when the loss is computed for that node, and a correspondingly large adjustment of that node's characterization. The particularities of users and push objects are thus taken into account, the generated node characterization continuous distribution reflects them, and the constructed comparison targets are more efficient and more reasonable.
In step S230, a first loss is determined based on the difference between the first and second sampled representations of the same node and the difference between the first and second sampled representations of different nodes, and a predicted loss is determined based on the first loss. In step S240, node characterizations of the plurality of nodes are updated in a direction to reduce the predictive loss.
The first sampled characterization $Z'$ and the second sampled characterization $Z''$ of the same node should be as close, or as similar, as possible; those of different nodes should be as far apart, or as dissimilar, as possible. The pair $Z'$ and $Z''$ of the same node may form a positive sample, and the pair of different nodes a negative sample. Nodes here may be identified by node identifiers, and judging whether two nodes are the same may include judging whether their identifiers are the same. For example, in the first and second sampled characterization matrices, the two vectors corresponding to the $i$-th node may serve as a positive sample, and the vector of the $i$-th node with the vector of the $j$-th node ($i \ne j$) as a negative sample.
For the first loss calculated in step S230, a contrastive loss function may be used. When determining the prediction loss based on the first loss, the first loss may directly serve as the prediction loss to update the model, or it may form part of the prediction loss together with losses determined in other ways.
The prediction loss calculated in step S230 may be referred to as a node-level contrastive loss. The nodes include user nodes and push object nodes; equations (7) and (8) below may be used to compute the user-level contrastive loss function $L_N^U$ and the push-object-level contrastive loss function $L_N^I$, respectively:

$$L_N^U = -\sum_{a \in B_u} \log \frac{\exp\left(z_a'^{\mathsf T} z_a'' / \tau_1\right)}{\sum_{b \in B_u} \exp\left(z_a'^{\mathsf T} z_b'' / \tau_1\right)} \tag{7}$$

$$L_N^I = -\sum_{i \in B_i} \log \frac{\exp\left(z_i'^{\mathsf T} z_i'' / \tau_1\right)}{\sum_{j \in B_i} \exp\left(z_i'^{\mathsf T} z_j'' / \tau_1\right)} \tag{8}$$

where $\tau_1$ is the contrast temperature, a preset hyperparameter; $B_u$ is the set of users in a batch of nodes and $B_i$ the set of push objects; ${}^{\mathsf T}$ denotes transpose. $z_a'$ is the first sampled characterization vector of the $a$-th user and $z_a''$ its second sampled characterization vector; $z_i'$ and $z_i''$ are the first and second sampled characterization vectors of the $i$-th push object.
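A sketch of the node-level contrastive losses (7) and (8): the two samples of the same node form the positive pair on the diagonal, while all other pairs in the batch act as negatives. Averaging over the batch instead of summing, and the temperature value, are choices of this sketch:

```python
def info_nce(z1, z2, tau=0.2):
    """Node-level contrastive loss: positives on the diagonal (same node),
    all other pairs in the batch act as negatives."""
    sim = (z1 @ z2.T) / tau                              # z_a'^T z_b'' / τ1
    sim -= sim.max(axis=1, keepdims=True)                # numerical stability
    log_sm = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_sm))                     # -log of the positive terms

L_N_U = info_nce(Z1[:M], Z2[:M])   # eq. (7): user-level loss over the batch
L_N_I = info_nce(Z1[M:], Z2[M:])   # eq. (8): push-object-level loss
```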
When determining the prediction loss, a second loss may also be determined based on the differences between the mean and variance and those of the preset Gaussian distribution, and the prediction loss determined from the first loss and the second loss. In that case, the sum of the first and second losses may serve as the prediction loss, or the two may be combined with losses determined in other ways.
Other ways of determining the prediction loss are described below. After the node characterization continuous distribution $Z$ is constructed, the probabilities of interconnection between nodes can be predicted based on $Z$, and a reconstructed adjacency matrix $A'$ of the relationship network determined from these probabilities.
In determining the predicted loss, a third loss may be determined based on a difference between the reconstructed adjacency matrix a' and the original adjacency matrix a, and the predicted loss may be determined based on the first loss, the third loss, and the like.
When predicting the probability of interconnection between a plurality of nodes, aiming at any first node and second node, the similarity between the node representation continuous distribution of the first node and the node representation continuous distribution of the second node can be used as the input parameter of the activation function, and the obtained output value is used as the probability of interconnection between the first node and the second node.
When predicting the probabilities of interconnection between nodes, the node characterization continuous distribution may be sampled by determining the value of the random parameter $\epsilon$, obtaining a third sampled characterization $Z_3$; the probabilities are then predicted based on the third sampled characterizations of the nodes. The third sampled characterization $Z_3$ may be either the first sampled characterization $Z'$ or the second sampled characterization $Z''$, or a new $Z_3$ obtained by randomly generating $\epsilon$.
For any first node and second node, the similarity between the third sampling characterization vector of the first node and the third sampling characterization vector of the second node can be used as an input parameter of an activation function, and the obtained output value is used as the probability of interconnection between the first node and the second node.
For example, the probability $p$ that any $i$-th and $j$-th nodes are connected may be determined using equation (9):

$$p = \mathrm{sigmoid}\left(z_i^{\mathsf T} z_j\right) \tag{9}$$

where $z_i$ is the third sampled characterization vector of the $i$-th node, $z_j$ that of the $j$-th node, and sigmoid is the activation function. For the third sampled characterization matrix, $z_i$ and $z_j$ are its $i$-th and $j$-th vectors. The value of $p$ lies between 0 and 1.
In one embodiment, the reconstruction loss function $L_{ELBO}$, determined based on the difference between the reconstructed adjacency matrix $A'$ and the original adjacency matrix $A$ and on the differences between the mean and variance and those of the preset Gaussian distribution, may be computed using equation (10):

$$L_{ELBO} = -\sum_{a} \mathbb{E}\left[\log p_{\delta}\left(D_a \mid Z\right)\right] + \mathrm{KL}\left(\mathcal{N}(\mu, \sigma^{2})\,\|\,\mathcal{N}(0, I)\right) \tag{10}$$

where $\delta$ is the activation function used to form the edge probabilities as in equation (9), and $D_a$ is the training data of user $a$. For example, if user $a$ has connection relationships with commodities 1, 2, and 3 out of commodities 1 to 5, user $a$ forms positive samples with commodities 1, 2, and 3, and negative samples with commodities 4 and 5. The first term on the right of equation (10) is the loss between the adjacency matrices, and the second term is the loss between the Gaussian distributions.
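A sketch of equations (9) and (10): edge probabilities from the sigmoid of inner products, a cross-entropy reconstruction term, and the closed-form KL term against the standard Gaussian. Scoring the full adjacency matrix instead of per-user positive and negative sets $D_a$ is a simplification of this sketch:

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elbo_loss(z, mu, sigma, A, eps=1e-9):
    """Eq. (9): reconstructed adjacency A'; eq. (10): reconstruction + KL."""
    p = sigmoid(z @ z.T)                                 # eq. (9): P(edge i-j)
    recon = -np.mean(A * np.log(p + eps) + (1 - A) * np.log(1 - p + eps))
    # KL( N(μ, σ²) || N(0, 1) ), averaged over nodes and dimensions
    kl = 0.5 * np.mean(sigma ** 2 + mu ** 2 - 1.0 - 2.0 * np.log(sigma + eps))
    return recon + kl

L_ELBO = elbo_loss(Z1, mu, sigma, A)                     # Z1 reused as the Z_3 sample
```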
To make the determined prediction loss more comprehensive and thereby speed up model iteration, after the node characterization continuous distribution is constructed, the nodes can be clustered and a cluster-aware contrastive loss function determined.
Specifically, based on the node characterization continuous distribution Z, a plurality of nodes can be clustered to obtain class clusters to which the nodes respectively belong.
In determining the prediction loss, a fourth loss may be determined based on the difference between the first sampled characterization $Z'$ and the second sampled characterization $Z''$ of same-cluster nodes, and the difference between those of different-cluster nodes; the prediction loss is then determined based on the first loss and the fourth loss, among others.
When clustering the nodes using the node characterization continuous distribution $Z$, any sample of $Z$ may be used: the first sampled characterization $Z'$, the second sampled characterization $Z''$, or a resampled value.
In clustering, a plurality of existing clustering algorithms, such as K-Means algorithm or mean shift algorithm, can be adopted. The clustering process for the nodes is described below by taking the K-Means algorithm as an example. In the clustering, the user node and the pushing object node can be clustered respectively.
Initially, the number of user-node clusters may be set to $K_u$ and the number of push-object clusters to $K_i$. Using K-Means, the user-node cluster prototypes are determined as $C^u = \{c_k^u\}_{k=1}^{K_u}$ and the push-object cluster prototypes as $C^i = \{c_k^i\}_{k=1}^{K_i}$, giving the user-node cluster distribution $\prod_{a=1}^{M} \prod_{k=1}^{K_u} p(c_k^u \mid z_a)$ and the push-object-node cluster distribution $\prod_{i=M+1}^{M+N} \prod_{k=1}^{K_i} p(c_k^i \mid z_i)$. A cluster prototype is a cluster center, and each prototype corresponds to one cluster. The cluster distribution comprises the user-node cluster distribution and the push-object-node cluster distribution, and may be represented as a vector or matrix. In one embodiment, the dimension of a cluster distribution vector equals the total number of clusters: the element corresponding to the node's cluster takes the value 1 and the others 0. The cluster distribution matrix is composed of the nodes' cluster distribution vectors; for example, each row may be the cluster distribution vector of one user or push object, with columns indexing the vector dimensions.
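A minimal K-Means over the sampled characterizations, clustering user nodes and push-object nodes separately; the cluster counts and iteration budget are illustrative assumptions:

```python
def kmeans(z, k, iters=20):
    """Plain K-Means: returns cluster prototypes (centers) and hard assignments."""
    centers = z[rng.choice(len(z), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(assign == c):                    # skip empty clusters
                centers[c] = z[assign == c].mean(axis=0)
    return centers, assign

Ku, Ki = 2, 2                                          # cluster counts (illustrative)
Cu, user_assign = kmeans(Z1[:M], Ku)                   # user cluster prototypes C^u
Ci, item_assign = kmeans(Z1[M:], Ki)                   # push-object prototypes C^i
```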
In one embodiment, after determining the cluster distribution of the nodes, the probability that any two nodes belong to the same cluster may be determined based on the cluster distribution of the plurality of nodes, and the contrast loss function of the cluster level may be determined based on the probability.
For example, equation (11) may be used to determine the probability that the $a$-th and $b$-th users belong to the same cluster prototype, that is, the probability $p(a, b)$ of belonging to the same cluster:

$$p(a, b) = \sum_{k=1}^{K_u} p\left(c_k^u \mid z_a\right)\, p\left(c_k^u \mid z_b\right) \tag{11}$$

where $z_a$ is the $a$-th user's vector in the node characterization continuous distribution and $z_b$ is the $b$-th user's.
Similarly, equation (12) may be used to determine the probability that the $i$-th and $j$-th push objects belong to the same cluster prototype, that is, the probability $p(i, j)$ of belonging to the same cluster:

$$p(i, j) = \sum_{k=1}^{K_i} p\left(c_k^i \mid z_i\right)\, p\left(c_k^i \mid z_j\right) \tag{12}$$

where $z_i$ is the $i$-th push object's vector in the node characterization continuous distribution and $z_j$ is the $j$-th push object's.
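Equations (11) and (12) in code: with a soft cluster distribution $p(c_k \mid z)$ (here a softmax over negative squared distances to the prototypes, one plausible choice the patent does not pin down), the same-cluster probability is the inner product of two nodes' cluster distributions:

```python
def soft_assign(z, centers, temp=1.0):
    """p(c_k | z): softmax over negative squared distances to the prototypes
    (an assumed form; the patent does not specify it)."""
    logits = -((z[:, None] - centers[None]) ** 2).sum(-1) / temp
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

P_u = soft_assign(Z1[:M], Cu)          # user cluster distributions
p_same_user = P_u @ P_u.T              # eq. (11): Σ_k p(c_k|z_a) p(c_k|z_b)
P_i = soft_assign(Z1[M:], Ci)
p_same_item = P_i @ P_i.T              # eq. (12), for push objects
```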
The above formulas (11) and (12) for calculating the probability are only one example. New embodiments may be obtained by appropriate modification of these formulas, such as adding coefficients or weights.
Same-cluster nodes include different user nodes belonging to the same cluster and different push object nodes belonging to the same cluster; different-cluster nodes include different user nodes, or different push object nodes, belonging to different clusters. In the first sampled characterization $Z'$ and the second sampled characterization $Z''$: for different users $a$ and $b$ in the same cluster, the first sampled characterization vector of user $a$ and the second sampled characterization vector of user $b$ form a positive sample; for users in different clusters, they form a negative sample. Likewise, for different push objects $i$ and $j$ in the same cluster, the first sampled characterization vector of push object $i$ and the second sampled characterization vector of push object $j$ form a positive sample, and for push objects in different clusters, a negative sample. When determining the fourth loss based on the differences between $Z'$ and $Z''$ of same-cluster nodes and of different-cluster nodes, a contrastive loss function may be used.
For example, the user-level cluster contrast loss $L_C^U$ may be determined using equation (13), and the push-object-level cluster contrast loss $L_C^I$ using equation (14):

$$L_C^U=-\sum_{a\in B_u}\frac{1}{SP(a)}\sum_{\substack{b\in B_u \\ b\sim a}}\log\frac{\exp\!\left(z_a'^{\top} z_b''/\tau_2\right)}{\sum_{b'\in B_u}\exp\!\left(z_a'^{\top} z_{b'}''/\tau_2\right)} \qquad (13)$$

$$L_C^I=-\sum_{i\in B_i}\frac{1}{SP(i)}\sum_{\substack{j\in B_i \\ j\sim i}}\log\frac{\exp\!\left(z_i'^{\top} z_j''/\tau_2\right)}{\sum_{j'\in B_i}\exp\!\left(z_i'^{\top} z_{j'}''/\tau_2\right)} \qquad (14)$$

where $b\sim a$ (respectively $j\sim i$) denotes nodes belonging to the same cluster.
Here $\tau_2$ is the contrast temperature, a preset hyperparameter. $B_u$ is the set of user nodes in a batch, and $B_i$ is the set of push-object nodes. $z_a'$ is the first sample characterization vector of the a-th user and $z_b''$ the second sample characterization vector of the b-th user; $z_i'$ is the first sample characterization vector of the i-th push object and $z_j''$ the second sample characterization vector of the j-th push object. $SP(a)$ is the number of positive samples in the numerator part of equation (13), and $SP(i)$ is the number of positive samples in the numerator part of equation (14); their reciprocals are included so that equations (13) and (14) are on the same scale as the node-level equations (7) and (8). Equations (13) and (14) are only one embodiment, and modifications of them yield different embodiments.
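A minimal sketch of such a cluster-level contrast loss, assuming inner-product similarity and a same-cluster mask as positives (the exact similarity and masking are not fixed by the text), could look like this:

```python
# Sketch: cluster-level contrast loss over the two sampled views.
import torch

def cluster_contrast_loss(Z1, Z2, labels, tau2=0.5):
    """Z1/Z2: first/second sample characterizations [B, d]; labels: cluster ids [B]."""
    sim = Z1 @ Z2.T / tau2                                       # cross-view similarities
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # positive-pair mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    sp = same.sum(dim=1).clamp(min=1.0)                          # SP(a): positives per anchor
    return -((log_prob * same).sum(dim=1) / sp).mean()
```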
In one embodiment, a total predicted loss may be determined based on the sum of the individual losses, and the node characterization and the other parameters to be learned may be updated based on the total predicted loss. For example, based on equations (7) and (8) and equations (13) and (14), the total contrast loss function $L_{cl}$ may be determined using the following equation (15):
$$L_{cl}=\left(L_N^U+L_N^I\right)+\gamma\left(L_C^U+L_C^I\right) \qquad (15)$$
where $\gamma$ is a preset weight coefficient that may be set empirically.
Based on equation (15) and equation (10), a multi-task optimization objective, namely the total loss function $L(\theta)$, may be established using the following equation (16):
$$L(\theta)=L_{ELBO}+\alpha L_{cl}+\beta\,\lVert E\rVert_2 \qquad (16)$$
where $E$ is the node characterization, $\alpha$ and $\beta$ respectively control the weights of the contrast loss function and the regularization term, and $\lVert E\rVert_2$ denotes the L2 norm of $E$. $\theta=[E, W, b]$ is the set of parameters to be learned. This objective can be solved by gradient descent, updating the parameters $\theta$.
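A sketch of assembling equations (15) and (16) and taking one gradient step follows; the optimizer, the learning rate and the treatment of the regularizer as E.norm(p=2) are illustrative assumptions:

```python
# Sketch: multi-task objective L(theta) = L_ELBO + alpha * L_cl + beta * ||E||_2.
import torch

def total_loss(L_elbo, L_n_u, L_n_i, L_c_u, L_c_i, E,
               alpha=0.1, beta=1e-4, gamma=0.5):
    L_cl = (L_n_u + L_n_i) + gamma * (L_c_u + L_c_i)   # equation (15)
    return L_elbo + alpha * L_cl + beta * E.norm(p=2)  # equation (16)

# Illustrative update of theta = [E, W, b] by gradient descent:
# optimizer = torch.optim.Adam([E, W, b], lr=1e-3)
# loss = total_loss(...); optimizer.zero_grad(); loss.backward(); optimizer.step()
```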
Steps S210 to S240 constitute one iteration of training the scoring model. A batch of nodes from the relationship network may be taken for training in each iteration. The model converges when the predicted loss of the scoring model falls below a threshold or the number of iterations reaches a threshold.
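The iteration and stopping rule might be organized as in the following sketch, where the batching, the thresholds and the model interface are all illustrative:

```python
# Sketch: batched training loop with loss / iteration-count stopping.
def train(model, batches, optimizer, loss_threshold=1e-3, max_iters=10_000):
    for it, batch in enumerate(batches):
        loss = model.predicted_loss(batch)   # L(theta) as in equation (16)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                     # update theta = [E, W, b]
        if loss.item() < loss_threshold or it + 1 >= max_iters:
            break                            # considered converged
```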
The above describes the training process of the scoring model. In the above embodiments, the node characterization is reconstructed from the aggregate characterization and the random parameter, and the reconstructed distribution is sampled to construct comparison targets for contrastive learning, so that the node characterization is learned from the relationship network. Because the variance serves as the coefficient of the random parameter when the node characterization continuous distribution is constructed, the individuality of the nodes in the relationship network is fully taken into account during this construction, yielding comparison targets that preserve node semantics and individuality and effectively improving the push effect.
In these embodiments, the scoring model mines the users' historical interaction data in depth, models the high-order latent preferences of users and push objects with the graph neural network, and accurately infers the node characterization continuous distribution. The cluster-aware contrastive learning computes inter-node similarity in an unsupervised manner and further improves contrastive learning from the perspective of the cluster distribution, effectively alleviating the problem of sparse interaction data in the relationship network.
Fig. 3 is a flowchart of a method for pushing a user by using a scoring model according to an embodiment. The scoring model is trained using the method provided by the embodiment shown in fig. 2. The method comprises the following steps.
Step S310, determining the similarity between the corresponding user and the pushing object based on the node representation continuous distribution of the user nodes and the node representation continuous distribution of the pushing object nodes through a scoring model.
Step S320, the similarity is used as an input parameter to input an activation function through a scoring model, and the obtained output value is used as a score of a user on a pushing object.
For example, the score $r_{ai}'$ of the a-th user for the i-th push object may be predicted using the following equation (17):
$$r_{ai}'=\mathrm{sigmoid}\!\left(z_a^{\top} z_i\right) \qquad (17)$$
where $z_a^{\top} z_i$ is the similarity between the a-th user and the i-th push object, and sigmoid is the activation function. From equation (17), the prediction scoring matrix $R'=\{r_{ai}'\}_{M\times N}$ can be obtained.
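Equation (17) vectorizes directly over all user-object pairs; in the sketch below the tensor names are assumptions:

```python
# Sketch: R' = sigmoid(Z_user @ Z_item^T), the M x N prediction scoring matrix.
import torch

def score_matrix(Z_user, Z_item):
    """Z_user: [M, d]; Z_item: [N, d] -> R': [M, N] with r'_ai = sigmoid(z_a^T z_i)."""
    return torch.sigmoid(Z_user @ Z_item.T)
```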
Step S330, selecting push objects based on the users' scores for the plurality of push objects, and pushing to the users based on the selection results. For example, the push objects with the highest scores may be pushed to each user.
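Selecting the highest-scoring objects per user is then a top-k operation, sketched below with an assumed k:

```python
# Sketch: pick the k push objects with the highest scores for every user.
import torch

def top_k_push(R, k=10):
    _, indices = R.topk(k, dim=1)   # [M, k] indices of objects to push per user
    return indices
```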
In this embodiment, when the similarity between a user and a push object is determined, the node characterization continuous distribution serves as the node vector participating in score prediction. Compared with the plain node characterization, the node characterization continuous distribution aggregates more neighbor-node information and contains deeper user features and/or push-object features, so the scoring is more accurate and the pushing more reasonable.
In other embodiments, the node characterizations trained by the scoring model may also participate directly in score prediction.
In this specification, the term "first" in the first sample characterization, the first node, and the like, together with the corresponding "second" (if any), is used only for convenience of distinction and description and has no limiting sense.
The foregoing describes certain embodiments of the present disclosure, other embodiments being within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. Furthermore, the processes depicted in the accompanying figures are not necessarily required to achieve the desired result in the particular order shown, or in a sequential order. In some embodiments, multitasking and parallel processing are also possible, or may be advantageous.
Fig. 4 is a schematic block diagram of a training apparatus for a scoring model according to an embodiment. The scoring model is used for predicting the scores of a plurality of users for a plurality of push objects to be pushed; the scoring model is trained based on a relationship network comprising a plurality of nodes, wherein the plurality of nodes comprise nodes representing users and nodes representing push objects, and edges represent the connection relationships among the nodes; the parameters to be learned of the scoring model include the node characterization. The apparatus may be deployed in a service platform, which may be implemented by any apparatus, device, platform, or device cluster having computing and processing capabilities. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2. The apparatus 400 includes:
an aggregation module 410 configured to determine, via a graph neural network, an aggregate representation of a node based on the node representations of a plurality of nodes and an original adjacency matrix of the relationship network; wherein, the aggregation characterization aggregates neighbor user features and/or neighbor push object features;
the construction module 420 is configured to construct a node representation continuous distribution based on the aggregation representation and the random parameter of the node, and sample the node representation continuous distribution by determining the value of the random parameter to obtain a first sampling representation and a second sampling representation as comparison targets; wherein the node characterization succession of distributions comprises: the node characterization continuous distribution comprising the user features and the node characterization continuous distribution comprising the push object features;
a loss module 430 configured to determine a first loss based on a difference between the first and second sampling characterizations of the same node and a difference between the first and second sampling characterizations of different nodes, and to determine a predicted loss based on the first loss;
an updating module 440 configured to update the node representations of the plurality of nodes in a direction that reduces the predictive loss.
In one embodiment, the graph neural network is implemented with a graph convolution network; the graph convolution network comprises a plurality of convolution layers; the aggregation module 410 includes a first determination submodule and a second determination submodule (not shown in the drawings):
a first determination submodule configured to cause any convolution layer to output an intermediate characterization of any node in the following manner: determining the intermediate characterization of the node at this convolution layer based on the intermediate characterizations of the node's neighbor nodes output by the previous convolution layer;
and a second determination submodule configured to determine, for any node, the aggregate characterization of the node based on the intermediate characterizations of the node output by the plurality of convolution layers.
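A hedged sketch of this layer-wise aggregation follows; the normalized adjacency and the mean readout over layers are assumptions in the spirit of common graph convolution networks, not details fixed by the text:

```python
# Sketch: propagate characterizations through the normalized adjacency and
# average the per-layer intermediate characterizations into the aggregate one.
import torch

def aggregate(E, A_norm, num_layers=3):
    """E: node characterizations [N, d]; A_norm: normalized adjacency [N, N]."""
    layers = [E]
    for _ in range(num_layers):
        layers.append(A_norm @ layers[-1])   # intermediate characterization per layer
    return torch.stack(layers).mean(dim=0)   # aggregate characterization
```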
In one embodiment, the node characterization continuous distribution conforms to a Gaussian distribution; the construction module 420 includes a third determination submodule and a first construction submodule (not shown in the drawings):
a third determination submodule configured to take the aggregate characterization as the mean and to determine a corresponding variance based on the mean;
a first construction sub-module configured to construct a node representation continuous distribution conforming to a gaussian distribution based on the mean, the variance, and the random parameter.
In one embodiment, the first building sub-module is specifically configured to:
constructing a first term of node characterization continuous distribution by taking the variance as a coefficient of the random parameter;
and constructing the node characterization continuous distribution based on the sum of the first term and the mean value.
In one embodiment, the random parameter is random Gaussian noise with a mean of 0 and a variance of 1.
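This is the usual reparameterization trick; a sketch, assuming the variance is produced elsewhere from the mean, is:

```python
# Sketch: sample z = mu + sigma * eps with eps ~ N(0, 1), keeping the sampling
# differentiable; call twice to obtain the two sampling characterizations.
import torch

def sample_characterization(mu, sigma):
    eps = torch.randn_like(mu)   # random parameter: Gaussian noise, mean 0, variance 1
    return mu + sigma * eps
```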
In one embodiment, the loss module 430, when determining a predicted loss based on the first loss, comprises:
determining a second loss based on differences between the mean and the variance and a preset mean and variance of a gaussian distribution, respectively;
a predicted loss is determined based on the first loss and the second loss.
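With a preset standard Gaussian N(0, 1), such a second loss admits the usual closed form, sketched here under that assumption:

```python
# Sketch: KL( N(mu, sigma^2) || N(0, 1) ) summed over all dimensions.
import torch

def kl_loss(mu, sigma):
    var = sigma ** 2
    return 0.5 * torch.sum(var + mu ** 2 - 1.0 - torch.log(var))
```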
In one embodiment, the apparatus 400 further comprises:
a reconstruction module (not shown in the figure) configured to, after the node characterization continuous distribution is constructed, predict the probability of interconnection between the plurality of nodes based on the node characterization continuous distribution, and determine a reconstructed adjacency matrix of the relationship network based on the probabilities;
The loss module 430, when determining a predicted loss based on the first loss, includes:
determining a third loss based on a difference between the reconstructed adjacency matrix and the original adjacency matrix;
a predicted loss is determined based on the first loss and the third loss.
In one embodiment, the reconstruction module, when predicting a probability of interconnection between a plurality of nodes, comprises:
and aiming at any first node and second node, taking the similarity between the node representation continuous distribution of the first node and the node representation continuous distribution of the second node as an input parameter of an activation function, and taking the obtained output value as the probability of interconnection between the first node and the second node.
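A sketch of this reconstruction follows; the sigmoid of the inner product matches the activation-of-similarity description, while the binary cross-entropy used for the third loss is an assumed choice of "difference":

```python
# Sketch: reconstructed adjacency and its comparison with the original one.
import torch

def reconstruct_adjacency(Z):
    return torch.sigmoid(Z @ Z.T)    # probability of interconnection per node pair

def reconstruction_loss(A_hat, A):
    # A: original adjacency as a float tensor with entries in {0, 1}.
    return torch.nn.functional.binary_cross_entropy(A_hat, A)
```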
In one embodiment, the apparatus 400 further comprises:
a clustering module configured to, after the node characterization continuous distribution is constructed, cluster the plurality of nodes based on the node characterization continuous distribution to obtain the clusters to which the plurality of nodes respectively belong;
the loss module 430, when determining a predicted loss based on the first loss, includes:
determining a fourth loss based on a difference between the first and second sample characterizations of homogeneous cluster nodes and a difference between the first and second sample characterizations of heterogeneous cluster nodes;
A predicted loss is determined based on the first loss and the fourth loss.
In one embodiment, the loss module 430, when determining a predicted loss based on the first loss, includes:
based on the first loss, the predicted loss is determined using a contrast loss function.
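For the node-level first loss, the standard contrast (InfoNCE-style) form treats a node's two sampling characterizations as the positive pair and other nodes as negatives; a sketch under that reading, with an assumed temperature, is:

```python
# Sketch: node-level contrast loss between the two sampling characterizations.
import torch

def node_contrast_loss(Z1, Z2, tau=0.5):
    sim = Z1 @ Z2.T / tau                # [B, B] cross-view similarities
    targets = torch.arange(Z1.size(0))   # positives lie on the diagonal
    return torch.nn.functional.cross_entropy(sim, targets)
```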
Fig. 5 is a schematic block diagram of an apparatus for pushing a user using a scoring model according to an embodiment. The scoring model is trained using the method provided by the embodiment of fig. 2. The apparatus 500 may be deployed in a service platform, which may be implemented by any apparatus, device, platform, cluster of devices, etc. having computing, processing capabilities. This embodiment of the device corresponds to the embodiment of the method shown in fig. 3. The apparatus 500 includes:
a determining module 510 configured to determine, through the scoring model, a similarity between the corresponding user and the push object based on the node representation continuous distribution of the user nodes and the node representation continuous distribution of the push object nodes;
the scoring module 520 is configured to input the similarity as an input parameter into an activation function through the scoring model, and use the obtained output value as a score of the user on the push object;
The pushing module 530 is configured to select a plurality of pushing objects based on the scores of the users on the plurality of pushing objects, and push the users based on the selection results.
The foregoing apparatus embodiments correspond to the method embodiments, and specific descriptions may be referred to descriptions of method embodiment portions, which are not repeated herein. The device embodiments are obtained based on corresponding method embodiments, and have the same technical effects as the corresponding method embodiments, and specific description can be found in the corresponding method embodiments.
The present description also provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of figures 1 to 3.
Embodiments of the present disclosure also provide a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any one of fig. 1 to 3.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for storage media and computing device embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing detailed description of the embodiments of the present invention further details the objects, technical solutions and advantageous effects of the embodiments of the present invention. It should be understood that the foregoing description is only specific to the embodiments of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements, etc. made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.
Claims (15)
1. A training method for a scoring model, wherein the scoring model is used for predicting scores of a plurality of users for a plurality of push objects to be pushed, respectively; the scoring model is trained based on a relationship network comprising a plurality of nodes, wherein the plurality of nodes comprise nodes representing users and nodes representing push objects, and edges representing connection relations among the nodes; the parameters to be learned of the scoring model comprise node characterization; the method comprises the following steps:
Determining, by the graph neural network, an aggregate representation of the nodes based on the node representations of the plurality of nodes and an original adjacency matrix of the relationship network; wherein, the aggregation characterization aggregates neighbor user features and/or neighbor push object features;
based on the aggregation characterization and random parameters of the nodes, constructing node characterization continuous distribution, and sampling the node characterization continuous distribution by determining the value of the random parameters to obtain a first sampling characterization and a second sampling characterization which are taken as comparison targets; wherein the node characterization succession of distributions comprises: the node characterization continuous distribution comprising the user features and the node characterization continuous distribution comprising the push object features;
determining a first loss based on a difference between the first and second sampled representations of the same node and a difference between the first and second sampled representations of different nodes, determining a predicted loss based on the first loss;
updating the node representations of the plurality of nodes in a direction to reduce the predictive loss.
2. The method of claim 1, the graph neural network implemented with a graph convolution network; the graph convolution network comprises a plurality of convolution layers; the step of determining an aggregate representation of the node comprises:
any one convolution layer outputs an intermediate representation of any one node in the following manner: determining the intermediate representation of the node at this convolution layer based on the intermediate representations of the node's neighbor nodes output by the previous convolution layer;
for any one node, determining the aggregate characterization of the node based on the intermediate characterization of the node output by a plurality of convolution layers.
3. The method of claim 1, the node characterization continuous distribution conforming to a gaussian distribution;
the step of constructing the node representation continuous distribution comprises the following steps:
taking the aggregate representation as the mean;
determining a corresponding variance based on the mean;
and constructing a node characterization continuous distribution conforming to Gaussian distribution based on the mean value, the variance and the random parameter.
4. A method according to claim 3, said step of constructing a gaussian-compliant node characterization continuous distribution comprising:
constructing a first term of node characterization continuous distribution by taking the variance as a coefficient of the random parameter;
and constructing the node characterization continuous distribution based on the sum of the first term and the mean value.
5. A method according to claim 3, wherein the random parameter is random gaussian noise with a mean of 0 and a variance of 1.
6. A method according to claim 3, the step of determining a predicted loss based on the first loss comprising:
determining a second loss based on differences between the mean and the variance and a preset mean and variance of a gaussian distribution, respectively;
a predicted loss is determined based on the first loss and the second loss.
7. The method of claim 1, further comprising, after constructing the node characterization continuous distribution:
predicting the probability of interconnection between a plurality of nodes based on the node characterization continuous distribution;
determining a reconstructed adjacency matrix of the relational network based on the probabilities;
the step of determining a predicted loss based on the first loss comprises:
determining a third loss based on a difference between the reconstructed adjacency matrix and the original adjacency matrix;
a predicted loss is determined based on the first loss and the third loss.
8. The method of claim 7, the step of predicting a probability of interconnection between a plurality of nodes, comprising:
and aiming at any first node and second node, taking the similarity between the node representation continuous distribution of the first node and the node representation continuous distribution of the second node as an input parameter of an activation function, and taking the obtained output value as the probability of interconnection between the first node and the second node.
9. The method of claim 1, further comprising, after constructing the node characterization continuous distribution:
based on the node representation continuous distribution, clustering a plurality of nodes to obtain class clusters to which the nodes belong respectively;
the step of determining a predicted loss based on the first loss comprises:
determining a fourth loss based on a difference between the first and second sample characterizations of homogeneous cluster nodes and a difference between the first and second sample characterizations of heterogeneous cluster nodes;
a predicted loss is determined based on the first loss and the fourth loss.
10. The method of claim 1, the step of determining a predicted loss based on the first loss comprising:
based on the first loss, the predicted loss is determined using a contrast loss function.
11. A push method for pushing a user by using a scoring model, wherein the scoring model is trained by the method of claim 1; the method comprises the following steps:
determining the similarity between the corresponding user and the pushing object based on the node representation continuous distribution of the user node and the node representation continuous distribution of the pushing object node through the scoring model;
Inputting the similarity as an input parameter into an activation function through the scoring model, and taking the obtained output value as a score of the user on the pushing object;
and selecting a plurality of pushing objects based on the scores of the users on the plurality of pushing objects, and pushing the users based on the selection results.
12. A training apparatus for a scoring model, wherein the scoring model is used for predicting scores of a plurality of users for a plurality of push objects to be pushed, respectively; the scoring model is trained based on a relationship network comprising a plurality of nodes, wherein the plurality of nodes comprise nodes representing users and nodes representing push objects, and edges representing connection relations among the nodes; the parameters to be learned of the scoring model comprise node characterization; the device comprises:
an aggregation module configured to determine, via a graph neural network, an aggregate representation of a node based on the node representations of a plurality of nodes and an original adjacency matrix of the relationship network; wherein, the aggregation characterization aggregates neighbor user features and/or neighbor push object features;
the construction module is configured to construct node characterization continuous distribution based on the aggregation characterization and the random parameters of the nodes, and sample the node characterization continuous distribution by determining the value of the random parameters to obtain a first sampling characterization and a second sampling characterization which are taken as comparison targets; wherein the node characterization succession of distributions comprises: the node characterization continuous distribution comprising the user features and the node characterization continuous distribution comprising the push object features;
a loss module configured to determine a first loss based on a difference between the first and second sampling characterizations of the same node and a difference between the first and second sampling characterizations of different nodes, and to determine a predicted loss based on the first loss;
an updating module configured to update the node characterizations of the plurality of nodes in a direction that reduces the predictive loss.
13. A device for pushing a user by using a scoring model, wherein the scoring model is trained by the method of claim 1; the device comprises:
the determining module is configured to determine the similarity between the corresponding user and the pushing object based on the node representation continuous distribution of the user nodes and the node representation continuous distribution of the pushing object nodes through the scoring model;
the scoring module is configured to input the similarity as an input parameter into an activation function through the scoring model, and the obtained output value is used as the score of the user on the pushing object;
and the pushing module is configured to select a plurality of pushing objects based on scores of the users on the plurality of pushing objects and push the users based on selection results.
14. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-11.
15. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-11.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310572391.3A | 2023-05-18 | 2023-05-18 | Scoring model training method, user pushing method and device
Publications (1)

Publication Number | Publication Date
---|---
CN116611731A | 2023-08-18
Cited By (1)

Publication Number | Priority Date | Publication Date | Assignee | Title
---|---|---|---|---
CN117217792A | 2023-09-06 | 2023-12-12 | 五凌电力湖南能源销售有限公司 | Power value-added service product matching decision method based on data processing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299396B (en) | Convolutional neural network collaborative filtering recommendation method and system fusing attention model | |
Neysiani et al. | Improve performance of association rule-based collaborative filtering recommendation systems using genetic algorithm | |
Yang et al. | Top-n recommendation with counterfactual user preference simulation | |
Li et al. | Deep probabilistic matrix factorization framework for online collaborative filtering | |
CN107506590A (en) | A kind of angiocardiopathy forecast model based on improvement depth belief network | |
CN110955826B (en) | Recommendation system based on improved cyclic neural network unit | |
Kaji et al. | An adversarial approach to structural estimation | |
CN109840833B (en) | Bayesian collaborative filtering recommendation method | |
CN108921342B (en) | Logistics customer loss prediction method, medium and system | |
Navgaran et al. | Evolutionary based matrix factorization method for collaborative filtering systems | |
CN110781409A (en) | Article recommendation method based on collaborative filtering | |
Sridhar et al. | Content-Based Movie Recommendation System Using MBO with DBN. | |
Jin et al. | Neighborhood-aware web service quality prediction using deep learning | |
CN116611731A (en) | Scoring model training method, user pushing method and device | |
Nápoles et al. | Recommender system using long-term cognitive networks | |
CN113641907B (en) | Super-parameter self-adaptive depth recommendation method and device based on evolutionary algorithm | |
Chiu et al. | An evolutionary approach to compact dag neural network optimization | |
CN113449182B (en) | Knowledge information personalized recommendation method and system | |
Chen et al. | Poverty/investment slow distribution effect analysis based on Hopfield neural network | |
CN106897388A (en) | Predict the method and device of microblogging event temperature | |
CN113836393A (en) | Cold start recommendation method based on preference adaptive meta-learning | |
Liang et al. | A normalizing flow-based co-embedding model for attributed networks | |
CN117056609A (en) | Session recommendation method based on multi-layer aggregation enhanced contrast learning | |
Wu et al. | A unified generative adversarial learning framework for improvement of skip-gram network representation learning methods | |
Zhang et al. | Tuning extreme learning machine by an improved electromagnetism-like mechanism algorithm for classification problem |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination