CN112464647A - Recommendation system-oriented negative sampling method and device and electronic equipment - Google Patents

Recommendation system-oriented negative sampling method and device and electronic equipment

Info

Publication number
CN112464647A
Authority
CN
China
Prior art keywords
negative
sample
sampling
negative sample
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011320456.8A
Other languages
Chinese (zh)
Other versions
CN112464647B (en
Inventor
杨珍
丁铭
邵洲
刘德兵
张鹏
唐杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhipu Huazhang Technology Co ltd
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute filed Critical Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202011320456.8A priority Critical patent/CN112464647B/en
Publication of CN112464647A publication Critical patent/CN112464647A/en
Application granted granted Critical
Publication of CN112464647B publication Critical patent/CN112464647B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The invention discloses a recommendation system-oriented negative sampling method and device and electronic equipment. The method comprises the following steps: for each positive sample pair, starting from a node representing a user in the positive sample pair, sampling a first number of negative samples on a traversal path of a bipartite graph of a recommendation system, and generating a negative sample traversal set; acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method to generate a negative sample candidate set; calculating the sampling weight of each negative sample in the negative sample candidate set and the similarity between each negative sample in the negative sample candidate set and the positive sample in the positive sample pair; and carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples, and constructing a corresponding negative sample pair set. Practical application shows that compared with a comparison method, the recommendation performance is greatly improved by adopting the scheme of the invention.

Description

Recommendation system-oriented negative sampling method and device and electronic equipment
Technical Field
The invention relates to the technical field of sampling, in particular to a recommendation system-oriented negative sampling method and device and electronic equipment.
Background
At present, recommendation systems are applied in many scenarios, such as e-commerce commodity recommendation, scholar recommendation, advertisement recommendation, web page recommendation, and the like. A large-scale recommendation system has a huge number of samples, so it is infeasible to compute over every positive and negative sample pair. Negative sampling is an important component of many recommendation methods: it accelerates model training and reduces computational complexity, so that the recommendation methods can be applied to ultra-large-scale recommendation systems. It is therefore necessary to apply negative sampling techniques in recommendation systems. However, which samples are collected as negative samples has a significant impact on the quality of the vector representation and thus on the effectiveness of downstream tasks.
A graph is a data structure composed of nodes and edges and is widely used in production and daily life. A recommendation system can be represented as a bipartite graph G = (U, V, E), where U represents the set of nodes of one type, such as e-commerce users, scholars, or short-video viewers; V represents the set of nodes of the other type, such as commodities, academic articles, or short videos; and E represents the set of edges between the two types of nodes, indicating whether an interaction exists between the two.
Negative sampling plays an indispensable role as a key technique in graph representation learning. Negative sampling draws K samples according to a certain probability distribution P_n and uses them in place of all N training samples when computing the softmax, so that a softmax function with computational complexity O(N) is converted into sigmoid functions with computational complexity O(K). Thus, during training, not all weights need to be updated for each training sample; only a small part of the weights is updated, which accelerates the training of large-scale graph representation learning. However, the selection of negative samples also affects the performance of graph representation learning.
Currently, given a positive sample pair, there are several methods for selecting negative samples. Word2Vec sets the negative sample distribution proportional to the 3/4 power of the word frequency, so that high-frequency words are more likely to be selected as negative samples and low-frequency words are less likely. BPR proposes a uniform random negative sampling strategy that is static and global and does not take the personalization of each node into account. DNS dynamically selects negative samples according to the current model, choosing the samples to which the current model assigns the highest predicted score. IRGAN uses a GAN to adaptively select negative samples corresponding to the first element of the positive sample pair. However, these strategies are heuristic and focus only on selecting "hard" negative samples, which limits the recommendation performance of the recommendation system.
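To make the frequency-based strategy concrete, the following is a minimal sketch of drawing K negatives from a distribution proportional to the 3/4 power of frequency. The counts, the function name and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

def unigram_power_distribution(counts, power=0.75):
    """Return P_n(w) proportional to count(w) ** power."""
    counts = np.asarray(counts, dtype=float)
    weights = counts ** power
    return weights / weights.sum()

# Hypothetical frequencies of three items (not taken from the patent).
counts = [100, 10, 1]
p_n = unigram_power_distribution(counts)

# Draw K = 5 negatives instead of scoring all N items.
rng = np.random.default_rng(0)
negatives = rng.choice(len(counts), size=5, p=p_n)
```

Raising the counts to a power below 1 flattens the distribution: frequent items are still preferred, but less strongly than under the raw frequencies.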
Disclosure of Invention
The invention provides a negative sampling method for a recommendation system, which comprises the following steps:
for each positive sample pair, starting from a node representing a user in the positive sample pair, sampling a first number of negative samples on a traversal path of a bipartite graph of a recommendation system, and generating a negative sample traversal set;
acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method to generate a negative sample candidate set;
calculating the sampling weight of each negative sample in the negative sample candidate set and the similarity between each negative sample in the negative sample candidate set and the positive sample in the positive sample pair;
and carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples, and constructing a corresponding negative sample pair set.
Preferably, the traversal path of the bipartite graph of the recommendation system is generated as follows:
and for each step of traversal, adopting depth-first search (DFS) traversal with probability ω, and adopting breadth-first search (BFS) traversal with probability 1 - ω.
Preferably, positive samples are samples that are exposed and clicked on by the user, and negative samples are samples that are exposed but not clicked on by the user or unexposed samples.
Preferably, the obtaining a second number of negative examples from the negative example traversal set based on a self-contrast approximation method includes:
calculating the probability p(v_n | u) of each negative sample v_n in the negative sample traversal set by using the following formula:
Figure BDA0002792718430000031
where E is the given encoder for learning the vector representations of nodes, θ is the parameter to be learned by the encoder, u is the node representing the user, v_j is a negative sample in the negative sample traversal set, C^(M) is the negative sample traversal set, and 0 < α < 1;
And selecting a second number of negative samples according to the probability.
Preferably, the calculating the sampling weight of each negative sample in the negative sample candidate set comprises:
calculating the number of samples exposed but not clicked by the user in the negative sample candidate set, and taking the value of the number as the sampling weight of the samples exposed but not clicked by the user;
and setting the sampling weight of the unexposed sample in the negative sample candidate set as a preset value.
Preferably, the similarity between each negative sample v_n in the negative sample candidate set and the positive sample is calculated as follows:
Figure BDA0002792718430000041
wherein q(v_n | v) is the similarity of each negative sample in the negative sample candidate set to the positive sample, v is the positive sample in the positive sample pair, v_j is a negative sample in the negative sample candidate set, C^(L) is the negative sample candidate set, and σ represents the sigmoid function
σ(x) = 1 / (1 + e^(-x)).
Preferably, the negatively sampling the negative sample candidate set according to the sampling weight and the similarity includes:
for each negative sample v_n in the negative sample candidate set, selecting a third number of negative samples according to the calculation result of β(v_n) · p(v_n | u) · q(v_n | v);
wherein β(v_n) is the sampling weight of each negative sample in the negative sample candidate set, p(v_n | u) is the probability of each negative sample in the negative sample candidate set, and q(v_n | v) is the similarity of each negative sample in the negative sample candidate set to the positive sample.
The invention provides a recommendation system-oriented negative sampling device in a second aspect, which comprises:
the negative sample traversal set generation module is used for sampling a first number of negative samples on a traversal path of a bipartite graph of the recommendation system from a node representing a user in each positive sample pair to generate a negative sample traversal set;
the negative sample candidate set generation module is used for acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method to generate a negative sample candidate set;
a calculating module, configured to calculate a sampling weight of each negative sample in the negative sample candidate set, and a similarity between each negative sample in the negative sample candidate set and a positive sample in the positive sample pair;
and the negative sample acquisition module is used for carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples and construct a corresponding negative sample pair set.
A third aspect of the invention provides a memory storing a plurality of instructions for implementing the method described above.
The invention also provides an electronic device comprising a processor and a memory connected with the processor, wherein the memory stores a plurality of instructions which can be loaded and executed by the processor so as to enable the processor to execute the method.
The invention has the beneficial effects that: in the negative sampling method, the negative sampling device and the electronic equipment for the recommendation system provided by the invention, a negative sample traversal set is generated by sampling a first number of negative samples on a traversal path of a bipartite graph of the recommendation system; then, based on a self-contrast approximation method, a second number of negative samples are acquired from the negative sample traversal set to generate a negative sample candidate set; and finally, negative sampling is carried out on the negative sample candidate set according to the sampling weight of each negative sample in the negative sample candidate set and the similarity between the negative sample and the positive sample in the positive sample pair, to obtain a third number of negative samples corresponding to the positive sample. Practical application shows that the method can use exposure information to mine negative samples that are both hard and true, effectively improving recommendation performance.
Drawings
FIG. 1 is a schematic flow chart of a recommendation system-oriented negative sampling method according to the present invention;
FIG. 2 is an example of a bipartite graph of user-commodity interaction according to an embodiment of the invention;
fig. 3 is a schematic structural diagram of a negative sampling device for a recommendation system according to the present invention.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
The method provided by the invention can be implemented in the following terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. Wherein the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments described below.
A processor may include one or more processing cores. The processor connects the various parts of the terminal using various interfaces and lines, and performs the various functions of the terminal and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory.
The memory may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets.
The display screen is used for displaying user interfaces of all the application programs.
In addition, those skilled in the art will appreciate that the above-described terminal configurations are not intended to be limiting, and that the terminal may include more or fewer components, or some components may be combined, or a different arrangement of components. For example, the terminal further includes a radio frequency circuit, an input unit, a sensor, an audio circuit, a power supply, and other components, which are not described herein again.
Example one
As shown in fig. 1, an embodiment of the present invention provides a negative sampling method for a recommendation system, including:
s101, for each positive sample pair, starting from a node representing a user in the positive sample pair, sampling a first number of negative samples on a traversal path of a bipartite graph of a recommendation system, and generating a negative sample traversal set;
s102, acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method, and generating a negative sample candidate set;
s103, calculating the sampling weight of each negative sample in the negative sample candidate set and the similarity between each negative sample in the negative sample candidate set and the positive sample in the positive sample pair;
and S104, carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples, and constructing a corresponding negative sample pair set.
In the above method, the bipartite graph of the recommendation system is G = (U, V, E), where U represents a set of |U| users, V represents a set of |V| commodities, and E represents a set of |E| edges between the set U and the set V. Depending on the recommendation scenario, U may represent a set of |U| e-commerce users, academic platform users, Douyin users, etc., and V may represent a set of |V| commodities, academic articles, short videos, etc. The set of positive sample pairs may be formally defined as R = {(u, v)}, where u_i represents a user, v_j represents a commodity, and e_ij represents an edge indicating that u_i and v_j have interacted. The set of negative sample pairs may be formally defined as R_n = {(u, v_n)}, where u_i represents a user, v_nj represents a commodity, and e_inj indicates that no interaction edge exists between u_i and v_nj.
In a specific recommendation scenario, the commodities with respect to one user u can be divided into three types according to exposure information: commodities exposed to and clicked by the user, commodities exposed to but not clicked by the user, and unexposed commodities. The commodities exposed and clicked by the user are positive samples, and the commodities exposed but not clicked by the user or not exposed are negative samples. The exposure information reflects whether a user is interested in the exposed (or recommended) commodities and indirectly reflects the user's negative preferences, which helps to mine effective negative samples in the negative sampling process.
As an example, FIG. 2 is a user-commodity bipartite graph. FIG. 2 includes two users, Zhang San (u_0) and Li Si (u_1), and eight commodities: computer bag (v_0), men's shirt (v_1), mobile phone A (v_2), computer (v_3), hat (v_4), mobile phone B (v_5), bookshelf (v_6), and smart watch (v_7). A node connected to a user node by a solid line represents a commodity that was exposed and clicked by the user, namely a positive sample; a node connected to a user node by a dashed line represents a commodity that was exposed but not clicked by the user, and a node with no edge to a user node represents an unexposed commodity, both being negative samples. In FIG. 2, the set of positive sample pairs is R = {(u_0, v_2), (u_0, v_3), (u_1, v_1), (u_1, v_5), (u_1, v_7)}.
Step S101 is executed, and for each positive sample pair, a first number of negative samples are sampled on the traversal path of the bipartite graph of the recommendation system starting from the node representing the user in the positive sample pair, and a negative sample traversal set is generated.
Generating a traversal path of a bipartite graph of the recommendation system according to the following modes:
and for each step of traversal, adopting depth-first search (DFS) traversal with probability ω, and adopting breadth-first search (BFS) traversal with probability 1 - ω.
Because depth-first search (DFS) tends to search deeper nodes (i.e., unexposed goods) and breadth-first search (BFS) tends to search surrounding nodes (i.e., exposed and clicked goods), the present invention combines the traversal methods of DFS and BFS, performs DFS traversal with a probability of ω, and performs BFS traversal with a probability of 1- ω.
Wherein the first number (for example, M) can be set according to experience and practical situations.
As an example, in the user-commodity bipartite graph illustrated in FIG. 2, to perform negative sampling for the positive sample pair (u_0, v_3), the traversal starts from the node representing user u_0 in the positive sample pair and, using the traversal mode combining DFS and BFS on the graph, samples M = 5 nodes for user Zhang San (u_0) to generate the negative sample traversal set C^(M) = {v_0, v_1, v_5, v_1, v_3}.
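Step S101 can be sketched as a traversal over one shared frontier: with probability ω the newest frontier node is expanded (DFS-like), otherwise the oldest one (BFS-like). This is only one possible reading of the mixed traversal. The toy adjacency list, the `is_candidate` filter and the fact that this version never revisits a node (so it yields no duplicates, unlike the example set above) are assumptions for illustration.

```python
import random
from collections import deque

def mixed_traversal(adj, start, omega, num_samples, is_candidate, seed=0):
    """Collect up to num_samples candidate nodes reachable from start.

    Each step expands the newest frontier node with probability omega
    (DFS-like) and the oldest frontier node otherwise (BFS-like).
    """
    rng = random.Random(seed)
    frontier = deque([start])
    seen = {start}
    collected = []
    while frontier and len(collected) < num_samples:
        node = frontier.pop() if rng.random() < omega else frontier.popleft()
        if node != start and is_candidate(node):
            collected.append(node)
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return collected

# Hypothetical toy graph; the edges are NOT those of FIG. 2.
adj = {
    "u0": ["v2", "v3", "v0"],
    "u1": ["v2", "v1", "v5"],
    "v0": ["u0"], "v1": ["u1"], "v2": ["u0", "u1"],
    "v3": ["u0"], "v5": ["u1"],
}
clicked_by_u0 = {"v2", "v3"}
traversal_set = mixed_traversal(
    adj, "u0", omega=0.5, num_samples=3,
    is_candidate=lambda n: n.startswith("v") and n not in clicked_by_u0,
)
```

A larger ω pushes the walk toward distant items, while a smaller ω keeps it near the user's direct neighborhood.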
And step S102 is executed, a second number of negative samples are obtained from the negative sample traversal set based on a self-contrast approximation method, and a negative sample candidate set is generated.
Real-world data is limited, so usually only a limited number of samples can be drawn: for example, T positive samples {u_1, …, u_T} are sampled from the positive sample distribution p_d and k·T negative samples {u'_1, …, u'_kT} are sampled from the negative sample distribution p_n, which leads to a non-negligible empirical error. In order to minimize this error, the negative sampling strategy adopted in the present invention is that the negative sample distribution is positively but sub-linearly correlated with the positive sample distribution, defined as follows:
p_n(u | v) ∝ p_d(u | v)^α, 0 < α < 1
according to the negative sampling strategy, the invention adopts a self-contrast approximation method to approximate positive distribution, and the definition is as follows:
Figure BDA0002792718430000091
where E denotes the given encoder for learning the vector representations of nodes, and θ is the parameter to be learned by the encoder.
Sampling negative samples according to the above strategy requires O(n) time complexity, which is impractical for large-scale and even ultra-large-scale graphs. In order to accelerate the negative sampling process, the method adopts the Metropolis-Hastings algorithm to further improve the speed of negative sampling.
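The Metropolis-Hastings algorithm can draw from a distribution that is known only up to its normalizing constant, which is what removes the need for a sum over all n nodes. A minimal generic sketch follows. The uniform proposal, the burn-in length and all names are assumptions for illustration; the text does not specify the proposal distribution it uses.

```python
import random

def metropolis_hastings(candidates, unnormalized_weight, num_steps,
                        burn_in=100, seed=0):
    """Sample from P(x) proportional to unnormalized_weight(x) > 0.

    Uses a symmetric uniform proposal, so the acceptance ratio reduces to
    weight(proposal) / weight(current) and no normalizer is needed.
    """
    rng = random.Random(seed)
    current = rng.choice(candidates)
    samples = []
    for step in range(burn_in + num_steps):
        proposal = rng.choice(candidates)
        ratio = unnormalized_weight(proposal) / unnormalized_weight(current)
        if rng.random() < min(1.0, ratio):
            current = proposal
        if step >= burn_in:
            samples.append(current)
    return samples

# Hypothetical unnormalized scores for two items.
weights = {"a": 1.0, "b": 3.0}
samples = metropolis_hastings(list(weights), weights.get, num_steps=5000)
```

Over a long run, item "b" should appear roughly three times as often as item "a", although successive samples from the chain are correlated.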
Based on the self-contrast approximation model, the inner product between the user and each commodity in the negative sample traversal set is calculated, and the user-commodity similarity is represented by this inner product: the larger the inner product, the closer the user and the commodity and the higher their similarity, and therefore the larger the probability p(v_n | u) of selecting the negative sample v_n in the negative sample traversal set. The calculation formula is as follows:
Figure BDA0002792718430000101
In the above formula, u is the node representing the user, v_j is a negative sample in the negative sample traversal set, C^(M) is the negative sample traversal set, and 0 < α < 1.
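Written out in LaTeX, one form of p(v_n | u) that is consistent with the symbol definitions and with the sub-linear strategy p_n ∝ p_d^α is given below. It is a reconstruction from the surrounding text, and the original formula image may differ in detail.

```latex
p(v_n \mid u) =
  \frac{\bigl(E_\theta(u) \cdot E_\theta(v_n)\bigr)^{\alpha}}
       {\sum_{v_j \in C^{(M)}} \bigl(E_\theta(u) \cdot E_\theta(v_j)\bigr)^{\alpha}},
  \qquad 0 < \alpha < 1
```

Because the sum runs only over the M entries of the traversal set C^(M), normalizing costs O(M) rather than a pass over all commodities.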
The second number of negative samples closest to the user can then be selected according to the probability p(v_n | u) of each negative sample v_n.
Wherein the second number (for example, L) can be set according to experience and practical situations.
As an example, in the example of FIG. 2, using the sampling strategy and the self-contrast approximation method described above, the inner product similarity between user Zhang San (u_0) and each commodity (sample) in the negative sample traversal set C^(M) = {v_0, v_1, v_5, v_1, v_3} is first calculated to obtain the probability p(v_n | u) of each negative sample (commodity) v_n in the set; then, according to the probability p(v_n | u), 3 negative samples (commodities) are selected for user Zhang San (u_0) to generate the negative sample candidate set C^(L) = {v_5, v_0, v_1}.
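Step S102 can be sketched as follows, assuming that p(v_n | u) is the α-powered inner product normalized over the traversal set and that the L most probable items are kept. The embedding values are invented so that the outcome matches the example above, duplicates in the traversal set are removed beforehand, and clipping non-positive inner products is an additional assumption needed for the fractional power.

```python
import numpy as np

def candidate_probabilities(user_vec, item_vecs, alpha=0.75):
    """p(v_n | u) from inner products raised to alpha (assumed form)."""
    scores = item_vecs @ user_vec
    # Assumption: clip so the fractional power is defined for all scores.
    scores = np.clip(scores, 1e-12, None)
    powered = scores ** alpha
    return powered / powered.sum()

def select_candidates(item_ids, probs, num_candidates):
    """Keep the num_candidates items with the highest probability."""
    order = np.argsort(-probs, kind="stable")[:num_candidates]
    return [item_ids[i] for i in order]

# Hypothetical 2-d embeddings; not values from the patent.
item_ids = ["v0", "v1", "v5", "v3"]
item_vecs = np.array([[0.6, 0.1], [0.4, 0.3], [0.9, 0.2], [0.1, 0.5]])
user_vec = np.array([1.0, 0.0])

probs = candidate_probabilities(user_vec, item_vecs)
candidate_set = select_candidates(item_ids, probs, num_candidates=3)
```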
Step S103 is executed to calculate a sampling weight of each negative sample in the negative sample candidate set and a similarity between each negative sample in the negative sample candidate set and a positive sample in the positive sample pair.
The negative sample candidate set generated in step S102 contains both exposed but not clicked commodities and unexposed commodities, and sampling these two kinds of commodities with equal probability is clearly unsound. The invention adopts a simple and effective method to calculate the sampling weight, which specifically comprises the following steps:
calculating the number of samples exposed but not clicked by the user in the negative sample candidate set, and taking the value of the number as the sampling weight of the samples exposed but not clicked by the user;
and setting the sampling weight of the unexposed sample in the negative sample candidate set as a preset value. For example, set directly to 1.
As an example, following the above example, the negative sample candidate set C^(L) = {v_5, v_0, v_1} contains the exposed but not clicked commodities v_0 and v_1 and the unexposed commodity v_5. The number of exposed but not clicked commodities is 2, so the sampling weights of commodities v_0 and v_1 are β(v_0) = β(v_1) = 2; the sampling weight of an unexposed commodity is set to 1, so the sampling weight of commodity v_5 is β(v_5) = 1.
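The weighting rule can be sketched in a few lines; the function and argument names are hypothetical.

```python
def sampling_weights(candidates, exposed_not_clicked, unexposed_weight=1.0):
    """Return beta(v) for each candidate.

    Exposed-but-not-clicked items share one weight equal to how many such
    items the candidate set contains; unexposed items get a preset weight.
    """
    n_exposed = sum(1 for v in candidates if v in exposed_not_clicked)
    return {
        v: float(n_exposed) if v in exposed_not_clicked else unexposed_weight
        for v in candidates
    }

beta = sampling_weights(["v5", "v0", "v1"], exposed_not_clicked={"v0", "v1"})
# beta == {"v5": 1.0, "v0": 2.0, "v1": 2.0}
```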
Steps S101 and S102 generate a personalized commodity set for each user but do not take into account the commodity v in each user-commodity pair (u, v); different negative samples v_n should be selected for different interacted commodities v. In the invention, the similarity q(v_n | v) between each negative sample v_n in the negative sample candidate set and the positive sample (commodity v) is calculated so that different negative samples v_n are selected for different positive samples (commodities v). Specifically, the similarity q(v_n | v) is calculated by the following formula:
Figure BDA0002792718430000121
Wherein q(v_n | v) is the similarity of each negative sample in the negative sample candidate set to the positive sample, v is the positive sample in the positive sample pair, v_j is a negative sample in the negative sample candidate set, C^(L) is the negative sample candidate set, and σ represents the sigmoid function
σ(x) = 1 / (1 + e^(-x)).
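Written out in LaTeX, one form of q(v_n | v) consistent with the symbol definitions (a sigmoid of an inner product, normalized over the candidate set) is given below. Using the encoder E_θ for the commodity vectors is an assumption carried over from p(v_n | u), and the original formula image may differ.

```latex
q(v_n \mid v) =
  \frac{\sigma\bigl(E_\theta(v) \cdot E_\theta(v_n)\bigr)}
       {\sum_{v_j \in C^{(L)}} \sigma\bigl(E_\theta(v) \cdot E_\theta(v_j)\bigr)},
  \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
```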
As an example, following the above example, the similarity q(v_n | v) is calculated for v_3 in the positive sample pair (u_0, v_3).
And S104 is executed, negative sampling is carried out on the negative sample candidate set according to the sampling weight and the similarity, a third number of negative samples corresponding to the positive samples are obtained, and a corresponding negative sample pair set is constructed. The method specifically comprises the following steps:
for each negative sample v_n in the negative sample candidate set, selecting a third number of negative samples according to the calculation result of β(v_n) · p(v_n | u) · q(v_n | v);
wherein β(v_n) is the sampling weight of each negative sample in the negative sample candidate set, p(v_n | u) is the probability of each negative sample in the negative sample candidate set, and q(v_n | v) is the similarity of each negative sample in the negative sample candidate set to the positive sample.
Wherein the third number may be set based on experience and practical circumstances.
After β(v_n) · p(v_n | u) · q(v_n | v) is calculated, the third number of most similar commodities are selected as the negative samples (commodities) of (u, v).
As an example, following the above example, negative sampling is performed on the negative sample candidate set C^(L) = {v_5, v_0, v_1} according to β(v_n) · p(v_n | u) · q(v_n | v), and one commodity {v_5} is finally selected as the negative sample of the positive sample pair (u_0, v_3), so as to construct the set of negative sample pairs T = {(u_0, v_5)}.
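Steps S103 and S104 can be sketched together as follows, assuming that q(v_n | v) is a sigmoid of the inner product normalized over the candidate set and that the top-scoring items are kept. All vectors and the p values are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def item_similarity(pos_vec, cand_vecs):
    """q(v_n | v): normalized sigmoid of inner products (assumed form)."""
    s = sigmoid(cand_vecs @ pos_vec)
    return s / s.sum()

def select_negatives(cand_ids, beta, p, q, k):
    """Pick the k candidates with the largest beta * p * q."""
    score = beta * p * q
    order = np.argsort(-score, kind="stable")[:k]
    return [cand_ids[i] for i in order]

cand_ids = ["v5", "v0", "v1"]
beta = np.array([1.0, 2.0, 2.0])   # weights from the example above
p = np.array([0.5, 0.3, 0.2])      # hypothetical p(v_n | u)
pos_vec = np.array([1.0, 0.0])     # hypothetical embedding of v3
cand_vecs = np.array([[3.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]])

q = item_similarity(pos_vec, cand_vecs)
negatives = select_negatives(cand_ids, beta, p, q, k=1)
negative_pairs = [("u0", v) for v in negatives]
```

With these made-up numbers the unexposed item v5 wins despite its lower weight, because it is by far the closest to the positive item.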
In the negative sampling process, which samples are sampled as negative samples have important influence on the quality of vector representation and further have important influence on the performance of downstream tasks. In the method, a first number of negative samples are sampled on a traversal path of a bipartite graph of the recommendation system to generate a negative sample traversal set, so that the performance of the recommendation system is improved; and then, based on a self-contrast approximation method, acquiring a second number of negative samples from the negative sample traversal set to generate a negative sample candidate set, so that the performance of the recommendation system is further improved.
In practical application, the method provided by the invention is found to greatly improve recommendation performance compared with baseline methods. For example, a negative sampling experiment was performed on a large-scale user-article browsing network (16,015 users, 44,175 articles, 3,284,734 browsing relations, and 3,796,551 exposed-but-not-clicked relations), with the number of negative samples per positive sample pair set to 5. The experimental results show that the performance of the method on the recommendation task is far higher than that of the other baseline methods (such as random sampling, dynamic sampling, and adversarial sampling). For example, on the user-article browsing network data set, the method achieves a 4.6% performance improvement on the Recall@50 evaluation metric over the best baseline method (which only mines candidate negative samples). This indicates that:
the negative sampling strategy adopted by the invention can be used for mining candidate negative samples, and compared with random sampling, the performance of the method can be improved by 20% on the Recall@50 evaluation metric.
Mining the negative examples of the positive example pairs using the exposure information may further lead to an increase in recommendation performance.
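For reference, Recall@K is commonly computed per user as the fraction of that user's relevant items that appear among the top K recommendations. The sketch below uses this standard definition, which the text itself does not spell out; the names are hypothetical.

```python
def recall_at_k(ranked_items, relevant_items, k=50):
    """Fraction of relevant items found in the top-k of a ranked list."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)

# One of three relevant items appears in the top 2.
example_recall = recall_at_k(["a", "b", "c", "d"], {"a", "d", "z"}, k=2)
```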
Example two
As shown in fig. 3, another aspect of the present invention further includes a functional module architecture completely corresponding to the foregoing method flow, that is, an embodiment of the present invention further provides a recommendation system-oriented negative sampling apparatus, including:
a negative sample traversal set generation module 301, configured to sample, for each positive sample pair, a first number of negative samples on a traversal path of a bipartite graph of the recommendation system from a node representing a user in the positive sample pair, and generate a negative sample traversal set;
a negative sample candidate set generating module 302, configured to obtain a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method, and generate a negative sample candidate set;
a calculating module 303, configured to calculate a sampling weight of each negative sample in the negative sample candidate set and a similarity between each negative sample in the negative sample candidate set and a positive sample in the positive sample pair;
a negative sample obtaining module 304, configured to perform negative sampling on the negative sample candidate set according to the sampling weight and the similarity, obtain a third number of negative samples corresponding to the positive samples, and construct a corresponding negative sample pair set.
The traversal path of the bipartite graph of the recommendation system is generated according to the following modes:
and for each step of traversal, adopting depth-first search (DFS) traversal with probability ω, and adopting breadth-first search (BFS) traversal with probability 1 - ω.
In this embodiment, positive samples are samples that are exposed and clicked by the user, and negative samples are samples that are exposed but not clicked by the user or unexposed samples.
The obtaining a second number of negative examples from the negative example traversal set based on a self-contrast approximation method includes:
calculating the probability of each negative sample v_n in the negative sample traversal set using the following formula:
Figure BDA0002792718430000141
where E is the given encoder used to learn the vector representations of nodes, θ denotes the encoder parameters to be learned, u is the node representing the user, v_j is a negative sample in the negative sample traversal set, C^(M) is the negative sample traversal set, and 0 < α < 1;
and selecting a second number of negative samples according to the probability.
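By way of a hedged example only, the selection step can be sketched in Python as follows. Because the formula itself is given only in the referenced figure, the sketch ASSUMES a form in which the probability of v_n is the inner product of the encoded user and item vectors raised to the power α, normalised over the traversal set C^(M); this form, the clamping of non-positive scores, the function names, and the toy vectors are all illustrative assumptions rather than the formula of this embodiment.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_contrast_probs(user_vec, item_vecs, alpha):
    # ASSUMED form: p(v_n|u) = s_n**alpha / sum_j s_j**alpha, s = E(u).E(v).
    # Scores are clamped to a small positive value so the power is defined.
    powered = {v: max(dot(user_vec, vec), 1e-12) ** alpha
               for v, vec in item_vecs.items()}
    total = sum(powered.values())
    return {v: s / total for v, s in powered.items()}


def select_candidates(probs, second_number, rng=None):
    """Draw `second_number` distinct nodes, each draw proportional to probs."""
    rng = rng or random.Random()
    pool = list(probs.items())
    chosen = []
    for _ in range(min(second_number, len(pool))):
        threshold = rng.random() * sum(weight for _, weight in pool)
        running = 0.0
        for index, (node, weight) in enumerate(pool):
            running += weight
            if threshold < running:
                break
        chosen.append(pool.pop(index)[0])
    return chosen


# Hypothetical encoded vectors standing in for E_theta(u) and E_theta(v_j).
probs = self_contrast_probs([1.0, 0.0],
                            {"a": [1.0, 0.0], "b": [4.0, 0.0]},
                            alpha=0.5)
candidate_set = select_candidates(probs, 1, rng=random.Random(0))
```

With 0 < α < 1 the exponent flattens the distribution, so lower-scoring nodes in the traversal set keep a non-negligible chance of entering the candidate set.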
Further, the calculating the sampling weight of each negative sample in the negative sample candidate set includes:
calculating the number of samples exposed but not clicked by the user in the negative sample candidate set, and taking the value of the number as the sampling weight of the samples exposed but not clicked by the user;
and setting the sampling weight of the unexposed sample in the negative sample candidate set as a preset value.
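The weighting rule above can be illustrated with a short Python sketch. The function name `sampling_weights`, the representation of exposure information as a set, and the preset value of 1.0 are assumptions chosen for the example; the embodiment does not fix the preset value.

```python
def sampling_weights(candidates, exposed_not_clicked, preset=1.0):
    """Return a sampling weight for every node in the candidate set.

    Exposed-but-not-clicked samples all receive the count of such samples
    in the candidate set; unexposed samples receive `preset` (assumed 1.0).
    """
    n_exposed = sum(1 for v in candidates if v in exposed_not_clicked)
    return {
        v: float(n_exposed) if v in exposed_not_clicked else preset
        for v in candidates
    }


# Hypothetical candidate set in which "a" and "b" were exposed but not clicked.
weights = sampling_weights(["a", "b", "c"], {"a", "b"})
```

In this toy case the two exposed-but-not-clicked samples each receive weight 2.0 and the unexposed sample keeps the preset weight.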
Further, the similarity between each negative sample v_n in the negative sample candidate set and the positive sample is calculated as follows:
Figure BDA0002792718430000151
where q(v_n|v) is the similarity between each negative sample in the negative sample candidate set and the positive sample, v is the positive sample in the positive sample pair, v_j is a negative sample in the negative sample candidate set, C^(L) is the negative sample candidate set, and σ denotes the sigmoid function:
Figure BDA0002792718430000152
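As an illustrative sketch only, the similarity computation might look as follows in Python. The exact expression is contained in the referenced figure and is not reproduced here; the sketch ASSUMES that the sigmoid of the inner product between the encoded positive sample and each candidate is normalised over the candidate set C^(L). That assumed form, the function names, and the toy vectors are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function 1 / (1 + exp(-x)).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def similarity_to_positive(pos_vec, cand_vecs):
    # ASSUMED form: q(v_n|v) = sigma(E(v).E(v_n)) / sum_j sigma(E(v).E(v_j)).
    raw = {
        node: sigmoid(sum(a * b for a, b in zip(pos_vec, vec)))
        for node, vec in cand_vecs.items()
    }
    total = sum(raw.values())
    return {node: score / total for node, score in raw.items()}


# Hypothetical vectors: inner products are 0 and ln(3), so sigma gives 0.5 and 0.75.
q = similarity_to_positive([1.0, 0.0],
                           {"a": [0.0, 0.0], "b": [math.log(3.0), 0.0]})
```

Normalising over the candidate set makes the similarities comparable across positive sample pairs whose candidate sets differ in size.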
Further, the negatively sampling the negative sample candidate set according to the sampling weight and the similarity includes:
for each negative sample v_n in the negative sample candidate set, selecting a third number of negative samples according to the value of β(v_n)·p(v_n|u)·q(v_n|v);
where β(v_n) is the sampling weight of each negative sample in the negative sample candidate set, p(v_n|u) is the probability of each negative sample in the negative sample candidate set, and q(v_n|v) is the similarity between each negative sample in the negative sample candidate set and the positive sample.
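The final selection can be sketched in Python as shown below. The embodiment states only that the third number of negatives is selected according to the product β(v_n)·p(v_n|u)·q(v_n|v); the sketch ASSUMES that this means random sampling without replacement with probability proportional to the product, and uses the Efraimidis-Spirakis key method for that purpose. A top-k selection by product would be an equally plausible reading. The function names and toy values are hypothetical.

```python
import random


def combined_scores(beta, p, q):
    """Product beta(v_n) * p(v_n|u) * q(v_n|v) for every candidate."""
    return {v: beta[v] * p[v] * q[v] for v in beta}


def sample_negatives(scores, third_number, rng=None):
    """Weighted sampling without replacement (assumed selection rule).

    Each candidate with a positive score s gets the key U ** (1 / s) with
    U uniform in [0, 1); the candidates with the largest keys are kept.
    """
    rng = rng or random.Random()
    keys = {v: rng.random() ** (1.0 / s) for v, s in scores.items() if s > 0}
    return sorted(keys, key=keys.get, reverse=True)[:third_number]


# Hypothetical inputs for three candidates; "c" has zero similarity.
beta = {"a": 2.0, "b": 2.0, "c": 1.0}
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.4, "b": 0.6, "c": 0.0}
scores = combined_scores(beta, p, q)
negatives = sample_negatives(scores, 2, rng=random.Random(0))
```

Candidates whose product is zero are never drawn, so in the toy data only "a" and "b" can be returned.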
The apparatus can be implemented using the recommendation system-oriented negative sampling method provided in the first embodiment; for the specific implementation, refer to the description of the first embodiment, which is not repeated here.
The invention also provides a memory storing a plurality of instructions for implementing the method according to the first embodiment.
The invention also provides an electronic device comprising a processor and a memory connected to the processor, wherein the memory stores a plurality of instructions, and the instructions can be loaded and executed by the processor to enable the processor to execute the method according to the first embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A recommendation system-oriented negative sampling method is characterized by comprising the following steps:
for each positive sample pair, starting from a node representing a user in the positive sample pair, sampling a first number of negative samples on a traversal path of a bipartite graph of a recommendation system, and generating a negative sample traversal set;
acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method to generate a negative sample candidate set;
calculating the sampling weight of each negative sample in the negative sample candidate set and the similarity between each negative sample in the negative sample candidate set and the positive sample in the positive sample pair;
and carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples, and constructing a corresponding negative sample pair set.
2. The recommendation-system-oriented negative sampling method of claim 1, wherein the traversal path of the bipartite graph of the recommendation system is generated in the following manner:
for each traversal step, depth-first search (DFS) traversal is adopted with probability ω, and breadth-first search (BFS) traversal is adopted with probability 1-ω.
3. The recommendation-system-oriented negative sampling method of claim 1, wherein a positive sample is a sample that is exposed and clicked by the user, and a negative sample is a sample that is exposed but not clicked by the user or an unexposed sample.
4. The recommendation-system-oriented negative sampling method of claim 3, wherein the obtaining a second number of negative examples from the negative example traversal set based on a self-contrast approximation comprises:
calculating the probability of each negative sample v_n in the negative sample traversal set using the following formula:
Figure FDA0002792718420000021
where E is the given encoder used to learn the vector representations of nodes, θ denotes the encoder parameters to be learned, u is the node representing the user, v_j is a negative sample in the negative sample traversal set, C^(M) denotes the negative sample traversal set, and 0 < α < 1;
and selecting a second number of negative samples according to the probability.
5. The recommendation-system-oriented negative sampling method of claim 4, wherein the calculating the sampling weight of each negative sample in the negative sample candidate set comprises:
calculating the number of samples exposed but not clicked by the user in the negative sample candidate set, and taking the value of the number as the sampling weight of the samples exposed but not clicked by the user;
and setting the sampling weight of the unexposed sample in the negative sample candidate set as a preset value.
6. The recommendation-system-oriented negative sampling method of claim 5, wherein the similarity between each negative sample v_n in the negative sample candidate set and the positive sample is calculated as follows:
Figure FDA0002792718420000022
where q(v_n|v) is the similarity between each negative sample in the negative sample candidate set and the positive sample, v is the positive sample in the positive sample pair, v_j is a negative sample in the negative sample candidate set, C^(L) is the negative sample candidate set, and σ denotes the sigmoid function:
Figure FDA0002792718420000023
7. The recommendation-system-oriented negative sampling method of claim 1, wherein the negative sampling of the negative sample candidate set according to the sampling weight and the similarity comprises:
for each negative sample v_n in the negative sample candidate set, selecting a third number of negative samples according to the value of β(v_n)·p(v_n|u)·q(v_n|v);
where β(v_n) is the sampling weight of each negative sample in the negative sample candidate set, p(v_n|u) is the probability of each negative sample in the negative sample candidate set, and q(v_n|v) is the similarity between each negative sample in the negative sample candidate set and the positive sample.
8. A recommendation system oriented negative sampling device, comprising:
the negative sample traversal set generation module is used for sampling a first number of negative samples on a traversal path of a bipartite graph of the recommendation system from a node representing a user in each positive sample pair to generate a negative sample traversal set;
the negative sample candidate set generation module is used for acquiring a second number of negative samples from the negative sample traversal set based on a self-contrast approximation method to generate a negative sample candidate set;
a calculating module, configured to calculate a sampling weight of each negative sample in the negative sample candidate set, and a similarity between each negative sample in the negative sample candidate set and a positive sample in the positive sample pair;
and the negative sample acquisition module is used for carrying out negative sampling on the negative sample candidate set according to the sampling weight and the similarity to obtain a third number of negative samples corresponding to the positive samples and construct a corresponding negative sample pair set.
9. A memory storing a plurality of instructions for implementing the method of any one of claims 1-7.
10. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method according to any of claims 1-7.
CN202011320456.8A 2020-11-23 2020-11-23 Recommendation system-oriented negative sampling method and device and electronic equipment Active CN112464647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011320456.8A CN112464647B (en) 2020-11-23 2020-11-23 Recommendation system-oriented negative sampling method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112464647A true CN112464647A (en) 2021-03-09
CN112464647B CN112464647B (en) 2021-10-19

Family

ID=74798451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320456.8A Active CN112464647B (en) 2020-11-23 2020-11-23 Recommendation system-oriented negative sampling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112464647B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112148A (en) * 2021-04-09 2021-07-13 北京邮电大学 Evaluation method for evaluation result of recommendation system model and electronic equipment
CN114201603A (en) * 2021-11-04 2022-03-18 阿里巴巴(中国)有限公司 Entity classification method, device, storage medium, processor and electronic device
CN114491283A (en) * 2022-04-02 2022-05-13 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241412A (en) * 2018-08-17 2019-01-18 深圳先进技术研究院 A kind of recommended method, system and electronic equipment based on network representation study
CN111681067A (en) * 2020-04-17 2020-09-18 清华大学 Long-tail commodity recommendation method and system based on graph attention network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANG WANG ET. AL: "Reinforced Negative Sampling over Knowledge Graph for Recommendation", 《ARXIV》 *
ZHEN YANG ET. AL: "Understanding Negative Sampling in Graph Representation Learning", 《ARXIV》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112148A (en) * 2021-04-09 2021-07-13 北京邮电大学 Evaluation method for evaluation result of recommendation system model and electronic equipment
CN113112148B (en) * 2021-04-09 2022-08-05 北京邮电大学 Evaluation method for evaluation result of recommendation system model and electronic equipment
CN114201603A (en) * 2021-11-04 2022-03-18 阿里巴巴(中国)有限公司 Entity classification method, device, storage medium, processor and electronic device
CN114491283A (en) * 2022-04-02 2022-05-13 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment
CN114491283B (en) * 2022-04-02 2022-07-22 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment

Also Published As

Publication number Publication date
CN112464647B (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN112464647B (en) Recommendation system-oriented negative sampling method and device and electronic equipment
CN109241412B (en) Recommendation method and system based on network representation learning and electronic equipment
KR20200109230A (en) Method and apparatus for generating neural network
EP4181026A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
US10185983B2 (en) Least-ask: conversational recommender system with minimized user interaction
JP2019008742A (en) Learning device, generation device, learning method, generation method, learning program, and generation program
Sun et al. APL: Adversarial pairwise learning for recommender systems
CN111625645B (en) Training method and device for text generation model and electronic equipment
CN111291618A (en) Labeling method, device, server and storage medium
CN109509051B (en) Article recommendation method and device
Song et al. Cold-start aware deep memory networks for multi-entity aspect-based sentiment analysis
WO2024012360A1 (en) Data processing method and related apparatus
JP6971181B2 (en) Predictors, predictors, and programs
CN116910357A (en) Data processing method and related device
CN116186326A (en) Video recommendation method, model training method, electronic device and storage medium
CN116843022A (en) Data processing method and related device
CN116049567A (en) Collaborative filtering-based fault inspection recommendation method and system
CN113641915B (en) Object recommendation method, device, equipment, storage medium and program product
Chou et al. Pseudo-reward algorithms for contextual bandits with linear payoff functions
CN112100507B (en) Object recommendation method, computing device and computer-readable storage medium
CN111159558B (en) Recommendation list generation method and device and electronic equipment
CN115482021A (en) Multimedia information recommendation method and device, electronic equipment and storage medium
CN112905885A (en) Method, apparatus, device, medium, and program product for recommending resources to a user
CN116485505B (en) Method and device for training recommendation model based on user performance fairness
CN117112890A (en) Data processing method, contribution value acquisition method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210324

Address after: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant after: Beijing innovation Zhiyuan Technology Co.,Ltd.

Address before: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant before: Beijing Zhiyuan Artificial Intelligence Research Institute

TA01 Transfer of patent application right

Effective date of registration: 20210621

Address after: 603a, 6th floor, building 6, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant after: Beijing Zhipu Huazhang Technology Co.,Ltd.

Address before: B201d-1, 3rd floor, building 8, yard 1, Zhongguancun East Road, Haidian District, Beijing 100083

Applicant before: Beijing innovation Zhiyuan Technology Co.,Ltd.

GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Yang Zhen

Inventor after: Ding Ming

Inventor after: Shao Zhou

Inventor after: Liu Debing

Inventor after: Zhang Peng

Inventor before: Yang Zhen

Inventor before: Ding Ming

Inventor before: Shao Zhou

Inventor before: Liu Debing

Inventor before: Zhang Peng

Inventor before: Tang Jie