CN113836417A - Negative sample determination method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113836417A
Authority
CN
China
Prior art keywords
search behavior
behavior information
relationship
search
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111131095.7A
Other languages
Chinese (zh)
Inventor
黄腾玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing IQIYI Science and Technology Co Ltd
Original Assignee
Beijing IQIYI Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing IQIYI Science and Technology Co Ltd
Priority to CN202111131095.7A
Publication of CN113836417A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the invention provides a method and a device for determining a negative sample, an electronic device, and a storage medium. The method comprises: acquiring target search behavior information as a positive sample; taking the positive sample as a starting point, determining, according to a pre-recorded search behavior relationship, a next node having an association relationship with the current node, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined based on search behavior information in a plurality of pre-acquired user search behavior sequences and represents the association relationships among the pieces of search behavior information; and determining the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample. Because the search behavior relationship represents the degree of association between pieces of search behavior information, search behavior information whose association with the positive sample is neither too strong nor too weak can be found based on it. Such search behavior information meets the requirements of a negative sample, so an accurate and valuable negative sample can be obtained.

Description

Negative sample determination method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of sample processing technologies, and in particular, to a method and an apparatus for determining a negative sample, an electronic device, and a storage medium.
Background
In application scenarios such as video search, information search, advertisement recommendation, video recommendation, and commodity recommendation, a deep learning model is usually required, and the training samples determine the upper limit of the performance of the trained model. A negative sample must have a certain correlation with its corresponding positive sample, and that correlation can be neither too strong nor too weak. Accurate and valuable negative samples are therefore difficult to obtain, and negative sample mining has become both a focus and a difficulty of research.
At present, one way of obtaining negative samples exists. For an information search scenario, the process is to first calculate the similarity between the feature vectors corresponding to the pieces of search behavior information, and then to select, based on the similarity, a sample whose similarity is neither too high nor too low as the negative sample.
However, this approach requires calculating the similarity between every pair of feature vectors and determining a well-suited similarity threshold. The threshold is difficult to choose and the computational complexity is high, so the approach does not adequately solve the problem of how to obtain accurate and valuable negative samples.
Disclosure of Invention
The embodiment of the invention aims to provide a negative sample determination method, a negative sample determination device, electronic equipment and a storage medium, so as to obtain an accurate and valuable negative sample. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for determining a negative sample, where the method includes:
acquiring target search behavior information as a positive sample;
taking the positive sample as a starting point, determining, according to a pre-recorded search behavior relationship, a next node having an association relationship with a current node, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined based on search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationships among the pieces of search behavior information;
and determining the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample.
In a second aspect, an embodiment of the present invention provides an apparatus for determining a negative sample, where the apparatus includes:
a positive sample acquisition module, used for acquiring target search behavior information as a positive sample;
an information migration module, used for taking the positive sample as a starting point and determining, according to a pre-recorded search behavior relationship, a next node having an association relationship with a current node, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined by a relationship establishment module based on search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationships among the pieces of search behavior information;
and a negative sample determining module, used for determining the search behavior information corresponding to the target node as the negative sample corresponding to the positive sample.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the steps of the method for determining a negative sample according to any one of the first aspect described above when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the steps of the method for determining a negative sample according to any one of the first aspect above are implemented.
In the scheme provided by the embodiment of the invention, the electronic device can acquire target search behavior information as a positive sample and, taking the positive sample as a starting point, determine a next node having an association relationship with the current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined based on the search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationships among the pieces of search behavior information; and determine the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample.
The search behavior relationship is determined based on the search behavior information in a plurality of pre-acquired user search behavior sequences and can represent the association relationships among the pieces of search behavior information. The first next node determined according to the search behavior relationship is therefore the node most strongly associated with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens, until the target node a preset number of nodes away from the starting point is found. The target node has a certain association with the positive sample that is neither too strong nor too weak, which is exactly what an accurate and valuable negative sample requires. The search behavior information corresponding to the target node therefore meets the requirements of a negative sample and can be used as the negative sample corresponding to the positive sample, so that an accurate and valuable negative sample is obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a method for determining negative examples according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating the step S102 in the embodiment shown in FIG. 1;
FIG. 3 is a flowchart illustrating a specific step S201 in the embodiment shown in FIG. 2;
FIG. 4 is a schematic diagram of a relationship diagram based on the embodiment shown in FIG. 1;
FIG. 5 is another detailed flowchart of step S201 in the embodiment shown in FIG. 2;
FIG. 6 is a flowchart of a method for establishing a search behavior relationship according to the embodiment shown in FIG. 1;
FIG. 7 is a schematic structural diagram of a negative sample determining apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a specific structure of the node determining module 720 in the embodiment shown in FIG. 7;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to obtain an accurate and valuable negative sample, the embodiment of the invention provides a method and a device for determining the negative sample, an electronic device, a computer-readable storage medium and a computer program product. The following describes a method for determining a negative example according to an embodiment of the present invention.
The method for determining the negative sample provided by the embodiment of the present invention can be applied to any electronic device that needs to determine the negative sample, for example, an electronic device such as a computer, a tablet computer, and a processor, and is not limited specifically herein.
As shown in FIG. 1, a method for determining a negative sample includes:
S101, acquiring target search behavior information as a positive sample.
S102, taking the positive sample as a starting point, determining a next node having an association relationship with the current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached.
The search behavior relationship is determined based on search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationships among the pieces of search behavior information.
S103, determining the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample.
In application scenarios such as video search, information search, news search, advertisement recommendation, video recommendation, commodity recommendation, and news recommendation, related information such as advertisements, videos, news, and commodities needs to be recommended to users based on the search behavior information (items) they input. This recommendation is realized by a deep learning model capable of determining the search information to recommend to the user, and the deep learning model is trained on positive samples and their corresponding negative samples.
In order to obtain accurate search information, a deep learning model with excellent performance is required, which in turn requires accurate and valuable positive and negative samples. Reliable positive samples are generally relatively easy to obtain; for example, search behavior information entered by a user can be used as a positive sample. Accurate and valuable negative samples, however, are difficult to obtain.
In order to obtain an accurate and valuable negative sample corresponding to each positive sample, in step S101, the electronic device may acquire target search behavior information and use it as a positive sample. The target search behavior information may be one piece of search behavior information in a pre-acquired user search behavior sequence. A user search behavior sequence comprises a plurality of pieces of search behavior information input by a user within a time period, and the target search behavior information is one of them, namely a word or phrase that the user entered at some point in that time period in order to search for it.
For example, the user search behavior sequence "fun - puppy - kitten" includes 3 pieces of search behavior information: "fun", "puppy", and "kitten", each of which can be regarded as a positive sample. If the electronic device acquires the search behavior information "puppy", then "puppy" is the target search behavior information, and the target search behavior information "puppy" may be determined as a positive sample.
Next, the electronic device may walk from the positive sample as a starting point, determining a next node having an association relationship with the current node according to the pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached; that is, it performs step S102. The search behavior relationship may be determined based on search behavior information (items) in a plurality of pre-acquired user search behavior sequences, and is used to represent the association relationships among the pieces of search behavior information.
The form of the search behavior relationship is not specifically limited in the embodiment of the present invention, as long as it can accurately represent the relationships between the pieces of search behavior information. For example, the search behavior relationship may be recorded in the form of a matrix, a relationship graph, or the like.
Taking the above user search behavior sequence "fun - puppy - kitten" as an example, the association relationships between the pieces of search behavior information are as follows: "fun" and "puppy" are adjacent search behavior information, "puppy" and "kitten" are adjacent search behavior information, and "puppy" lies between "fun" and "kitten". The degree of association between "fun" and "puppy", and between "puppy" and "kitten", is therefore higher, while the degree of association between "fun" and "kitten" is lower.
If another user search behavior sequence contains "fun" and "star A" as adjacent search behavior information, then "fun" and "star A" also have an adjacent association relationship, and the degree of association between them is also higher. Following this rule, a search behavior relationship capable of representing the association relationships among the pieces of search behavior information can be determined from the search behavior information in the plurality of pre-acquired user search behavior sequences.
In one embodiment, the electronic device may take the positive sample as a starting point and walk according to the search behavior relationship, moving to one piece of search behavior information per step. After a preset number of steps, it reaches a piece of search behavior information at a certain distance from the positive sample, that is, the target node that is a preset number of nodes away from the starting point. The preset number may be set according to the actual requirements on the negative sample and the application scenario, and may be, for example, a value from 5 to 8; it is not specifically limited herein.
After the target node that is the preset number of nodes away from the starting point is determined, the degree of association between the search behavior information corresponding to the target node and the positive sample is neither too high nor too low. The electronic device may therefore determine the search behavior information corresponding to the target node as the negative sample corresponding to the positive sample, that is, execute step S103.
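Steps S101 to S103 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `build_relation` and `walk_to_negative`, the rule that the walk never revisits a node, the uniform random choice among neighbours, and the second sequence `["fun", "star A"]` are all hypothetical. A preset number of 3 is used only because the toy graph is tiny; the text suggests values such as 5 to 8.

```python
import random


def build_relation(sequences):
    """Link every pair of items that are adjacent in some search sequence."""
    graph = {}
    for seq in sequences:
        for item in seq:
            graph.setdefault(item, set())
        for left, right in zip(seq, seq[1:]):
            if left != right:
                graph[left].add(right)
                graph[right].add(left)
    return graph


def walk_to_negative(graph, positive, steps, rng):
    """Take `steps` walk steps from `positive`; return the node reached, or None."""
    current, visited = positive, {positive}
    for _ in range(steps):
        # Assumption: the walk never steps back onto an already visited node.
        candidates = sorted(graph.get(current, set()) - visited)
        if not candidates:
            return None  # the walk ran out of unvisited neighbours
        current = rng.choice(candidates)
        visited.add(current)
    return current


sequences = [
    ["fun", "puppy", "kitten"],  # sequence used in the text
    ["fun", "star A"],           # hypothetical second sequence
]
graph = build_relation(sequences)                       # search behavior relationship
negative = walk_to_negative(graph, "kitten", steps=3,   # S101 + S102
                            rng=random.Random(0))
print(negative)                                         # S103: candidate negative sample
```

In this toy graph every step has exactly one unvisited neighbour, so the walk from "kitten" is deterministic regardless of the random seed.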
In the scheme provided by the embodiment of the invention, the search behavior relationship is determined based on the search behavior information in a plurality of pre-acquired user search behavior sequences and can represent the association relationships among the pieces of search behavior information. The first next node determined according to the search behavior relationship is therefore the node most strongly associated with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens, until the target node a preset number of nodes away from the starting point is found. The target node has a certain association with the positive sample that is neither too strong nor too weak, which is exactly what an accurate and valuable negative sample requires. The corresponding search behavior information therefore meets the requirements of a negative sample and can be used as the negative sample corresponding to the positive sample, so that an accurate and valuable negative sample is obtained. The scheme constructs an item relationship from user search behavior sequences and collects negative samples based on that relationship. User search behavior sequences are very important data: the data volume is large and rich related information is hidden in them. By making full use of user search behavior sequences, which are key to search and recommendation, the scheme of this embodiment can achieve a very good recommendation effect.
For the advertisement recommendation scenario, the effects of the recommendation system and the advertisement system can be improved, as can the user experience of recommendations and advertisements, which improves both revenue and experience.
As an implementation manner of the embodiment of the present invention, as shown in FIG. 2, the step of taking the positive sample as a starting point and determining, according to a pre-recorded search behavior relationship, a next node having an association relationship with a current node until a target node that is a preset number of nodes away from the starting point is reached may include:
S201, taking the positive sample as a starting point, determining a next node having an association relationship with the current node based on pre-recorded relationship weights and the search behavior relationship between a plurality of pieces of search behavior information.
Since there may be a plurality of adjacent search behavior information having an association relationship with each search behavior information in the above search behavior relationship, the electronic device may record a relationship weight in advance in order to be able to smoothly determine a next node having an association relationship with a current node. Wherein, the relation weight is used for expressing the degree of association between two adjacent search behavior information.
In one embodiment, the electronic device may determine the search behavior information whose relationship weight with the search behavior information corresponding to the current node is the largest, or the smallest, as the next node having an association relationship with the current node. In another embodiment, the electronic device may instead determine the search behavior information whose relationship weight with the search behavior information corresponding to the current node is neither the largest nor the smallest as the next node. Either is reasonable, and the choice may be made according to how the relationship weights are set.
For example, in the search behavior relationship, the search behavior information a serving as the positive sample has 3 pieces of adjacent search behavior information, namely search behavior information b, search behavior information c, and search behavior information d, with relationship weights 1, 3, and 2 respectively. The electronic device may then determine the search behavior information whose relationship weight is neither the largest nor the smallest as the next node having an association relationship with the search behavior information a; that is, the search behavior information d is the next node.
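The three selection rules just described (largest weight, smallest weight, or neither) can be illustrated with a short sketch. The function name `pick_next`, the strategy labels, and the behaviour when several or no neighbours fall strictly between the extremes are assumptions made for illustration; the text does not specify them.

```python
def pick_next(neighbor_weights, strategy="middle"):
    """Choose the next node from a {neighbor: relationship weight} mapping."""
    if not neighbor_weights:
        raise ValueError("current node has no adjacent search behavior information")
    if strategy == "max":
        return max(neighbor_weights, key=neighbor_weights.get)
    if strategy == "min":
        return min(neighbor_weights, key=neighbor_weights.get)
    if strategy == "middle":
        low = min(neighbor_weights.values())
        high = max(neighbor_weights.values())
        between = [n for n, w in neighbor_weights.items() if low < w < high]
        # Assumption: take the first such neighbour, or None if there is none.
        return between[0] if between else None
    raise ValueError(f"unknown strategy: {strategy}")


# Neighbours of search behavior information a, with the weights from the example.
weights_of_a = {"b": 1, "c": 3, "d": 2}
print(pick_next(weights_of_a, "middle"))
```

With the weights from the example, the "middle" rule selects d, matching the text, while "max" and "min" would select c and b respectively.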
S202, taking the search behavior information corresponding to the next node as the new starting point, and returning to the step of determining the next node having an association relationship with the current node based on the pre-recorded relationship weights, until the target node that is a preset number of nodes away from the original starting point is reached.
After determining the next node, the electronic device may use the search behavior information of that node as a new starting point, return to the step of determining the next node having an association relationship with the current node based on the pre-recorded relationship weights, and continue to determine the next piece of search behavior information.
Once the number of walk steps reaches the preset number, the next node determined by the electronic device at that point is the target node. The target node is a node at the preset distance from the starting point, and the search behavior information corresponding to it has a degree of association with the positive sample that is neither too high nor too low, so it can be used as the negative sample corresponding to the positive sample.
As can be seen, in this embodiment, the electronic device may take the positive sample as a starting point, determine a next node having an association relationship with the current node based on the pre-recorded relationship weights and the search behavior relationship between the plurality of pieces of search behavior information, then take the search behavior information corresponding to the next node as the current node and return to the step of determining the next node, until the target node that is a preset number of nodes away from the starting point is reached. The negative sample corresponding to the positive sample can thus be accurately determined based on the relationship weights, which further improves the accuracy of the negative sample. Hard negative samples are constructed by walking the relationship graph, and the graph structure helps resist noise in the user search behavior sequences, so the quality of the hard negative samples is improved and the search and promotion effects are ultimately improved.
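The loop formed by S201 and S202 might look like the sketch below. It assumes the "largest weight" selection rule, assumes that a weight of 0 means no association, breaks ties by node name, and never revisits a node; none of these details are fixed by the text. The node names `q0` to `q4`, the edge weights, and the helper names `make_graph` and `weighted_walk` are hypothetical.

```python
def make_graph(edges):
    """Build a symmetric {node: {neighbor: weight}} mapping from weighted edges."""
    graph = {}
    for left, right, weight in edges:
        graph.setdefault(left, {})[right] = weight
        graph.setdefault(right, {})[left] = weight
    return graph


def weighted_walk(graph, positive, steps):
    """Repeat S201/S202 `steps` times; return the target node, or None."""
    current, visited = positive, {positive}
    for _ in range(steps):
        candidates = {
            node: weight
            for node, weight in graph.get(current, {}).items()
            if node not in visited and weight > 0  # assumptions: no revisits, 0 = unrelated
        }
        if not candidates:
            return None
        # S201: pick the neighbour with the largest weight (ties broken by name).
        current = min(candidates, key=lambda node: (-candidates[node], node))
        # S202: the chosen node becomes the new starting point.
        visited.add(current)
    return current


# Hypothetical relationship weights between five search items.
edges = [("q0", "q1", 3), ("q0", "q2", 1), ("q1", "q2", 1),
         ("q1", "q3", 2), ("q3", "q4", 1)]
graph = make_graph(edges)
print(weighted_walk(graph, "q0", steps=3))
```

Starting from `q0`, the walk visits `q1`, then `q3`, then `q4`, so `q4` would serve as the negative sample for a preset number of 3.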
As an implementation manner of the embodiment of the present invention, the search behavior relationship may be a matrix, and an element in the matrix is a relationship weight between two adjacent pieces of search behavior information. In this case, the electronic device may record the association relationship and the association weight between the pieces of search behavior information using a matrix.
For example, three user search behavior sequences are acquired in advance: user search behavior sequence 1: search behavior information A1 - search behavior information A2 - search behavior information A3; user search behavior sequence 2: search behavior information A1 - search behavior information B2 - search behavior information B3; user search behavior sequence 3: search behavior information C1 - search behavior information B2 - search behavior information B3. Adjacent search behavior information in a user search behavior sequence was input consecutively by the user and has the closest association relationship, so the relationship weight between adjacent search behavior information is high and can be recorded as 1.
The electronic device may determine that the relationship weights between the search behavior information A1 and the search behavior information A2, the search behavior information A2 and the search behavior information A3, the search behavior information A1 and the search behavior information B2, and the search behavior information C1 and the search behavior information B2 are all 1. The search behavior information B2 and the search behavior information B3 are adjacent in two user search behavior sequences, so their relationship is closer and the relationship weight can be recorded as 2. The search behavior information A1 and the search behavior information A3, the search behavior information A1 and the search behavior information B3, the search behavior information A1 and the search behavior information C1, and the search behavior information C1 and the search behavior information B3 are not adjacent in any user search behavior sequence, so the relationship weights between them can all be set to 0.
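One way to reproduce these weights is to count how many times each unordered pair appears side by side across the sequences. This counting rule is an interpretation of the example rather than a formula stated in the text, and the function name `adjacency_weights` is hypothetical.

```python
from collections import Counter


def adjacency_weights(sequences):
    """Count, for each unordered pair, how often the two items are adjacent."""
    weights = Counter()
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            if left != right:
                weights[frozenset((left, right))] += 1
    return weights


sequences = [
    ["A1", "A2", "A3"],  # user search behavior sequence 1
    ["A1", "B2", "B3"],  # user search behavior sequence 2
    ["C1", "B2", "B3"],  # user search behavior sequence 3
]
weights = adjacency_weights(sequences)
for pair in (("A1", "A2"), ("B2", "B3"), ("A1", "A3")):
    # A Counter reports 0 for pairs that were never adjacent.
    print(pair, weights[frozenset(pair)])
```

Using `frozenset` keys makes the weight symmetric, so the pair (B2, B3) and the pair (B3, B2) share one entry.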
In summary, the relationship weight between any two pieces of search behavior information is related to whether these pieces of search behavior information are adjacent in the current sequence and other sequences, and the association condition of the sequences.
Based on this, the electronic device can construct a matrix identifying the search behavior relationship among the plurality of pieces of search behavior information in the user search behavior sequences:
$$\begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 2 \\ 1 & 0 & 0 \end{pmatrix}$$
Here, element 1 of the first row and the first column may represent the relationship weight between the search behavior information A1 and the search behavior information A2; element 0 of the first row and the second column may represent the relationship weight between the search behavior information A1 and the search behavior information A3; element 1 of the first row and the third column may represent the relationship weight between the search behavior information A2 and the search behavior information A3. Element 1 of the second row and the first column may represent the relationship weight between the search behavior information A1 and the search behavior information B2; element 0 of the second row and the second column may represent the relationship weight between the search behavior information A1 and the search behavior information B3; element 2 of the second row and the third column may represent the relationship weight between the search behavior information B2 and the search behavior information B3. Element 1 of the third row and the first column may represent the relationship weight between the search behavior information C1 and the search behavior information B2; element 0 of the third row and the second column may represent the relationship weight between the search behavior information C1 and the search behavior information B3; element 0 of the third row and the third column may represent the relationship weight between the search behavior information C1 and the search behavior information A1.
For the case that the search behavior relationship is a matrix, as shown in FIG. 3, the step of determining a next node having an association relationship with the current node based on the pre-recorded relationship weights and the search behavior relationship between the plurality of pieces of search behavior information may include:
S301, determining the largest element among the elements representing target relationship weights as a target element;
The target relationship weights are the relationship weights between the search behavior information corresponding to the starting point and its adjacent search behavior information. The search behavior information corresponding to the largest of these elements has a higher degree of association with the search behavior information corresponding to the starting point, while the search behavior information corresponding to the smaller elements has a low degree of association with it, or even none. Therefore, to ensure that the finally determined negative sample meets the requirements, the search behavior information with the higher degree of association can be selected each time the next node is determined, so that once the target node a preset number of nodes away from the starting point is found, its search behavior information has a moderate degree of association with the positive sample.
Therefore, in this step, the electronic device may find the largest element among the elements representing the target relationship weights and determine the largest element as the target element. For example, taking the above matrix as an example, if the positive sample is the search behavior information B2, the electronic device may determine, based on the matrix, that the elements representing the target relationship weights are element 1 of the second row and the first column, element 2 of the second row and the third column, and element 1 of the third row and the first column, respectively, where the largest element is 2, and may determine element 2 as the target element.
S302, determining another piece of search behavior information except the starting point corresponding to the relation weight represented by the target element as a next node having an association relation with the current node.
After determining the target element, the electronic device may determine another piece of search behavior information, other than the starting point, corresponding to the relationship weight represented by the target element, as a next node having an association relationship with the current node, that is, next search behavior information.
Still taking the above example as an example, if the relationship weight represented by the target element 2 is the relationship weight between the search behavior information B2 and the search behavior information B3, and the search behavior information B2 is the starting point, the electronic device may determine the search behavior information B3 as the next node having an association relationship with the current node.
As can be seen, in this embodiment, for the case that the search behavior relationship is a matrix, the electronic device may determine the largest element among the elements representing the target relationship weight as the target element, and then determine the other piece of search behavior information, other than the starting point, corresponding to the relationship weight represented by the target element as the next node having an association relationship with the current node. In this way, the electronic device can accurately determine the next node having an association relationship with the current node based on the matrix, which ensures that the negative sample can finally be determined successfully and that the negative sample is accurate.
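Steps S301 and S302 can be illustrated with a minimal Python sketch. The weights follow the element-by-element description of the example matrix above; the `PAIRS` table, the function name `next_node_from_matrix`, and the choice to ignore elements whose weight is 0 (treated here as "no association") are assumptions made for illustration and are not specified by the embodiment.

```python
# Relationship weights of the example matrix (3 rows x 3 columns).
WEIGHTS = [
    [1, 0, 1],
    [1, 0, 2],
    [1, 0, 0],
]

# Assumed lookup table: which pair of search behavior information each
# matrix element stands for (element positions may be set arbitrarily).
PAIRS = [
    [("A1", "A2"), ("A1", "A3"), ("A2", "A3")],
    [("A1", "B2"), ("A1", "B3"), ("B2", "B3")],
    [("C1", "B2"), ("C1", "B3"), ("C1", "A1")],
]


def next_node_from_matrix(weights, pairs, current):
    """Return the neighbor of `current` with the largest relationship weight.

    S301: among the elements whose pair contains `current`, find the largest.
    S302: return the other member of that pair.
    Returns None if no element with a positive weight involves `current`.
    """
    best_weight = 0  # assumption: a weight of 0 means no association
    best_node = None
    for weight_row, pair_row in zip(weights, pairs):
        for weight, pair in zip(weight_row, pair_row):
            if current not in pair:
                continue  # not a target relationship weight
            if weight > best_weight:
                best_weight = weight
                best_node = pair[0] if pair[1] == current else pair[1]
    return best_node


print(next_node_from_matrix(WEIGHTS, PAIRS, "B2"))
```

With B2 as the starting point, the candidate elements are the weights for (A1, B2), (B2, B3) and (C1, B2), so the sketch selects B3, in line with the example in the text.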
As an implementation manner of the embodiment of the present invention, the search behavior relationship may be a relationship graph, where each node in the relationship graph represents one piece of search behavior information, and a connection line is provided between the nodes to indicate that two pieces of search behavior information corresponding to two nodes connected by the connection line have an association relationship.
In this case, the electronic device may use the relationship graph to record the association relationship between the pieces of search behavior information, as well as the relationship weight between them. Take again the above three user search behavior sequences as an example: user search behavior sequence 1: search behavior information A1 - search behavior information A2 - search behavior information A3; user search behavior sequence 2: search behavior information A1 - search behavior information B2 - search behavior information B3; and user search behavior sequence 3: search behavior information C1 - search behavior information B2 - search behavior information B3.
The electronic device may construct a relationship graph as shown in fig. 4 according to the association relationship among the search behavior information in the user search behavior sequences, where there are connection lines between the node A1 and the node A2, the node A2 and the node A3, the node A1 and the node B2, the node B2 and the node B3, and the node C1 and the node B2. These connection lines indicate that there is an association relationship between the search behavior information A1 and the search behavior information A2, between the search behavior information A2 and the search behavior information A3, between the search behavior information A1 and the search behavior information B2, between the search behavior information B2 and the search behavior information B3, and between the search behavior information C1 and the search behavior information B2.
There are no connection lines between the node A1 and the node A3, between the node A1 and the node B3, between the node A1 and the node C1, or between the node C1 and the node B3, which indicates that there is no association relationship between the search behavior information A1 and the search behavior information A3, between the search behavior information A1 and the search behavior information B3, between the search behavior information A1 and the search behavior information C1, or between the search behavior information C1 and the search behavior information B3.
For the case that the search behavior relationship is a relationship graph, as shown in fig. 5, the step of determining a next node having an association relationship with the current node based on the pre-recorded relationship weight and the search behavior relationship between the plurality of search behavior information may include:
S501, taking the search behavior information represented by the nodes having a connection line with the starting point as candidate search behavior information;
If the node corresponding to a piece of search behavior information has a connection line with the starting point, that search behavior information has an association relationship with the search behavior information corresponding to the starting point, and may therefore become the next search behavior information to be walked to. The electronic device may thus find all nodes having connection lines with the starting point, and determine the search behavior information represented by these nodes as candidate search behavior information.
For example, as shown in the relationship graph of fig. 4, if the starting point is node A1, since both node A2 and node B2 have a connection line with node A1, indicating that the search behavior information corresponding to node A2 and node B2 has an association relationship with the search behavior information corresponding to node A1, the electronic device may determine the search behavior information corresponding to node A2 and node B2 as the candidate search behavior information.
S502, determining the candidate search behavior information with the largest corresponding pre-recorded relationship weight as the next node having an association relationship with the current node.
After determining the candidate search behavior information, the electronic device may look up the pre-recorded relationship weight corresponding to each piece of candidate search behavior information. A high relationship weight indicates that the degree of association between the search behavior information corresponding to the node and the search behavior information corresponding to the starting point is high, and a low relationship weight indicates that this degree of association is low. Therefore, to ensure that the finally determined negative sample meets the requirement, the search behavior information with the higher degree of association can be selected as the next search behavior information, namely the next node, each time the next node is determined, so that search behavior information with a moderate degree of association with the positive sample is found after the preset number of steps.
Therefore, the electronic device finds the maximum relationship weight among the candidate search behavior information, and determines the node corresponding to the candidate search behavior information with the maximum relationship weight as the next node to walk to. For example, after the electronic device determines the search behavior information corresponding to the node A2 and the node B2 as the candidate search behavior information, if the relationship weights between the search behavior information corresponding to the node A2 and the node B2 and the search behavior information corresponding to the starting point are 1 and 3, respectively, the electronic device may determine the node B2 as the next node.
In one case, if the weight of the relationship between all the candidate search behavior information and the search behavior information corresponding to the starting point is the same, the electronic device may randomly select a node corresponding to one candidate search behavior information as a next node.
As an implementation manner, the length of the connection line may also be used to identify a relationship weight between two pieces of search behavior information connected by the connection line, and then the electronic device may also use a node corresponding to the candidate search behavior information corresponding to the shortest or longest connection line as a next node. Alternatively, the identification value of the connection line may also be used to identify the relationship weight between the two pieces of search behavior information connected by the connection line. The identification value may be directly labeled on the relationship diagram, or may be recorded on the relationship diagram by an attribute or labeling manner.
As can be seen, in this embodiment, for the case that the search behavior relationship is a relationship graph, the electronic device may determine the search behavior information represented by the nodes having a connection line with the starting point as the candidate search behavior information, and then determine the candidate search behavior information with the largest corresponding pre-recorded relationship weight as the next node having an association relationship with the current node. In this way, the electronic device can accurately determine the next node having an association relationship with the current node based on the relationship graph, which ensures that the negative sample can finally be determined successfully and that the negative sample is accurate.
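Steps S501 and S502, together with the random choice among equally weighted candidates, can be sketched as follows. The adjacency-dictionary representation, the name `next_node_from_graph`, and the weights 1 and 3 (taken from the example above) are illustrative assumptions. The sketch picks randomly among all candidates tied for the maximum weight, which covers the case described in the text where every candidate has the same weight.

```python
import random

# Assumed representation: node -> {neighboring node: relationship weight}.
# A neighbor entry plays the role of a connection line in the relationship graph.
GRAPH = {
    "A1": {"A2": 1, "B2": 3},
    "A2": {"A1": 1},
    "B2": {"A1": 3},
}


def next_node_from_graph(graph, current, rng=random):
    """Pick the next node for `current` from a weighted relationship graph.

    S501: the candidates are all nodes connected to `current`.
    S502: the candidate with the largest relationship weight is chosen;
          ties are broken at random.
    Returns None if `current` has no connection lines.
    """
    candidates = graph.get(current, {})
    if not candidates:
        return None
    max_weight = max(candidates.values())
    best = [node for node, weight in candidates.items() if weight == max_weight]
    return best[0] if len(best) == 1 else rng.choice(best)


print(next_node_from_graph(GRAPH, "A1"))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible, which may be convenient when debugging sample generation.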
As an implementation manner of the embodiment of the present invention, the relationship weight may be determined in advance based on a degree of association between search behavior information having an association relationship in the search behavior relationship, where the degree of association is determined according to an order in which the search behavior information is input by a user when performing information search.
A user search behavior sequence includes a plurality of pieces of search behavior information input by the user, arranged in the chronological order in which the user input them. The closer in time the user inputs two pieces of search behavior information, the higher the degree of association between the two pieces of search behavior information.
For example, if the user inputs "make a fun" at time 1 to search for a funny video about a puppy, takes an interest in it after watching the video, and then inputs "puppy" to search for videos related to puppies, then "make a fun" and "puppy" are adjacent search behavior information in the search behavior sequence corresponding to the user, and the degree of association between "make a fun" and "puppy" is high.
Therefore, the relation weight between the search behavior information can be determined in advance based on the degree of association between the search behavior information having the association relation in the search behavior relation, and the degree of association can be determined according to the sequence of the search behavior information input by the user when information search is performed, so that the relation weight of the degree of association between the search behavior information can be accurately identified.
As an implementation manner of the embodiment of the present invention, as shown in fig. 6, the establishment manner of the search behavior relationship may include:
S601, collecting a plurality of user search behavior sequences;
The electronic device may collect a plurality of user search behavior sequences, which may be user search behavior sequences of a plurality of different users, wherein each user search behavior sequence includes a plurality of pieces of search behavior information, that is, it describes the search behavior information input by the user within a target time interval. The plurality of pieces of search behavior information are arranged in the chronological order in which the user input them, that is, pieces of search behavior information whose input times are adjacent are also adjacent in position in the user search behavior sequence.
S602, traversing each user search behavior sequence, and establishing an association relationship between every two pieces of search behavior information whose input times are adjacent;
To determine the association relationship between search behavior information, the electronic device may traverse each user search behavior sequence and, whenever it encounters two pieces of search behavior information whose input times are adjacent during the traversal, record the association relationship between the two.
For example, in the process of traversing the user search behavior sequence: search behavior information a - search behavior information b - search behavior information c, when the traversal moves from the search behavior information a to the search behavior information b, the electronic device can record the association relationship between the search behavior information a and the search behavior information b, because they are search behavior information whose input times are adjacent. Similarly, when traversing to the search behavior information c, the electronic device may record the association relationship between the search behavior information b and the search behavior information c, because they are search behavior information whose input times are adjacent.
In an embodiment, the step of establishing an association relationship between every two pieces of search behavior information whose input times are adjacent may include:
establishing a node corresponding to each piece of search behavior information in the relationship graph; and, for every two pieces of search behavior information whose input times are adjacent, establishing a connection line between the two corresponding nodes.
If the association relationship between the search behavior information is recorded in the form of a relationship graph, the electronic device can establish the node 1 when traversing the search behavior information a, and then establish the node 2 when continuing the traversal to the search behavior information b. Because the search behavior information a and the search behavior information b are adjacent search behavior information, the electronic device can connect the node 1 and the node 2 with a connection line, thereby recording the association relationship between the search behavior information a and the search behavior information b.
Similarly, when traversing to the search behavior information c, the electronic device may establish the node 3, and since the search behavior information b and the search behavior information c are adjacent search behavior information, the electronic device may connect the node 2 and the node 3 by using a connection line to record an association relationship between the search behavior information b and the search behavior information c.
If different user search behavior sequences include the same search behavior information, the same search behavior information corresponds to the same node in the relationship graph, for example, if both the user search behavior sequence 1 and the user search behavior sequence 2 include the search behavior information a, the electronic device may establish the node 1, where the node 1 corresponds to both the search behavior information a in the user search behavior sequence 1 and the search behavior information a in the user search behavior sequence 2, that is, the same search behavior information corresponds to the same node in the relationship graph.
In another embodiment, the step of establishing an association relationship between every two pieces of search behavior information whose input times are adjacent may include:
recording, in the matrix, the element corresponding to every two pieces of search behavior information whose input times are adjacent.
If the association relationship between the search behavior information is recorded in the form of a matrix, the electronic device can, when traversing two pieces of search behavior information whose input times are adjacent, record the element corresponding to these two pieces of search behavior information in the matrix.
For example, still taking the above example as an example, since the search behavior information b and the search behavior information c are adjacent search behavior information, the electronic device may record elements corresponding to the search behavior information b and the search behavior information c in the matrix, for example, the elements may be first row and second column elements in the matrix, and the specific element position may be set arbitrarily.
S603, each time two pieces of search behavior information whose input times are adjacent are traversed, increasing the relationship weight corresponding to the two pieces of search behavior information by a preset value, until all the user search behavior sequences have been traversed, so as to obtain the search behavior relationship.
If the input time of two pieces of search behavior information is adjacent, the degree of association between the two pieces of search behavior information is high, and if the two pieces of search behavior information are adjacent in a plurality of user search behavior sequences, the degree of association between the two pieces of search behavior information is very high. Therefore, in order to accurately determine the relationship weight between two pieces of search behavior information that are input time-adjacent, the electronic device may increase the relationship weight corresponding to the two pieces of search behavior information that are input time-adjacent by a preset value every time the electronic device traverses the two pieces of search behavior information that are input time-adjacent.
The initial value of the relationship weight may be initialized randomly, for example, may be 0, and is not limited in this respect. The preset value may be determined according to the number of search behavior information and other factors, and may be, for example, 1, 2, 1.5, and the like, which are not specifically limited herein.
For example, after recording the association relationship between the search behavior information a and the search behavior information b, the electronic device may increase the weight of the relationship between the search behavior information a and the search behavior information b by 1, and when traversing to the search behavior information b after traversing to the search behavior information a again, may increase the weight of the relationship between the search behavior information a and the search behavior information b by 1 again.
When the search behavior relationship is a relationship graph, the electronic device may increase, every time two pieces of search behavior information adjacent in input time are traversed, the relationship weight corresponding to the connection line corresponding to the two pieces of search behavior information adjacent in input time by a preset value.
For example, after the electronic device establishes a connection line between nodes corresponding to the search behavior information a and the search behavior information b, the relationship weight corresponding to the connection line may be increased by 1, and when traversing to the search behavior information b after traversing to the search behavior information a again, the relationship weight may be increased by 1 again.
In the case that the search behavior relationship is a matrix, the electronic device may increase, every time two pieces of search behavior information whose input times are adjacent are traversed, a value of an element corresponding to the two pieces of search behavior information whose input times are adjacent in the matrix by a preset value.
For example, after recording the element 1 corresponding to the search behavior information a and the search behavior information b in the matrix, the electronic device may set the value of the element 1 to 1, and when traversing to the search behavior information b after traversing to the search behavior information a again, may increase the value of the element 1 by 1.
And traversing according to the mode until all the user searching behavior sequences are traversed, so that the searching behavior relation can be obtained.
It can be seen that, in this embodiment, the electronic device may collect a plurality of user search behavior sequences, traverse each user search behavior sequence, record an association relationship between every two pieces of search behavior information whose input times are adjacent, and, each time two such pieces of search behavior information are traversed, increase the corresponding relationship weight by a preset value, until all the user search behavior sequences have been traversed, so as to obtain the search behavior relationship. In this way, the search behavior relationship can be determined conveniently and accurately, and accurate and valuable negative samples can subsequently be obtained based on it.
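Steps S601 to S603 can be sketched as a single pass over the collected sequences. The sketch below is a minimal illustration that stores the relationship graph as nested dictionaries; the function name `build_search_behavior_relation`, the `step` parameter (standing in for the preset value), the initial weight of 0, and the choice to treat connection lines as undirected are assumptions rather than requirements of the embodiment.

```python
from collections import defaultdict


def build_search_behavior_relation(sequences, step=1):
    """Build a weighted relationship graph from user search behavior sequences.

    Each sequence lists search behavior information in input-time order.
    S602: every two adjacent items get a connection line.
    S603: each time the same adjacent pair is traversed, its relationship
          weight is increased by `step` (the preset value).
    """
    graph = defaultdict(dict)
    for sequence in sequences:
        # zip(sequence, sequence[1:]) yields every pair of adjacent items.
        for first, second in zip(sequence, sequence[1:]):
            new_weight = graph[first].get(second, 0) + step
            # Assumption: the association is symmetric, so both directions
            # share one weight.
            graph[first][second] = new_weight
            graph[second][first] = new_weight
    return dict(graph)


# The three user search behavior sequences used as the running example.
SEQUENCES = [
    ["A1", "A2", "A3"],
    ["A1", "B2", "B3"],
    ["C1", "B2", "B3"],
]
RELATION = build_search_behavior_relation(SEQUENCES)
print(RELATION["B2"])
```

Because B2 and B3 are adjacent in two of the three sequences, their relationship weight ends up at 2, while every other connected pair has weight 1, which matches the weights in the example matrix.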
As an implementation manner of the embodiment of the present invention, on the basis of the method described in any of the above embodiments, the method may further include:
and training a preset model based on the positive sample and the negative sample to obtain a deep learning model for recommending search information to a user.
After a large number of positive samples and negative samples corresponding to the positive samples are obtained, the electronic device can train the preset model based on the positive samples and the negative samples, and then a deep learning model for recommending search information to a user is obtained. The preset model may be a deep learning model for recommending information such as videos, news, commodities, advertisements and the like to a user, and may be determined according to an actual application scenario, which is not specifically limited herein.
The specific structure of the preset model is not specifically limited, and can be selected according to the actual usage scenario. When the preset model is trained with the positive samples and the corresponding negative samples, a gradient descent algorithm, a stochastic gradient descent algorithm, or the like can be adopted, which is not specifically limited or described herein.
Therefore, in this embodiment, the electronic device may train the preset model based on the positive sample and the negative sample to obtain a deep learning model for recommending search information to the user. The negative sample determined by the method provided by the embodiment of the invention is accurate and valuable and meets the requirement of the negative sample, so the deep learning model obtained by training the positive sample and the negative sample has good performance and good effect of recommending search information to a user.
Corresponding to the method for determining the negative sample, the embodiment of the invention also provides a device for determining the negative sample. The following describes the negative sample determination device according to an embodiment of the present invention.
As shown in fig. 7, an apparatus for determining a negative example, the apparatus comprising:
a positive sample obtaining module 710, configured to obtain target search behavior information as a positive sample;
a node determining module 720, configured to determine, using the positive sample as a starting point, a next node having an association relationship with a current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached;
The search behavior relationship is determined by the relationship establishing module based on search behavior information in a plurality of pre-acquired user search behavior sequences, and is used for representing the association relationship between the search behavior information.
A negative sample determining module 730, configured to determine the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample.
As can be seen, in the scheme provided by the embodiment of the present invention, the electronic device may obtain target search behavior information as a positive sample; taking the positive sample as a starting point, determine, according to a pre-recorded search behavior relationship, a next node having an association relationship with the current node, until a target node that is a preset number of nodes away from the starting point is reached, where the search behavior relationship is determined based on the search behavior information in a plurality of pre-obtained user search behavior sequences and is used to represent the association relationship between the pieces of search behavior information; and determine the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample. Because the search behavior relationship is determined based on the search behavior information in a plurality of pre-obtained user search behavior sequences and can represent the association relationship between the pieces of search behavior information, the next node determined from the starting point according to the search behavior relationship is the node having the strongest association with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens, until the target node that is a preset number of nodes away from the starting point is found. The target node has a certain association with the positive sample that is neither too strong nor too weak, and an accurate and valuable negative sample is exactly one whose association with the positive sample is neither too strong nor too weak. Therefore, the search behavior information corresponding to the target node meets the requirement of a negative sample and can be used as the negative sample corresponding to the positive sample, so that an accurate and valuable negative sample is obtained.
As an implementation manner of the embodiment of the present invention, as shown in fig. 8, the node determining module 720 may include:
a node determination unit 721 configured to determine, based on a pre-recorded relationship weight and a search behavior relationship between a plurality of pieces of search behavior information, a next node having an association relationship with a current node, with the positive sample as a starting point;
wherein the relationship weight is used for representing the degree of association between two adjacent pieces of search behavior information.
a starting point updating unit 722, configured to use the search behavior information corresponding to the next node as the current node, and trigger the node determining unit 721, until the current node is a target node that is a preset number of nodes away from the starting point.
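The interplay of the node determining unit and the starting point updating unit amounts to a fixed-length greedy walk, which can be sketched as below. The function name `find_negative_sample`, the `hops` parameter (standing in for the preset number), and the graph literal are illustrative. Note one assumption that goes beyond the text: already visited nodes are excluded from the candidates so that the walk cannot bounce back and forth between two strongly associated nodes; the embodiment itself does not say how revisits are handled.

```python
# Weighted relationship graph for the running example (node -> {neighbor: weight}).
GRAPH = {
    "A1": {"A2": 1, "B2": 1},
    "A2": {"A1": 1, "A3": 1},
    "A3": {"A2": 1},
    "B2": {"A1": 1, "B3": 2, "C1": 1},
    "B3": {"B2": 2},
    "C1": {"B2": 1},
}


def find_negative_sample(graph, positive_sample, hops):
    """Walk `hops` steps from the positive sample and return the node reached.

    At each step the unvisited neighbor with the largest relationship weight
    becomes the new current node. Returns None if the walk gets stuck before
    the preset number of steps is completed.
    """
    current = positive_sample
    visited = {positive_sample}  # assumption: do not walk back to seen nodes
    for _ in range(hops):
        candidates = {
            node: weight
            for node, weight in graph.get(current, {}).items()
            if node not in visited
        }
        if not candidates:
            return None
        # max() keeps the first candidate on ties; a random choice among the
        # tied candidates could be used here instead.
        current = max(candidates, key=candidates.get)
        visited.add(current)
    return current


print(find_negative_sample(GRAPH, "A3", 3))
```

Starting from A3 with three steps, the walk visits A2, then A1, then B2, so B2 would be returned as the negative sample for the positive sample A3 in this toy graph.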
As an implementation manner of the embodiment of the present invention, the search behavior relationship may be a matrix, where an element in the matrix is a relationship weight between two adjacent pieces of search behavior information;
the node determination unit 721 may include:
a target element determination subunit configured to determine, as a target element, a largest element among the elements representing the target relationship weight;
and the target relation weight is the relation weight between the search behavior information corresponding to the starting point and the adjacent search behavior information.
And the first node determining subunit is configured to determine another piece of search behavior information, other than the starting point, corresponding to the relationship weight represented by the target element, as a next node having an association relationship with the current node.
As an implementation manner of the embodiment of the present invention, the search behavior relationship may be a relationship graph, each node in the relationship graph represents one search behavior information, and a connection line is arranged between the nodes to represent that two search behavior information corresponding to two nodes connected by the connection line have an association relationship;
the node determination unit 721 may include:
a candidate information determination subunit, configured to use search behavior information represented by a node having a connection line with the start point as candidate search behavior information;
and the second node determining subunit is used for determining the corresponding candidate searching behavior information with the maximum pre-recorded relationship weight as the next node with the incidence relationship with the current node.
As an implementation manner of the embodiment of the present invention, the relationship weight may be determined in advance based on a degree of association between search behavior information having an association relationship in the search behavior relationship, where the degree of association is determined according to an order in which the search behavior information is input by a user when performing information search.
As an implementation manner of the embodiment of the present invention, the relationship establishing module may include:
a behavior sequence collection unit for collecting a plurality of user search behavior sequences;
wherein each user search behavior sequence is used for describing search behavior information input by the user in a target time interval.
an association relationship establishing unit, configured to traverse each user search behavior sequence and establish an association relationship between every two pieces of search behavior information whose input times are adjacent;
and a relationship weight recording unit, configured to increase, each time two pieces of search behavior information whose input times are adjacent are traversed, the relationship weight corresponding to the two pieces of search behavior information by a preset value, until all the user search behavior sequences have been traversed, so as to obtain the search behavior relationship.
As an implementation manner of the embodiment of the present invention, the association relationship establishing unit includes:
a first relationship establishing subunit, configured to record, in the matrix, the element corresponding to every two pieces of search behavior information whose input times are adjacent;
the relationship weight recording unit includes:
a first weight recording subunit, configured to increase, each time two pieces of search behavior information whose input times are adjacent are traversed, the value of the element corresponding to the two pieces of search behavior information in the matrix by a preset value.
In one implementation of the embodiment of the present invention, the association relationship establishing unit includes:
a second relationship establishing subunit, configured to establish, in a relationship graph, a node corresponding to each piece of search behavior information, and to establish a connecting line between the two corresponding nodes for every two pieces of search behavior information that are adjacent in input time;
the relationship weight recording unit includes:
a second weight recording subunit, configured to increase, each time two pieces of search behavior information adjacent in input time are traversed, the relationship weight of the connecting line corresponding to those two pieces of search behavior information by the preset value.
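The relationship graph form can be sketched in the same spirit. The class `RelationGraph`, its method names, and the use of an undirected adjacency dictionary are hypothetical choices made for illustration, not structures named by the embodiment.

```python
from collections import defaultdict


class RelationGraph:
    """Nodes are pieces of search behavior information; each connecting line
    carries a relationship weight."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(dict)  # node -> {neighbor: weight}

    def add_node(self, query):
        self.nodes.add(query)

    def bump_edge(self, a, b, step=1):
        # Create the connecting line on first use, then add the preset value.
        self.edges[a][b] = self.edges[a].get(b, 0) + step
        self.edges[b][a] = self.edges[b].get(a, 0) + step


def build_relation_graph(sequences, step=1):
    graph = RelationGraph()
    for seq in sequences:
        for query in seq:
            graph.add_node(query)
        for a, b in zip(seq, seq[1:]):
            if a != b:  # assumption: no connecting line from a node to itself
                graph.bump_edge(a, b, step)
    return graph


graph = build_relation_graph([["q1", "q2", "q3"], ["q2", "q3"]])
```

Compared with the matrix, the adjacency dictionary stores only the pairs that actually occur, which may matter when the number of distinct queries is large.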
In one implementation of the embodiment of the present invention, the apparatus may further include:
a model training module, configured to train a preset model based on the positive sample and the negative sample to obtain a deep learning model for recommending search information to users.
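The embodiment does not specify the preset model or its training objective. Purely as an illustration of how the positive sample and its negative sample might be turned into supervised training data, the sketch below labels positives 1 and negatives 0 and evaluates a binary cross-entropy loss; the function names and the choice of loss are assumptions.

```python
import math


def make_training_examples(pairs):
    """Turn (positive, negative) sample pairs into (sample, label) examples."""
    examples = []
    for positive, negative in pairs:
        examples.append((positive, 1))
        examples.append((negative, 0))
    return examples


def binary_cross_entropy(predictions, labels):
    """Mean binary cross-entropy between predicted probabilities and labels."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


examples = make_training_examples([("q1", "q4")])
labels = [label for _, label in examples]
# A model that outputs 0.5 for every sample scores ln(2) on this loss.
loss = binary_cross_entropy([0.5, 0.5], labels)
```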
An embodiment of the present invention further provides an electronic device, as shown in fig. 9, which includes a processor 901, a communication interface 902, a memory 903, and a communication bus 904, where the processor 901, the communication interface 902, and the memory 903 communicate with one another through the communication bus 904;
the memory 903 is configured to store a computer program;
the processor 901 is configured to implement the steps of the negative sample determination method according to any of the above embodiments when executing the program stored in the memory 903.
As can be seen, in the scheme provided by the embodiment of the present invention, the electronic device may obtain target search behavior information as a positive sample and, taking the positive sample as a starting point, repeatedly determine the next node having an association relationship with the current node according to a pre-recorded search behavior relationship, until it reaches a target node that is a preset number of nodes away from the starting point. The search behavior relationship is determined based on the search behavior information in a plurality of pre-obtained user search behavior sequences and represents the association relationships among the pieces of search behavior information. The search behavior information corresponding to the target node is then determined as the negative sample corresponding to the positive sample. Because the search behavior relationship is built from the search behavior information in a plurality of user search behavior sequences and represents the association relationships among that information, the first next node determined from it is the node most strongly associated with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens. The target node reached after the preset number of nodes therefore has an association with the positive sample that is neither too strong nor too weak. An accurate and valuable negative sample requires exactly such an association, so the search behavior information corresponding to the target node meets the requirement and can be used as the negative sample corresponding to the positive sample, yielding an accurate and valuable negative sample.
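The node-by-node determination described above can be sketched as follows. This is a sketch under stated assumptions: the adjacency dictionary `edges`, the function name `find_negative_sample`, and in particular the exclusion of already visited nodes (so that the walk does not step back toward the starting point) are assumptions of this example, and the weights are invented for illustration.

```python
def find_negative_sample(edges, positive, hops):
    """Starting from the positive sample, move `hops` times to the neighbor
    with the largest relationship weight and return the node reached.

    Returns None if the walk runs out of unvisited neighbors first.
    """
    current, visited = positive, {positive}
    for _ in range(hops):
        # Assumption: nodes already on the path are not revisited.
        candidates = {node: weight
                      for node, weight in edges.get(current, {}).items()
                      if node not in visited}
        if not candidates:
            return None
        current = max(candidates, key=candidates.get)
        visited.add(current)
    return current


# Hypothetical relationship weights between four queries.
edges = {
    "q1": {"q2": 5, "q3": 1},
    "q2": {"q1": 5, "q3": 4, "q4": 2},
    "q3": {"q1": 1, "q2": 4, "q4": 3},
    "q4": {"q2": 2, "q3": 3},
}
negative = find_negative_sample(edges, "q1", hops=3)
```

With these weights the walk visits q2, then q3, then q4, so q4 is returned as the negative sample for q1 when the preset number is 3.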
The communication bus mentioned for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which instructions are stored; when the instructions are run on a computer, they cause the computer to execute the negative sample determination method in any of the above embodiments.
It can be seen that, in the solution provided in the embodiment of the present invention, the instructions stored in the computer-readable storage medium, when run on a computer, may obtain target search behavior information as a positive sample and, taking the positive sample as a starting point, repeatedly determine the next node having an association relationship with the current node according to a pre-recorded search behavior relationship, until reaching a target node that is a preset number of nodes away from the starting point. The search behavior relationship is determined based on the search behavior information in a plurality of pre-obtained user search behavior sequences and represents the association relationships among the pieces of search behavior information. The search behavior information corresponding to the target node is then determined as the negative sample corresponding to the positive sample. Because the search behavior relationship is built from the search behavior information in a plurality of user search behavior sequences and represents the association relationships among that information, the first next node determined from it is the node most strongly associated with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens. The target node reached after the preset number of nodes therefore has an association with the positive sample that is neither too strong nor too weak. An accurate and valuable negative sample requires exactly such an association, so the search behavior information corresponding to the target node meets the requirement and can be used as the negative sample corresponding to the positive sample, yielding an accurate and valuable negative sample.
In a further embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the negative sample determination method described in any of the above embodiments.
It can be seen that, in the solution provided in the embodiment of the present invention, the computer program product, when run on a computer, may obtain target search behavior information as a positive sample and, taking the positive sample as a starting point, repeatedly determine the next node having an association relationship with the current node according to a pre-recorded search behavior relationship, until reaching a target node that is a preset number of nodes away from the starting point. The search behavior relationship is determined based on the search behavior information in a plurality of pre-obtained user search behavior sequences and represents the association relationships among the pieces of search behavior information. The search behavior information corresponding to the target node is then determined as the negative sample corresponding to the positive sample. Because the search behavior relationship is built from the search behavior information in a plurality of user search behavior sequences and represents the association relationships among that information, the first next node determined from it is the node most strongly associated with the positive sample, and the association between each subsequently found node and the positive sample gradually weakens. The target node reached after the preset number of nodes therefore has an association with the positive sample that is neither too strong nor too weak. An accurate and valuable negative sample requires exactly such an association, so the search behavior information corresponding to the target node meets the requirement and can be used as the negative sample corresponding to the positive sample, yielding an accurate and valuable negative sample.
The above embodiments may be implemented wholly or partially in software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method for determining a negative sample, the method comprising:
acquiring target search behavior information as a positive sample;
taking the positive sample as a starting point, determining a next node having an association relationship with a current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined based on search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationship among the search behavior information;
and determining the search behavior information corresponding to the target node as a negative sample corresponding to the positive sample.
2. The method of claim 1, wherein the step of taking the positive sample as a starting point and determining a next node having an association relationship with a current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached, comprises:
taking the positive sample as a starting point, determining a next node having an association relationship with a current node based on pre-recorded relationship weights and the search behavior relationship among a plurality of pieces of search behavior information, wherein a relationship weight is used for representing the degree of association between two adjacent pieces of search behavior information;
and taking the search behavior information corresponding to the next node as the current node, and returning to the step of determining the next node having an association relationship with the current node according to the pre-recorded search behavior relationship, until the current node is the target node that is a preset number of nodes away from the starting point.
3. The method of claim 2, wherein the search behavior relationship is a matrix, and an element in the matrix is a relationship weight between two adjacent pieces of search behavior information;
the step of determining a next node having an association relationship with the current node based on the pre-recorded relationship weights and the search behavior relationship among the plurality of pieces of search behavior information comprises:
determining, as a target element, the largest of the elements representing target relationship weights, wherein a target relationship weight is a relationship weight between the search behavior information corresponding to the starting point and a piece of search behavior information adjacent to it;
and determining the other piece of search behavior information, besides the starting point, that corresponds to the relationship weight represented by the target element as the next node having an association relationship with the current node.
4. The method according to claim 2, wherein the search behavior relationship is a relationship graph, each node in the relationship graph represents one piece of search behavior information, and a connecting line between nodes indicates that the two pieces of search behavior information corresponding to the two nodes it connects have an association relationship;
the step of determining a next node having an association relationship with the current node based on the pre-recorded relationship weights and the search behavior relationship among the plurality of pieces of search behavior information comprises:
taking the search behavior information represented by each node that has a connecting line with the starting point as candidate search behavior information;
and determining the candidate search behavior information with the largest corresponding pre-recorded relationship weight as the next node having an association relationship with the current node.
5. The method according to any one of claims 2 to 4, wherein the relationship weight is determined in advance based on a degree of association between search behavior information having an association relationship in the search behavior relationship, the degree of association being determined according to an order in which the search behavior information is input by a user when performing an information search.
6. The method of claim 5, wherein the search behavior relationship is established in a manner that comprises:
collecting a plurality of user search behavior sequences, wherein each user search behavior sequence is used for describing the search behavior information input by a user within a target time interval;
traversing each user search behavior sequence, and establishing an association relationship between every two pieces of search behavior information that are adjacent in input time;
and each time two pieces of search behavior information adjacent in input time are traversed, increasing the relationship weight corresponding to those two pieces of search behavior information by a preset value, until all user search behavior sequences have been traversed, thereby obtaining the search behavior relationship.
7. The method of claim 6, wherein the step of establishing an association relationship between every two pieces of search behavior information that are adjacent in input time comprises:
recording, in a matrix, the element corresponding to every two pieces of search behavior information that are adjacent in input time;
the step of increasing, each time two pieces of search behavior information adjacent in input time are traversed, the relationship weight corresponding to those two pieces of search behavior information by a preset value comprises:
each time two pieces of search behavior information adjacent in input time are traversed, increasing the value of the element corresponding to those two pieces of search behavior information in the matrix by the preset value.
8. The method of claim 6, wherein the step of establishing an association relationship between every two pieces of search behavior information that are adjacent in input time comprises:
establishing, in a relationship graph, a node corresponding to each piece of search behavior information;
establishing a connecting line between the two corresponding nodes for every two pieces of search behavior information that are adjacent in input time;
the step of increasing, each time two pieces of search behavior information adjacent in input time are traversed, the relationship weight corresponding to those two pieces of search behavior information by a preset value comprises:
each time two pieces of search behavior information adjacent in input time are traversed, increasing the relationship weight of the connecting line corresponding to those two pieces of search behavior information by the preset value.
9. The method of any one of claims 1-4, further comprising:
and training a preset model based on the positive sample and the negative sample to obtain a deep learning model for recommending search information to a user.
10. An apparatus for determining a negative sample, the apparatus comprising:
a positive sample acquisition module, configured to acquire target search behavior information as a positive sample;
an information migration module, configured to take the positive sample as a starting point and determine a next node having an association relationship with a current node according to a pre-recorded search behavior relationship, until a target node that is a preset number of nodes away from the starting point is reached, wherein the search behavior relationship is determined by a relationship establishing module based on search behavior information in a plurality of pre-acquired user search behavior sequences and is used for representing the association relationship among the search behavior information;
and a negative sample determining module, configured to determine the search behavior information corresponding to the target node as the negative sample corresponding to the positive sample.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-9 when executing a program stored in the memory.
12. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method steps of any one of claims 1 to 9.
CN202111131095.7A 2021-09-26 2021-09-26 Negative sample determination method and device, electronic equipment and storage medium Pending CN113836417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111131095.7A CN113836417A (en) 2021-09-26 2021-09-26 Negative sample determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113836417A (en) 2021-12-24

Family

ID=78970225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111131095.7A Pending CN113836417A (en) 2021-09-26 2021-09-26 Negative sample determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113836417A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950324A (en) * 2021-03-15 2021-06-11 重庆邮电大学 Knowledge graph assisted pairwise sorting personalized merchant recommendation method and system
US20210216561A1 (en) * 2020-08-21 2021-07-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Information search method and apparatus, device and storage medium
CN113342995A (en) * 2021-07-05 2021-09-03 成都信息工程大学 Negative sample extraction method based on path semantics and feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination