CN113704596A - Method and apparatus for generating a set of recall information - Google Patents
Method and apparatus for generating a set of recall information
- Publication number
- CN113704596A (application CN202010434711.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- user
- sequence
- browsing
- historical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present disclosure disclose methods and apparatus for generating a set of recall information. One embodiment of the method comprises: acquiring a user browsing information sequence, wherein the user browsing information comprises identifiers of browsed information arranged according to user browsing time; inputting the user browsing information sequence into a pre-trained information association model to generate an information association vector corresponding to the user browsing information sequence; selecting, from a preset information cluster set, a first number of matched information clusters that match the information association vector, wherein the information clusters in the information cluster set are indexed by vectors of the same form as the information association vector; and selecting a second number of pieces of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence. This implementation effectively improves the coverage of the recalled information.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for generating a set of recall information.
Background
With the development of big data technology and artificial intelligence technology, higher and higher requirements are put forward on the recall algorithm of the recommendation system.
The prior art generally adopts collaborative filtering or vectorized recall. A collaborative filtering algorithm based on users or on recommended information usually requires either enough users with an interaction history for the same information or a co-occurrence relationship between pieces of information. Because it does not retrieve over the full information base, the information it recalls usually lacks diversity and has limited coverage. The traditional vectorized recall method computes similarity directly between the user vector and the information vectors in the full information base and returns recall information by similarity alone, so it tends to recall a large amount of long-tail information.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses for generating a set of recall information.
In a first aspect, an embodiment of the present disclosure provides a method for generating a set of recall information, the method including: acquiring a user browsing information sequence, wherein the user browsing information comprises browsed information identifiers arranged according to user browsing time; inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence; selecting a first number of matched information clusters matched with the information association vectors from a preset information cluster set, wherein the information clusters in the information cluster set take the vectors with the same form as the information association vectors as indexes; and selecting a second number of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence.
In some embodiments, the selecting a first number of matched information clusters from a preset information cluster set according to an information association vector includes: acquiring an index vector corresponding to the centroid of each information cluster in the information cluster set; selecting a first number of index vectors according to the similarity of the index vectors and the information association vectors; and determining the information clusters corresponding to the selected first number of index vectors as matched information clusters.
In some embodiments, the information association model is obtained by training the following steps: acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises browsed information identifiers arranged according to user browsing time, the user history browsing information sequence comprises a sample label and a corresponding sample sequence, the sample label comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises a subsequence formed by elements positioned in front of the sample label in the user history browsing information sequence; and taking the sample sequence of the training sample as the input of the information correlation model, taking the sample label corresponding to the input sample sequence as the expected output of the information correlation model, and training to obtain the information correlation model.
In some embodiments, the preset information cluster set is obtained by: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting the user historical browsing information sequences in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; and clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as an information cluster set.
In some embodiments, the selecting a second number of pieces of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence includes: for the selected first number of matched information clusters, counting the historical browsing times of the information in each information cluster; and selecting a second number of pieces of information in descending order of historical browsing times, and generating a recall information set matched with the user browsing information sequence.
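The count-based selection described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `recall_by_browse_count`, the flat `browse_log` list, and the tie-break by identifier are all assumptions introduced for the example.

```python
from collections import Counter


def recall_by_browse_count(matched_clusters, browse_log, second_number):
    """Pick the most-browsed items from the matched clusters.

    matched_clusters: list of lists of item identifiers (hypothetical layout).
    browse_log: flat list of item identifiers, one entry per historical view.
    """
    counts = Counter(browse_log)
    # Merge the clusters and drop duplicates that appear in several clusters.
    candidates = {item for cluster in matched_clusters for item in cluster}
    # Most-browsed first; ties are broken by identifier (an assumption).
    ranked = sorted(candidates, key=lambda item: (-counts[item], item))
    return ranked[:second_number]


matched_clusters = [["item_A", "item_F"], ["item_S", "item_F", "item_T"]]
browse_log = ["item_F", "item_S", "item_F", "item_A", "item_S", "item_F"]
recall_set = recall_by_browse_count(matched_clusters, browse_log, second_number=3)
```

Ranking by browse count after cluster matching is what keeps rarely viewed long-tail items out of the recall set even when their vectors happen to be close to the query.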
In a second aspect, an embodiment of the present disclosure provides an apparatus for generating a set of recall information, the apparatus comprising: the information acquisition unit is configured to acquire a user browsing information sequence, wherein the user browsing information comprises an identifier of browsed information arranged according to user browsing time; the generating unit is configured to input the user browsing information sequence into a pre-trained information association model and generate an information association vector corresponding to the user browsing information sequence; the information cluster selecting unit is configured to select a first number of matched information clusters matched with the information association vectors from a preset information cluster set, wherein the information clusters in the information cluster set take vectors in the form consistent with the information association vectors as indexes; and the recalling unit is configured to select a second number of information from the first number of matched information clusters and generate a recalling information set matched with the user browsing information sequence.
In some embodiments, the selecting unit includes: the acquisition module is configured to acquire an index vector corresponding to the centroid of each information cluster in the information cluster set; a selecting module configured to select a first number of index vectors according to similarity between the index vectors and the information association vectors; and the determining module is configured to determine the information clusters corresponding to the selected first number of index vectors as matched information clusters.
In some embodiments, the information association model is obtained by training the following steps: acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises browsed information identifiers arranged according to user browsing time, the user history browsing information sequence comprises a sample label and a corresponding sample sequence, the sample label comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises a subsequence formed by elements positioned in front of the sample label in the user history browsing information sequence; and taking the sample sequence of the training sample as the input of the information correlation model, taking the sample label corresponding to the input sample sequence as the expected output of the information correlation model, and training to obtain the information correlation model.
In some embodiments, the preset information cluster set is obtained by: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting the user historical browsing information sequences in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; and clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as an information cluster set.
In some embodiments, the recall unit includes: the statistical module is configured to count, for the selected first number of matched information clusters, the historical browsing times of the information in each information cluster; and the generating module is configured to select a second number of pieces of information in descending order of historical browsing times, and generate a recall information set matched with the user browsing information sequence.
In a third aspect, an embodiment of the present disclosure provides a server, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which when executed by a processor implements the method as described in any of the implementations of the first aspect.
According to the method and the apparatus for generating a recall information set provided by the embodiments of the present disclosure, candidate information is divided into information clusters, and the information clusters are selected by matching between vectors. Retrieval therefore does not depend on user interaction or commodity co-occurrence as a precondition, which effectively improves the coverage of the recalled information. Moreover, the recall information is generated by a second round of selection from the matched information clusters, which avoids recalling a large amount of long-tail information because of a single similarity dimension and improves the accuracy and conversion rate of information recommendation. The application effect of the recall model is thereby improved as a whole.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for generating a set of recall information according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method for generating a set of recall information in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for generating a set of recall information according to the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for generating a set of recall information according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary architecture 100 to which the disclosed method for generating a set of recall information or apparatus for generating a set of recall information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a shopping-type application, a search-type application, an instant messaging tool, a mailbox client, and the like.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have display screens and support information interaction, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for information displayed on the terminal devices 101, 102, 103. The background server may analyze the received user browsing information sequence and feed back the generated recall information set (for example, information that the user may be interested in) to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein.
It should be noted that the method for generating a recall information set provided by the embodiments of the present disclosure is generally executed by the server 105; accordingly, the apparatus for generating a recall information set is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a set of recall information in accordance with the present disclosure is shown. The method for generating a set of recall information comprises the steps of:
Step 201, acquiring a user browsing information sequence.
In this embodiment, the execution subject (such as the server 105 shown in fig. 1) of the method for generating the recall information set may acquire the user browsing information sequence through a wired or wireless connection. The user browsing information may include identifiers of browsed information arranged according to the user browsing time. The browsed information may include various information displayed to the user by the terminal, such as informational content and merchandise information. The user browsing information sequence may include, for example, a sequence of identifiers (item ids) of the merchandise information browsed by the user in sequence within a preset time period (e.g., 10 minutes). The identifier of the merchandise information may include various character strings that uniquely identify the displayed merchandise, which is not described herein again.
In this embodiment, as an example, the execution main body may acquire a user browsing information sequence stored locally in advance, or may acquire the user browsing information sequence from an electronic device (for example, terminal devices 101, 102, 103 shown in fig. 1) in communication connection therewith.
It should be noted that the above "browsing" may also be defined along finer dimensions. For example, it may require that the browsing duration be greater than a preset duration threshold. As another example, it may refer to a click operation (e.g., "like", "comment", "add to favorites") being performed on the browsed page.
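As a minimal sketch of how such a sequence might be assembled, the following example keeps events inside a preset time window, applies an optional dwell-time threshold, and orders the identifiers by browsing time. The `BrowseEvent` record, its field names, and the parameter names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BrowseEvent:
    item_id: str          # identifier of the browsed information
    timestamp: float      # browsing time in seconds (hypothetical unit)
    dwell_seconds: float  # how long the page was viewed


def build_browsing_sequence(events, now, window_seconds=600.0, min_dwell_seconds=0.0):
    """Return item identifiers browsed in the window, ordered by browsing time."""
    recent = [
        event for event in events
        if now - window_seconds <= event.timestamp <= now
        and event.dwell_seconds >= min_dwell_seconds
    ]
    recent.sort(key=lambda event: event.timestamp)
    return [event.item_id for event in recent]


events = [
    BrowseEvent("item_F", 1120.0, 30.0),
    BrowseEvent("item_A", 1000.0, 12.0),
    BrowseEvent("item_X", 100.0, 50.0),   # older than the 10-minute window
    BrowseEvent("item_H", 1300.0, 2.0),   # dwell time below the threshold
    BrowseEvent("item_S", 1500.0, 45.0),
]
sequence = build_browsing_sequence(events, now=1560.0, min_dwell_seconds=5.0)
```

With a 600-second window ending at 1560.0 and a 5-second dwell threshold, only `item_A`, `item_F`, and `item_S` remain, in that order.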
Step 202, inputting the user browsing information sequence into a pre-trained information association model to generate an information association vector corresponding to the user browsing information sequence.
In this embodiment, the execution subject may input the user browsing information sequence acquired in step 201 into a pre-trained information association model to generate an information association vector corresponding to the user browsing information sequence. The information association model may be used to represent the correspondence between information association vectors and user browsing information sequences. It may be any model, trained by a machine learning method, that can generate feature vectors, such as a recurrent neural network (RNN) or a long short-term memory network (LSTM).
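To make the idea of "sequence in, vector out" concrete, the sketch below runs a tiny plain recurrent network over a sequence of item identifiers and returns its final hidden state as the association vector. It is only an illustration under strong assumptions: the weights are random rather than trained, the architecture is a simple tanh recurrence rather than an LSTM, and every name (`make_model`, `encode`, `next_item_scores`) is hypothetical.

```python
import math
import random


def make_model(item_ids, hidden_size=4, seed=0):
    """Create untrained weights; a real model would be trained first."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "index": {item: i for i, item in enumerate(item_ids)},
        "embed": matrix(len(item_ids), hidden_size),  # one input row per item
        "w_hh": matrix(hidden_size, hidden_size),     # recurrent weights
        "w_out": matrix(len(item_ids), hidden_size),  # output-layer weights
    }


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def encode(model, sequence):
    """Return the final hidden state, used here as the association vector."""
    hidden = [0.0] * len(model["w_hh"])
    for item in sequence:
        item_input = model["embed"][model["index"][item]]
        recurrent = matvec(model["w_hh"], hidden)
        hidden = [math.tanh(a + b) for a, b in zip(item_input, recurrent)]
    return hidden


def next_item_scores(model, hidden):
    """Output-layer scores, one per known item."""
    return matvec(model["w_out"], hidden)


model = make_model(["item_A", "item_F", "item_H", "item_S"])
association_vector = encode(model, ["item_A", "item_F", "item_H"])
scores = next_item_scores(model, association_vector)
```

The vector has a fixed dimension (`hidden_size`) regardless of sequence length, which is what allows it to be compared against index vectors of the same form.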
In some optional implementation manners of this embodiment, the information association model may be obtained by training through the following steps:
First, a training sample set is acquired.
In these implementations, an execution subject for training the information association model described above may first acquire a training sample set. The training samples in the training sample set may include a user historical browsing information sequence. The user historical browsing information may include identifiers of the browsed information arranged according to the browsing time of the user. The user historical browsing information sequence may include a sample label and a corresponding sample sequence. The sample label may include a non-first element in the user historical browsing information sequence. The sample sequence may include a subsequence of the elements preceding the sample label in the user historical browsing information sequence.
In these implementations, as an example, suppose a user clicks pages describing item A, item F, item H, and item S in sequence within 10 minutes. The training sample may be (item A, item F, item H, item S). The sample label may be "item S". The sample sequence corresponding to the sample label may be (item A, item F, item H). A large number of training samples may be collected in a similar way, thereby forming the training sample set.
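The split into sample sequence and sample label can be sketched as below. Since the text allows any non-first element to serve as a label, this example emits one (sample sequence, sample label) pair per non-first position; whether every position or only the last one is used is an assumption of the sketch, and `make_training_samples` is a hypothetical name.

```python
def make_training_samples(browsing_sequence):
    """Build (sample_sequence, sample_label) pairs from one browsing history.

    Each non-first element becomes a label, and the elements before it
    become the corresponding sample sequence.
    """
    samples = []
    for position in range(1, len(browsing_sequence)):
        sample_sequence = browsing_sequence[:position]
        sample_label = browsing_sequence[position]
        samples.append((sample_sequence, sample_label))
    return samples


history = ["item_A", "item_F", "item_H", "item_S"]
samples = make_training_samples(history)
```

The last pair produced is the one given in the example above: the sequence (item A, item F, item H) with the label item S.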
And secondly, taking the sample sequence of the training sample as the input of the information association model, taking the sample label corresponding to the input sample sequence as the expected output of the information association model, and training to obtain the information association model.
In these implementations, the execution subject for training the information association model may input the sample sequence of a training sample obtained in the first step into an initial information association model to obtain a network output. The initial information association model may include various RNNs, such as an LSTM or a GRU (gated recurrent unit). The execution subject may then compare the network output with the sample label corresponding to the input sample sequence, adjust the network parameters of the initial information association model based on the comparison result, and stop training when a training end condition is met. The execution subject may then determine the initial information association model with the adjusted network parameters as the information association model.
Based on the optional implementation manner, the execution main body can improve the accuracy of the generated information association vector through training of the information association model, and provide a good data base for subsequent information recall.
Step 203, selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set.
In this embodiment, the execution subject may select, in various ways, a first number of matched information clusters that match the information association vector obtained in step 202 from a preset information cluster set. The information clusters in the information cluster set are usually indexed by vectors of the same form as the information association vector. In this embodiment, the preset information cluster set may include a plurality of information clusters, each of which generally corresponds to its own index vector. The index vector typically has the same form as the information association vector, e.g., the same dimension.
In this embodiment, as an example, the execution subject may select the first number of matched information clusters from the preset information cluster set through a K-Nearest Neighbor (KNN) classification algorithm. As another example, the execution subject may compute the similarity between the information association vector obtained in step 202 and the index vector of each information cluster in the preset information cluster set, and then select the first number of matched information clusters in descending order of similarity. The first number may be any preset number. It may also be determined by a rule, for example, the number of information clusters whose similarity exceeds a similarity threshold.
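The similarity-ranking variant can be sketched as follows. Cosine similarity is used here as one common choice; the text does not fix a similarity measure, and the names `select_matched_clusters`, `first_number`, and `threshold` are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_matched_clusters(query, index_vectors, first_number=None, threshold=None):
    """Rank clusters by similarity of their index vector to the query vector.

    first_number: keep at most this many clusters (fixed preset number).
    threshold: keep only clusters whose similarity exceeds it (rule-based number).
    """
    ranked = sorted(
        ((cosine_similarity(query, vector), cluster_id)
         for cluster_id, vector in index_vectors.items()),
        reverse=True,
    )
    if threshold is not None:
        ranked = [(score, cid) for score, cid in ranked if score > threshold]
    if first_number is not None:
        ranked = ranked[:first_number]
    return [cluster_id for _, cluster_id in ranked]


index_vectors = {
    "M1": [1.0, 0.0],
    "M2": [0.8, 0.6],
    "M3": [0.0, 1.0],
    "M4": [-1.0, 0.0],
}
query = [0.9, 0.1]
top_two = select_matched_clusters(query, index_vectors, first_number=2)
above_half = select_matched_clusters(query, index_vectors, threshold=0.5)
```

Because only one vector per cluster is compared, the cost of this step grows with the number of clusters rather than with the size of the full information base.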
In some optional implementation manners of this embodiment, the executing body may further select a first number of matched information clusters matched with the information association vector from a preset information cluster set by:
First, obtaining an index vector corresponding to the centroid of each information cluster in the information cluster set.
In these implementations, the execution subject may first obtain an index vector corresponding to the centroid of each information cluster in the information cluster set. As an example, each piece of information in the information clusters of the information cluster set may correspond to a feature vector. The execution subject may determine the feature vector corresponding to the centroid of each information cluster as the index vector of that information cluster.
And secondly, selecting a first number of index vectors according to the similarity of the index vectors and the information association vectors.
In these implementations, the execution subject may select a first number of index vectors according to the similarity between the index vectors obtained in the first step and the information association vector, in a manner similar to the similarity-based selection described above for step 203.
And thirdly, determining the information clusters corresponding to the selected first number of index vectors as matched information clusters.
Based on the optional implementation manner, the execution main body can utilize the index vector corresponding to the centroid of each information cluster as a retrieval basis of the matched information cluster, so that the selection efficiency and accuracy of the matched information cluster are improved.
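A minimal sketch of deriving centroid index vectors is shown below. It assumes the centroid is the component-wise mean of the feature vectors in a cluster, which is the usual definition but is not spelled out in the text; `build_index_vectors` and the dictionary layout are hypothetical.

```python
def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def build_index_vectors(cluster_members):
    """Map each cluster identifier to the centroid of its member vectors."""
    return {
        cluster_id: centroid(vectors)
        for cluster_id, vectors in cluster_members.items()
    }


cluster_members = {
    "M1": [[1.0, 0.0], [0.0, 1.0]],
    "M2": [[2.0, 2.0], [4.0, 6.0], [3.0, 4.0]],
}
index_vectors = build_index_vectors(cluster_members)
```

The resulting index vectors have the same dimension as the member vectors, so they can be compared directly with an information association vector.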
In some optional implementation manners of this embodiment, based on the information association model obtained through the training in the training step, the preset information cluster set may be obtained through the following steps:
First, acquiring a user historical browsing information sequence set.
In these implementations, the execution subject for constructing the information cluster set may acquire the user historical browsing information sequence set in various ways. The user historical browsing information may include identifiers of historically browsed information arranged according to the user browsing time.
It should be noted that the training sample set may be used directly as the user historical browsing information sequence set. The user historical browsing information sequence set may also be different from the training sample set.
And secondly, inputting the user historical browsing information sequences in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set.
In these implementations, the execution subject for constructing the information cluster set may input the user historical browsing information sequences in the set acquired in the first step into the trained information association model, so as to generate a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set. The historical association information in the historical association information set may be obtained from an output layer of the information association model, and generally has the same form as the sample label. The historical information association vectors in the historical information association vector set may be obtained from a hidden layer of the information association model. Each historical information association vector may correspond to a piece of historical association information.
And thirdly, clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as an information cluster set.
In these implementations, the execution subject may first cluster the historical information association vector set generated in the second step in various ways. The clustering method may include, for example, the k-means algorithm. Then, the execution subject may determine the historical association information set corresponding to the clustered historical information association vector set as the information cluster set.
Based on this optional implementation, the execution subject may construct the information cluster set from the historical information association vector set obtained with the information association model. The information in the information clusters can thus be classified automatically using the features learned by a model that incorporates the idea of user-based collaborative filtering, which gives higher accuracy and flexibility than manual classification.
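The three steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the historical information association vectors and their corresponding historical association information are taken as already produced by the model, a plain k-means written with the standard library stands in for whatever clustering the execution subject uses, and each cluster's centroid is used as its index vector. The names kmeans, build_cluster_set and sq_dist are hypothetical and do not come from the disclosure.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            assignments[i] = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


def build_cluster_set(history_vectors, history_items, k):
    """Group each piece of association information by the cluster of its vector.

    Returns a list of (index_vector, items) pairs; the centroid serving as
    the index vector is an assumption of this sketch.
    """
    centroids, assignments = kmeans(history_vectors, k)
    members = defaultdict(list)
    for item, cluster_id in zip(history_items, assignments):
        members[cluster_id].append(item)
    return [(centroids[c], members[c]) for c in range(k)]
```

In practice the vectors would come from a hidden layer of the trained model and the items from its output layer, as described above; here both are simply passed in as parallel lists.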
Step 204, selecting a second number of pieces of information from the first number of matched information clusters, and generating a recall information set matched with the user browsing information sequence.
In this embodiment, the execution subject may select a second number of pieces of information from the first number of matched information clusters selected in step 203 in various ways, so as to generate a recall information set matched with the user browsing information sequence. As an example, the execution subject may select the second number of pieces of information from the matched information clusters in descending order of similarity to the information association vector generated in step 202, so as to form the recall information set matched with the user browsing information sequence. The second number may be any preset number. The second number may also be determined by a rule, for example, the number of pieces of information whose similarity exceeds a similarity threshold. As another example, the execution subject may select a total of the second number of pieces of information across the information clusters, with the number taken from each information cluster following the similarity ranking between the cluster's index vector and the information association vector: the higher the similarity of the index vector, the more pieces of information are selected from the corresponding information cluster. For example, suppose that for the first number of matched information clusters M1, M2, M3, M4 and M5, the similarity between the corresponding index vector and the information association vector decreases in that order. Assuming that the second number is 100, the execution subject may select 24, 22, 20, 18 and 16 pieces of information from the information clusters M1, M2, M3, M4 and M5, respectively.
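The rank-based split in the last example can be illustrated with a short sketch. The disclosure only requires that clusters with more similar index vectors contribute more information; the arithmetic progression with a fixed step used here is an assumption chosen because it reproduces the 24, 22, 20, 18, 16 split, and the names allocate_quotas and select_recall_set are hypothetical.

```python
def allocate_quotas(second_number, cluster_count, step=2):
    """Split second_number across clusters ranked by similarity.

    Quotas decrease by `step` from one rank to the next (an assumed rule);
    any remainder is given to the highest-ranked clusters.
    """
    spread = step * cluster_count * (cluster_count - 1) // 2
    if cluster_count <= 0 or second_number < spread:
        raise ValueError("invalid arguments for quota allocation")
    base, remainder = divmod(second_number - spread, cluster_count)
    quotas = [base + step * (cluster_count - 1 - rank) for rank in range(cluster_count)]
    for rank in range(remainder):
        quotas[rank] += 1
    return quotas


def select_recall_set(ranked_clusters, second_number, step=2):
    """ranked_clusters: item lists, most similar cluster first.

    Each list is assumed to be ordered by preference already. A cluster
    shorter than its quota simply contributes fewer items.
    """
    quotas = allocate_quotas(second_number, len(ranked_clusters), step)
    recalled = []
    for items, quota in zip(ranked_clusters, quotas):
        recalled.extend(items[:quota])
    return recalled
```

With a second number of 100, five clusters and a step of 2, allocate_quotas returns the quotas used in the example above.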
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of a method for generating a set of recall information according to an embodiment of the present disclosure. In the application scenario of fig. 3, a user 301 uses a terminal device 302 to successively browse commodities such as "picnic mat", "preservation box" and "purified water" on a shopping APP, yielding a user browsing information sequence 303. The terminal device 302 sends the user browsing information sequence 303 to the background server 304. The background server 304 inputs the acquired user browsing information sequence 303 into a pre-trained information association model to generate a corresponding information association vector Vt 305. Then, according to the index vectors V1 to V1000 of the preset information cluster set 306, the background server 304 selects 3 matched information clusters 307 from the information cluster set 306. Then, the background server 304 selects 100 pieces of information from the matched information clusters 307 to generate a recall information set 308. The recall information set may include, for example, commodity information such as "wet towel", "tent", "table game" and "sunglasses". Optionally, the background server 304 may also send the generated recall information set 308 to the terminal device 302 for the user 301 to browse.
At present, the prior art usually adopts either collaborative filtering alone or a traditional vectorized recall method alone, which makes it difficult to recall information with high coverage, or may recall a large amount of long-tail information. In the method provided by the embodiments of the present disclosure, the candidate information is divided into a plurality of information clusters, and information clusters are selected according to matching between vectors, so that user interaction or commodity co-occurrence is not required as a premise of retrieval, which effectively improves the coverage of the recalled information. Moreover, the recall information is generated by a secondary selection from the matched information clusters, which avoids recalling a large amount of long-tail information because of a single similarity dimension, and improves the accuracy and conversion rate of information recommendation. The application effect of the recall model is thereby improved as a whole.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a set of recall information is illustrated. Flow 400 of the method for generating a set of recall information comprises the steps of:
In this embodiment, the data attribute information may include history cooperation data information.
In this embodiment, for the first number of matched information clusters selected, an execution subject (for example, the server 105 shown in fig. 1) of the method for generating the recall information set may count the historical browsing times of the information in each information cluster. The historical browsing times can be represented by the occurrence times of the information in each information cluster.
Step 405, selecting a second number of pieces of information in descending order of the historical browsing times of the information, and generating a recall information set matched with the user browsing information sequence.
In this embodiment, the execution subject may select a second number of pieces of information in descending order of the historical browsing times of the information, and generate a recall information set matched with the user browsing information sequence. The second number may be any preset number. The second number may also be determined by a rule, for example, the number of pieces of information whose browsing times exceed a preset browsing amount threshold, or whose ratio of browsing times to the total information amount exceeds a preset browsing ratio threshold.
As an example, suppose that 10 matched information clusters include a total of 1000 pieces of information. The execution subject may extract 100 pieces of information from the 1000 pieces of information in descending order of the historical browsing times of the information, or may extract 10 pieces of information from each of the 10 information clusters in descending order of the historical browsing times of the information within that cluster, thereby generating a recall information set matched with the user browsing information sequence.
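The two strategies in this example can be sketched as follows. The sketch assumes that each matched information cluster is a list of information identifiers in which repeats are allowed, so that the number of occurrences stands for the historical browsing times, as described above. The function names and the alphabetical tie-break are illustrative assumptions.

```python
from collections import Counter


def rank_by_browsing_times(counts):
    """Order items by descending count; ties are broken by identifier."""
    return sorted(counts, key=lambda item: (-counts[item], item))


def recall_global(matched_clusters, second_number):
    """Pool all matched clusters and keep the most browsed items overall."""
    counts = Counter()
    for cluster in matched_clusters:
        counts.update(cluster)
    return rank_by_browsing_times(counts)[:second_number]


def recall_per_cluster(matched_clusters, per_cluster):
    """Keep the most browsed items of each cluster separately.

    An item that appears in several clusters may be recalled more than
    once; deduplication is left out of this sketch.
    """
    recalled = []
    for cluster in matched_clusters:
        recalled.extend(rank_by_browsing_times(Counter(cluster))[:per_cluster])
    return recalled
```

For the example above, recall_global would be called with a second number of 100, and recall_per_cluster with 10 items per cluster across the 10 clusters.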
As can be seen from fig. 4, the flow 400 of the method for generating a set of recall information in this embodiment embodies the step of selecting recall information from the matched information clusters in descending order of historical browsing times. Therefore, the scheme described in this embodiment can use the browsing data presented by the historical data as a basis for the secondary selection of recall information, so that the possibility of recalling a large number of commodities is effectively reduced, and the effect of the recall algorithm is improved.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for generating a set of recall information, which corresponds to the method embodiment shown in fig. 2 or fig. 4, and which may be applied in various electronic devices in particular.
As shown in fig. 5, the apparatus 500 for generating a recall information set provided by the present embodiment includes an obtaining unit 501, a generating unit 502, a selecting unit 503, and a recall unit 504. The acquiring unit 501 is configured to acquire a user browsing information sequence, where the user browsing information includes identifiers of browsed information arranged according to user browsing time; a generating unit 502 configured to input the user browsing information sequence to a pre-trained information association model, and generate an information association vector corresponding to the user browsing information sequence; a selecting unit 503 configured to select a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set take a vector consistent with the information association vector as an index; and a recall unit 504 configured to select a second number of information from the first number of matched information clusters and generate a recall information set matched with the user browsing information sequence.
In the present embodiment, in the apparatus 500 for generating a set of recall information: the specific processing of the obtaining unit 501, the generating unit 502, the selecting unit 503 and the recalling unit 504 and the technical effects thereof can refer to the related descriptions of step 201, step 202, step 203 and step 204 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementation manners of this embodiment, the selecting unit 503 may include: an acquisition module (not shown), a selection module (not shown), and a determination module (not shown). The obtaining module may be configured to obtain an index vector corresponding to a centroid of each information cluster in the information cluster set. The selecting module may be configured to select the first number of index vectors according to similarity between the index vectors and the information association vectors. The determining module may be configured to determine the information clusters corresponding to the first number of selected index vectors as matching information clusters.
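A minimal sketch of what these three modules do together is given below. The disclosure speaks only of "similarity"; cosine similarity is an assumption made for this sketch, as are the function names and the representation of the information cluster set as a list of (index_vector, items) pairs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_matched_clusters(association_vector, cluster_set, first_number):
    """Return the first_number clusters whose index vectors are most similar.

    cluster_set: list of (index_vector, items) pairs, where the index
    vector corresponds to the cluster centroid.
    """
    ranked = sorted(
        cluster_set,
        key=lambda cluster: cosine_similarity(cluster[0], association_vector),
        reverse=True,
    )
    return ranked[:first_number]
```

A linear scan over all index vectors is enough for a sketch; with many clusters, an approximate nearest-neighbour index could replace the sort.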
In some optional implementation manners of this embodiment, the information association model may be obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises browsed information identifiers arranged according to user browsing time, the user history browsing information sequence comprises a sample label and a corresponding sample sequence, the sample label comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises a subsequence formed by elements positioned in front of the sample label in the user history browsing information sequence; and taking the sample sequence of the training sample as the input of the information correlation model, taking the sample label corresponding to the input sample sequence as the expected output of the information correlation model, and training to obtain the information correlation model.
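The way a sample label and its sample sequence are cut from one browsing sequence can be sketched as follows. Producing one training pair for every non-first element, and using the full prefix as the sample sequence, are assumptions of this sketch; the disclosure only requires that the label be a non-first element and that the sample sequence consist of elements preceding it. The function name is hypothetical.

```python
def build_training_samples(browsing_sequence):
    """Return (sample_sequence, sample_label) pairs from one browsing sequence.

    Each non-first element serves as a label, and the elements before it
    form the corresponding sample sequence.
    """
    return [
        (browsing_sequence[:i], browsing_sequence[i])
        for i in range(1, len(browsing_sequence))
    ]
```

A sequence with a single element yields no training pair, because it contains no non-first element.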
In some optional implementation manners of this embodiment, the preset information cluster set may be obtained by: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting the user historical browsing information sequences in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; and clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as an information cluster set.
In some optional implementations of the present embodiment, the recall unit 504 may include: a statistical module (not shown in the figure) and a generating module (not shown in the figure). The statistical module may be configured to count, for the selected first number of matched information clusters, the historical browsing times of the information in each information cluster. The generating module may be configured to select a second number of pieces of information in descending order of the historical browsing times of the information, and generate a recall information set matched with the user browsing information sequence.
According to the apparatus provided by the above embodiment of the present disclosure, the generating unit 502 first obtains the information association vector corresponding to the user browsing information sequence, and the selecting unit 503 selects matched information clusters from candidate information composed of a plurality of information clusters according to matching between vectors, which effectively improves the coverage of the recalled information. Moreover, the recall unit 504 performs a secondary selection from the matched information clusters to generate the recall information, which avoids recalling a large amount of long-tail information because of a single similarity dimension, and improves the accuracy and conversion rate of information recommendation. The application effect of the recall model is thereby improved as a whole.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the server; or may exist separately and not be assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquiring a user browsing information sequence, wherein the user browsing information comprises browsed information identifiers arranged according to user browsing time; inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence; selecting a first number of matched information clusters matched with the information association vectors from a preset information cluster set, wherein the information clusters in the information cluster set take the vectors with the same form as the information association vectors as indexes; and selecting a second number of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor comprises an acquisition unit, a generation unit, a selection unit and a recall unit. The names of these units do not in some cases constitute a limitation to the unit itself, and for example, the acquiring unit may also be described as a "unit that acquires a sequence of user browsing information, where the user browsing information includes an identification of browsed information arranged according to the user browsing time".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. One example is a technical solution formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.
Claims (10)
1. A method for generating a set of recall information, comprising:
acquiring a user browsing information sequence, wherein the user browsing information comprises browsed information identifiers arranged according to user browsing time;
inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence;
selecting a first number of matched information clusters matched with the information association vectors from a preset information cluster set, wherein the information clusters in the information cluster set take vectors with the same form as the information association vectors as indexes;
and selecting a second number of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence.
2. The method of claim 1, wherein said selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set comprises:
acquiring an index vector corresponding to the centroid of each information cluster in the information cluster set;
selecting a first number of index vectors according to the similarity of the index vectors and the information association vectors;
and determining the information clusters corresponding to the selected first number of index vectors as the matched information clusters.
3. The method of claim 1 or 2, wherein the information correlation model is trained by:
acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises browsed information identifiers arranged according to user browsing time, the user history browsing information sequence comprises a sample label and a corresponding sample sequence, the sample label comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises a subsequence formed by elements positioned in front of the sample label in the user history browsing information sequence;
and taking a sample sequence of a training sample as the input of the information association model, taking a sample label corresponding to the input sample sequence as the expected output of the information association model, and training to obtain the information association model.
4. The method of claim 3, wherein the preset set of information clusters is obtained by:
acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time;
inputting the user historical browsing information sequence in the user historical browsing information sequence set into the pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set;
clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as the information cluster set.
5. The method of claim 4, wherein said selecting a second number of information from said first number of matched information clusters to generate a recall information set matching said user browsing information sequence comprises:
for the selected first number of matched information clusters, counting the historical browsing times of the information in each information cluster;
and selecting a second number of pieces of information in descending order of the historical browsing times of the information, and generating a recall information set matched with the user browsing information sequence.
6. An apparatus for generating a set of recall information, comprising:
the information acquisition unit is configured to acquire a user browsing information sequence, wherein the user browsing information comprises an identifier of browsed information arranged according to user browsing time;
the generating unit is configured to input the user browsing information sequence into a pre-trained information association model and generate an information association vector corresponding to the user browsing information sequence;
the selecting unit is configured to select a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set take a vector consistent with the information association vector as an index;
and the recalling unit is configured to select a second number of information from the first number of matched information clusters and generate a recalling information set matched with the user browsing information sequence.
7. The apparatus of claim 6, wherein the information correlation model is trained by:
acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises browsed information identifiers arranged according to user browsing time, the user history browsing information sequence comprises a sample label and a corresponding sample sequence, the sample label comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises a subsequence formed by elements positioned in front of the sample label in the user history browsing information sequence;
and taking a sample sequence of a training sample as the input of the information association model, taking a sample label corresponding to the input sample sequence as the expected output of the information association model, and training to obtain the information association model.
8. The apparatus of claim 7, wherein the preset set of information clusters is obtained by:
acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time;
inputting the user historical browsing information sequence in the user historical browsing information sequence set into the pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set;
clustering the generated historical information association vector set, and determining the historical association information set corresponding to the clustered historical information association vector set as the information cluster set.
9. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113704596A (en) | 2021-11-26 |
CN113704596B (en) | 2024-08-20 |
Family
ID=78646040
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A Active CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704596B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114880580A (en) * | 2022-06-15 | 2022-08-09 | 北京百度网讯科技有限公司 | Information recommendation method and device, electronic equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107577737A (en) * | 2017-08-25 | 2018-01-12 | 北京百度网讯科技有限公司 | Method and apparatus for pushed information |
WO2018112696A1 (en) * | 2016-12-19 | 2018-06-28 | 深圳大学 | Content pushing method and content pushing system |
WO2019000710A1 (en) * | 2017-06-27 | 2019-01-03 | 北京金山安全软件有限公司 | Page loading method, apparatus and electronic device |
CN109460519A (en) * | 2018-12-28 | 2019-03-12 | 上海晶赞融宣科技有限公司 | Browse object recommendation method and device, storage medium, server |
CN110008375A (en) * | 2019-03-22 | 2019-07-12 | 广州新视展投资咨询有限公司 | Video is recommended to recall method and apparatus |
CN110704739A (en) * | 2019-09-30 | 2020-01-17 | 汉海信息技术(上海)有限公司 | Resource recommendation method and device and computer storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113704596B (en) | 2024-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109460514B (en) | | Method and device for pushing information |
CN109492772B (en) | | Method and device for generating information |
CN109471978B (en) | | Electronic resource recommendation method and device |
CN110059172B (en) | | Method and device for recommending answers based on natural language understanding |
CN110619078B (en) | | Method and device for pushing information |
CN112241327A (en) | | Shared information processing method and device, storage medium and electronic equipment |
CN111339406A (en) | | Personalized recommendation method, device, equipment and storage medium |
CN110209658B (en) | | Data cleaning method and device |
CN110909222A (en) | | User portrait establishing method, device, medium and electronic equipment based on clustering |
CN113781149B (en) | | Information recommendation method and device, computer readable storage medium and electronic equipment |
CN107977678A (en) | | Method and apparatus for output information |
CN112052297B (en) | | Information generation method, apparatus, electronic device and computer readable medium |
CN113807926A (en) | | Recommendation information generation method and device, electronic equipment and computer readable medium |
CN107968743A (en) | | The method and apparatus of pushed information |
CN111353103A (en) | | Method and apparatus for determining user community information |
WO2022001887A1 (en) | | Method and apparatus for training item coding model |
CN113704596B (en) | | Method and apparatus for generating recall information sets |
CN111382365A (en) | | Method and apparatus for outputting information |
CN111125544A (en) | | User recommendation method and device |
CN116186541A (en) | | Training method and device for recommendation model |
CN113516524B (en) | | Method and device for pushing information |
CN112348614B (en) | | Method and device for pushing information |
CN113313542B (en) | | Method and device for pushing channel pages |
CN116205686A (en) | | Method, device, equipment and storage medium for recommending multimedia resources |
CN111753111A (en) | | Picture searching method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||