CN113704596B - Method and apparatus for generating recall information sets - Google Patents
- Publication number
- CN113704596B (application CN202010434711.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
Embodiments of the present disclosure disclose methods and apparatus for generating a recall information set. One embodiment of the method comprises the following steps: acquiring a user browsing information sequence, wherein the user browsing information comprises the identification of browsed information arranged according to the user browsing time; inputting the user browsing information sequence into a pre-trained information association model to generate an information association vector corresponding to the user browsing information sequence; selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by vectors consistent with the information association vector form; and selecting a second number of information from the first number of matched information clusters, and generating a recall information set matched with the user browsing information sequence. The implementation method effectively improves the coverage rate of recall information.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method and apparatus for generating a recall information set.
Background
With the development of big data technology and artificial intelligence technology, higher and higher requirements are put on recall algorithms of a recommendation system.
In the prior art, collaborative filtering or vectorized recall methods are generally adopted. Collaborative filtering algorithms, whether user-based or item-based, typically require enough users who share an interaction history with the same information, or require that pieces of information co-occur. Because such methods do not search the full information base, the information they recall often lacks diversity and has limited coverage. Traditional vectorized recall methods, by contrast, compute the similarity between the user vector and the information vectors in the full information base and return recall information by similarity alone, so they tend to recall a large amount of long-tail information.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatus for generating a recall information set.
In a first aspect, embodiments of the present disclosure provide a method for generating a set of recall information, the method comprising: acquiring a user browsing information sequence, wherein the user browsing information comprises the identification of browsed information arranged according to the user browsing time; inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence; selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by vectors consistent with the information association vector form; and selecting a second number of information from the first number of matched information clusters, and generating a recall information set matched with the user browsing information sequence.
In some embodiments, selecting a first number of matched information clusters matched with the information association vector from the preset information cluster set includes: acquiring the index vector corresponding to the centroid of each information cluster in the information cluster set; selecting a first number of index vectors according to the similarity between the index vectors and the information association vector; and determining the information clusters corresponding to the selected first number of index vectors as the matched information clusters.
In some embodiments, the information association model is obtained through training by the following steps: acquiring a training sample set, wherein a training sample comprises a user history browsing information sequence, the user history browsing information comprises identifications of browsed information arranged according to user browsing time, the user history browsing information sequence comprises a sample tag and a corresponding sample sequence, the sample tag comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises the subsequence formed by the elements positioned before the sample tag in the user history browsing information sequence; and taking the sample sequence of a training sample as input of the information association model, taking the sample tag corresponding to the input sample sequence as expected output of the information association model, and training to obtain the information association model.
In some embodiments, the preset information cluster set is obtained by the following steps: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting a user historical browsing information sequence in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as an information cluster set.
In some embodiments, selecting the second number of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence includes: counting, for the selected first number of matched information clusters, the historical browsing times of the information in each information cluster; and selecting a second number of information in descending order of historical browsing times, and generating a recall information set matched with the user browsing information sequence.
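The count-based selection described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `recall_by_popularity`, the representation of clusters as sets of identifiers, the flat browse log, and the pooling of candidates across all matched clusters before ranking are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set


def recall_by_popularity(
    matched_clusters: Sequence[Set[str]],
    browse_log: Iterable[str],
    second_number: int,
) -> List[str]:
    """Pick the most-browsed information from the matched clusters.

    Assumption: candidates from all matched clusters are pooled and ranked
    together; ties are broken by identifier so the result is deterministic.
    """
    counts = Counter(browse_log)                 # historical browsing times
    candidates = set().union(*matched_clusters)  # pool the matched clusters
    ranked = sorted(candidates, key=lambda info_id: (-counts[info_id], info_id))
    return ranked[:second_number]


# Hypothetical toy data: two matched clusters and a short browse log.
clusters = [{"tent", "stove"}, {"lantern"}]
log = ["tent", "stove", "tent", "lantern", "lantern", "lantern", "novel"]
print(recall_by_popularity(clusters, log, second_number=2))  # ['lantern', 'tent']
```

Information that appears in the log but in no matched cluster ("novel" here) is never returned, which reflects the two-stage idea of selecting clusters first and information second.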
In a second aspect, embodiments of the present disclosure provide an apparatus for generating a set of recall information, the apparatus comprising: an acquisition unit configured to acquire a sequence of user browsing information including identifications of browsed information arranged in accordance with a user browsing time; the generation unit is configured to input a user browsing information sequence into the pre-trained information association model and generate an information association vector corresponding to the user browsing information sequence; a selecting unit configured to select a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by a vector consistent with the information association vector form; and the recall unit is configured to select a second number of information from the first number of matched information clusters and generate a recall information set matched with the user browsing information sequence.
In some embodiments, the selecting unit includes: an acquisition module configured to acquire the index vector corresponding to the centroid of each information cluster in the information cluster set; a selection module configured to select a first number of index vectors according to the similarity between the index vectors and the information association vector; and a determining module configured to determine the information clusters corresponding to the selected first number of index vectors as the matched information clusters.
In some embodiments, the information association model is obtained through training by the following steps: acquiring a training sample set, wherein a training sample comprises a user history browsing information sequence, the user history browsing information comprises identifications of browsed information arranged according to user browsing time, the user history browsing information sequence comprises a sample tag and a corresponding sample sequence, the sample tag comprises a non-first element in the user history browsing information sequence, and the sample sequence comprises the subsequence formed by the elements positioned before the sample tag in the user history browsing information sequence; and taking the sample sequence of a training sample as input of the information association model, taking the sample tag corresponding to the input sample sequence as expected output of the information association model, and training to obtain the information association model.
In some embodiments, the preset information cluster set is obtained by the following steps: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting a user historical browsing information sequence in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as an information cluster set.
In some embodiments, the recall unit comprises: a statistics module configured to count, for the selected first number of matched information clusters, the historical browsing times of the information in each information cluster; and a generation module configured to select a second number of information in descending order of historical browsing times and to generate a recall information set matched with the user browsing information sequence.
In a third aspect, embodiments of the present disclosure provide a server comprising: one or more processors; a storage device having one or more programs stored thereon; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
According to the method and apparatus for generating a recall information set provided by embodiments of the present disclosure, candidate information is divided into information clusters and the information clusters are selected by matching vectors, so neither user interaction nor commodity co-occurrence is required as a precondition for retrieval, which effectively improves the coverage rate of recall information. Recall information is then generated by a secondary selection from the matched information clusters, which avoids recalling a large amount of long-tail information because of a single similarity dimension and improves the accuracy and conversion rate of information recommendation. The application effect of the recall model is thereby improved as a whole.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method for generating a set of recall information according to the present disclosure;
FIG. 3 is a schematic illustration of one application scenario of a method for generating a set of recall information according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method for generating a set of recall information according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of an apparatus for generating a recall information collection according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary architecture 100 to which the methods of the present disclosure for generating a set of recall information or apparatuses for generating a set of recall information may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support information interaction, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. The present disclosure is not particularly limited in this respect.
The server 105 may be a server providing various services, such as a background server providing support for information displayed on the terminal devices 101, 102, 103. The background server can analyze and process the received user browsing information sequence, generate a corresponding recall information set (such as information the user may be interested in), and feed the generated recall information set back to the terminal device.
It should be noted that, the server may be hardware, or may be software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that the method for generating a recall information set provided by the embodiments of the present disclosure is generally performed by the server 105; accordingly, the apparatus for generating a recall information set is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for generating a set of recall information according to the present disclosure is shown. The method for generating a recall information set includes the steps of:
Step 201, acquiring a user browsing information sequence.
In this embodiment, the execution body of the method for generating the recall information set (such as the server 105 shown in FIG. 1) may acquire the user browsing information sequence through a wired or wireless connection. The user browsing information may include identifications of browsed information arranged according to the user browsing time. The browsed information may include various kinds of information that the terminal presents to the user, such as informational content or merchandise information. The user browsing information sequence may be, for example, a sequence of the item IDs of the merchandise information that the user browses successively within a preset period of time (for example, 10 minutes). The identification of merchandise information may be any character string that uniquely identifies the displayed merchandise, which will not be described further herein.
In this embodiment, the execution body may acquire the user browsing information sequence stored locally in advance, or may acquire the user browsing information sequence from an electronic device (for example, terminal devices 101, 102, 103 shown in fig. 1) connected to the execution body in communication.
It should be noted that "browsing" may also be defined at a finer granularity. For example, it may mean that the browsing time exceeds a preset time threshold, or that an operation (e.g., "like", "comment", "add to favorites") was performed on the browsed page.
Step 202, inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence.
In this embodiment, the execution subject may input the user browsing information sequence acquired in step 201 to a pre-trained information association model, and generate an information association vector corresponding to the user browsing information sequence. The information association model may be used to characterize a correspondence between the information association vector and the user browsing information sequence. The information association model may be various models which are trained by a machine learning method and can generate feature vectors, such as a recurrent neural network (Recurrent Neural Networks, RNN), a Long Short-Term Memory (LSTM), and the like.
In some optional implementations of this embodiment, the information association model may be obtained through training:
First, a training sample set is obtained.
In these implementations, the execution body for training the information association model may first obtain a training sample set. A training sample in the training sample set may include a user history browsing information sequence. The user history browsing information may include identifications of browsed information arranged according to the user browsing time. The user history browsing information sequence may include a sample tag and a corresponding sample sequence. The sample tag may include a non-first element in the user history browsing information sequence. The sample sequence may include the subsequence of elements preceding the sample tag in the user history browsing information sequence.
In these implementations, as an example, the user clicks, in sequence within 10 minutes, on the pages introducing commodity A, commodity F, commodity H, and commodity S. The training sample may be (commodity A, commodity F, commodity H, commodity S). The sample tag may be "commodity S". The sample sequence corresponding to the sample tag may be (commodity A, commodity F, commodity H). A large number of training samples may be collected in a similar way, thereby forming the training sample set described above.
Second, the sample sequence of a training sample is taken as input of the information association model, the sample tag corresponding to the input sample sequence is taken as expected output of the information association model, and the information association model is obtained through training.
In these implementations, the execution body for training the information association model may input the sample sequence of a training sample obtained in the first step into an initial information association model to obtain a network output. The initial information association model may include various RNNs, such as an LSTM or a GRU (Gated Recurrent Unit). The execution body may then compare the network output with the sample tag corresponding to the input sample sequence, adjust the network parameters of the initial information association model based on the comparison result, and stop training when the training end condition is met. The execution body may then determine the initial information association model with the trained network parameters as the information association model.
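The split of a browsing sequence into sample sequences and sample tags can be sketched as below. The function name `make_training_samples` is hypothetical, and emitting one pair for every non-first element is only one reading of the text; the example in the description corresponds to the last pair produced.

```python
from typing import List, Sequence, Tuple


def make_training_samples(history: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Split one browsing sequence into (sample sequence, sample tag) pairs.

    Each non-first element serves as a tag, and the elements that precede it
    form the corresponding sample sequence.
    """
    return [(list(history[:i]), history[i]) for i in range(1, len(history))]


# The browsing sequence from the example: commodities A, F, H, S.
samples = make_training_samples(["A", "F", "H", "S"])
for sample_sequence, sample_tag in samples:
    print(sample_sequence, "->", sample_tag)
# ['A'] -> F
# ['A', 'F'] -> H
# ['A', 'F', 'H'] -> S
```

A sequence with a single element yields no pairs, since it has no non-first element to use as a tag.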
Based on the optional implementation manner, the execution subject can improve the accuracy of the generated information association vector through training the information association model, and provides a good data basis for subsequent information recall.
Step 203, selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set.
In this embodiment, the execution body may select, in various manners, a first number of matched information clusters matched with the information association vector generated in step 202 from the preset information cluster set. The information clusters in the information cluster set are generally indexed by vectors whose form is consistent with that of the information association vector. In this embodiment, the preset information cluster set may include a plurality of information clusters, each of which generally corresponds to its own index vector. The index vector generally has the same form as the information association vector, for example the same dimension.
In this embodiment, as an example, the execution body may use a K-Nearest Neighbor (KNN) classification algorithm to select, from the preset information cluster set, a first number of matched information clusters matched with the information association vector. As yet another example, the execution body may compute the similarity between the information association vector obtained in step 202 and the index vector of each information cluster in the preset information cluster set, and then select the first number of matched information clusters in descending order of similarity. The first number may be any preset number. The first number may also be a rule-dependent number, such as the number of information clusters whose similarity exceeds a similarity threshold.
In some optional implementations of this embodiment, the executing body may further select a first number of matched information clusters matched with the information association vector from a preset information cluster set by:
First, the index vector corresponding to the centroid of each information cluster in the information cluster set is obtained.
In these implementations, the execution body may first obtain the index vector corresponding to the centroid of each information cluster in the information cluster set. As an example, each piece of information in the information cluster set may correspond to a feature vector. The execution body may determine the feature vector corresponding to the centroid of each information cluster as the index vector of that information cluster.
And secondly, selecting a first number of index vectors according to the similarity between the index vectors and the information association vectors.
In these implementations, the executing entity may select the first number of index vectors according to the similarity between the index vectors obtained in the first step and the information association vector in a manner similar to the foregoing step 203.
And thirdly, determining the information clusters corresponding to the selected first number of index vectors as matched information clusters.
Based on the optional implementation manner, the execution body can use the index vector corresponding to the centroid of each information cluster as the retrieval basis of the matched information cluster, so that the selection efficiency and accuracy of the matched information cluster are improved.
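The three steps above can be sketched as follows. The text does not fix a similarity measure, so cosine similarity is an assumption here, as are the helper names (`centroid`, `cosine`, `select_matched_clusters`) and the toy two-dimensional vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_matched_clusters(
    association_vector: Vector,
    index_vectors: Dict[str, Vector],
    first_number: int,
) -> List[str]:
    """Return the ids of the clusters whose index vectors are most similar."""
    ranked = sorted(
        index_vectors,
        key=lambda cid: cosine(association_vector, index_vectors[cid]),
        reverse=True,
    )
    return ranked[:first_number]


# Hypothetical feature vectors of the information in three clusters.
cluster_members = {
    "c1": [[1.0, 0.0], [0.8, 0.2]],
    "c2": [[0.0, 1.0], [0.2, 0.8]],
    "c3": [[-1.0, 0.0], [-0.8, -0.2]],
}
# Step 1: one index vector per cluster, taken at the centroid.
index_vectors = {cid: centroid(vs) for cid, vs in cluster_members.items()}
# Steps 2 and 3: rank by similarity and keep the first number of clusters.
print(select_matched_clusters([0.9, 0.3], index_vectors, first_number=2))
# ['c1', 'c2']
```

Comparing against one index vector per cluster, rather than against every piece of information, keeps the number of similarity computations proportional to the number of clusters.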
In some optional implementations of the present embodiment, based on the information association model obtained through the foregoing training steps, the preset information cluster set may be obtained by the following steps:
First, a user history browsing information sequence set is obtained.
In these implementations, the executing entity that constructs the set of information clusters may obtain the set of user history browsing information sequences in various ways. The user history browsing information may include an identification of history browsed information arranged according to a user browsing time.
It should be noted that, the user history browsing information may directly use the training sample set. The user history browsing information may also be different from the training sample set.
And secondly, inputting the user history browsing information sequences in the user history browsing information sequence set into a pre-trained information association model to generate a history information association vector set and a history association information set corresponding to the user history browsing information sequence set.
In these implementations, the execution body for constructing the information cluster set may input the user history browsing information sequences in the user history browsing information sequence set acquired in the first step into the trained information association model, so as to generate a history information association vector set and a history associated information set corresponding to the user history browsing information sequence set. The history associated information in the history associated information set can be obtained from the output layer of the information association model. The form of the history associated information is generally consistent with that of the sample tag. The history information association vectors in the history information association vector set can be obtained from a hidden layer of the information association model. Each history information association vector may correspond to a piece of history associated information.
And thirdly, clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as an information cluster set.
In these implementations, the execution subject may first cluster the set of history information association vectors generated in the second step in various ways. The clustering method may include, for example, kmeans (k-means) algorithm. Then, the execution subject may determine the history associated information set corresponding to the clustered history information associated vector set as an information cluster set.
Based on this optional implementation, the execution body may construct the information cluster set from the history information association vector set obtained with the information association model. The information in the information clusters can thus be classified automatically using features learned by a model that incorporates the idea of user-based collaborative filtering, which is more accurate and flexible than manual classification.
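The three construction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the hidden-layer vectors and the model's output items are taken as already computed, `kmeans` is a tiny hand-written k-means seeded deterministically with the first k vectors, and the names `kmeans` and `build_cluster_set` are hypothetical.

```python
import math
from collections import defaultdict

def kmeans(vectors, k, iterations=20):
    """Tiny k-means: returns (centroids, assignment) for equal-length vectors."""
    # Assumption: the first k vectors seed the centroids (deterministic, illustration only).
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment

def build_cluster_set(hidden_vectors, associated_items, k):
    """Group the model's output items by the cluster of their hidden-layer vector."""
    centroids, assignment = kmeans(hidden_vectors, k)
    clusters = defaultdict(list)
    for item, cluster_id in zip(associated_items, assignment):
        clusters[cluster_id].append(item)
    # Each cluster is indexed by its centroid, which has the same form as the hidden vectors.
    return [(centroids[c], clusters[c]) for c in range(k)]
```

In practice a library k-means with better initialization would likely replace the hand-written loop; the point here is only that items are grouped by the cluster of their association vector and each cluster keeps its centroid as an index vector.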
Step 204, selecting a second number of information from the first number of matched information clusters, and generating a recall information set matched with the user browsing information sequence.
In this embodiment, the execution body may select the second number of pieces of information from the first number of matched information clusters selected in step 203 in various manners, so as to generate a recall information set matched with the user browsing information sequence. As an example, the execution body may select the second number of pieces of information from the matched information clusters in descending order of similarity with the information association vector generated in step 202, so as to form the recall information set matched with the user browsing information sequence. The second number may be any number set in advance. The second number may also be determined by a rule, such as the number of pieces of information whose similarity exceeds a similarity threshold. As yet another example, the execution body may allocate the number of pieces of information taken from each information cluster according to the similarity ranking between that cluster's index vector and the information association vector, so that clusters whose index vectors are more similar contribute more pieces of information and the total equals the second number. For example, suppose the similarity between the index vectors of the matched information clusters M1, M2, M3, M4, M5 and the information association vector decreases in that order. Assuming that the second number is 100, the execution body may select 24, 22, 20, 18, and 16 pieces of information from the information clusters M1, M2, M3, M4, M5, respectively.
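The rank-based allocation in the last example can be reproduced with a simple linear rule. The sketch below is only one allocation rule that happens to yield 24, 22, 20, 18, 16; the source does not state which rule is used, and the function names and the `step` parameter are assumptions made for illustration.

```python
def linear_quotas(total, num_clusters, step):
    """Split `total` into quotas that decrease by `step` per rank, best-ranked cluster first."""
    base = total / num_clusters
    middle = (num_clusters - 1) / 2
    quotas = [round(base + step * (middle - rank)) for rank in range(num_clusters)]
    # Put any rounding remainder on the best-ranked cluster so the quotas sum to `total`.
    quotas[0] += total - sum(quotas)
    return quotas

def select_by_quota(ranked_clusters, quotas):
    """ranked_clusters: lists of items, each assumed pre-sorted by descending similarity."""
    recalled = []
    for items, quota in zip(ranked_clusters, quotas):
        recalled.extend(items[:quota])
    return recalled
```

With `total=100`, `num_clusters=5`, and `step=2`, `linear_quotas` returns `[24, 22, 20, 18, 16]`. A caller would need to choose `step` small enough that no quota becomes negative.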
With continued reference to fig. 3, fig. 3 is a schematic illustration of an application scenario of a method for generating a recall information set according to an embodiment of the present disclosure. In the application scenario of fig. 3, a user 301 uses a terminal device 302 to successively browse commodities such as "picnic pad", "fresh-keeping box", and "purified water" on a shopping APP, producing a user browsing information sequence 303. The terminal device 302 sends the user browsing information sequence 303 to the background server 304. The background server 304 inputs the acquired user browsing information sequence 303 into a pre-trained information association model and generates a corresponding information association vector Vt. Then, according to the index vectors V1 to V1000 of the preset information cluster set 306, the background server 304 selects 3 matched information clusters 307 from the information cluster set 306. The background server 304 then selects 100 pieces of information from the matched information clusters 307 to generate a recall information set 308. The recall information set may include, for example, commodity information such as "wet towel", "tent", "table game", and "sunglasses". Optionally, the background server 304 may also send the generated recall information set 308 to the terminal device 302 for viewing by the user 301.
At present, the prior art often uses collaborative filtering alone or a conventional vectorized recall method alone, so that the recalled information either struggles to reach high coverage or may contain a large amount of long-tail information. In the method provided by the embodiments of the present disclosure, the candidate information is divided into information clusters and the information clusters are selected by matching between vectors, so user interaction or commodity co-occurrence is not required as a precondition for retrieval, which effectively improves the coverage of the recall information. In addition, the recall information is generated by a secondary selection from the matched information clusters, which avoids recalling a large amount of long-tail information because of a single similarity dimension and improves the accuracy and conversion rate of information recommendation. The application effect of the recall model is thereby improved as a whole.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating a set of recall information is shown. The flow 400 of the method for generating a recall information set includes the steps of:
Step 401, a user browsing information sequence is acquired.
In this embodiment, the data attribute information may include historical cooperation data information.
Step 402, inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence.
Step 403, selecting a first number of matched information clusters matched with the information association vector from the preset information cluster sets.
The steps 401, 402, and 403 are identical to the steps 201, 202, 203 and optional implementations of the foregoing embodiments, and the descriptions of the steps 201, 202, 203 and optional implementations of the steps are also applicable to the steps 401, 402, and 403, which are not repeated herein.
Step 404, counting the historical browsing times of the information in each information cluster for the selected first number of matched information clusters.
In this embodiment, for the selected first number of matched information clusters, the execution subject (e.g., the server 105 shown in fig. 1) of the method for generating the recall information set may count the historical browsing times of the information in each of the information clusters. The history browsing times may be represented by the occurrence times of the information in each of the information clusters.
Step 405, selecting a second number of pieces of information in descending order of the historical browsing times of the information, and generating a recall information set matched with the user browsing information sequence.
In this embodiment, the execution body may select the second number of pieces of information in descending order of the historical browsing times of the information, and generate the recall information set matched with the user browsing information sequence. The second number may be any number set in advance. The second number may also be determined by a rule, for example, the number of pieces of information whose browsing times exceed a preset browsing amount threshold, or the number of pieces of information whose ratio of browsing times to the total information amount exceeds a preset browsing ratio threshold.
As an example, suppose 10 matched information clusters contain a total of 1000 pieces of information. The execution body may select 100 pieces of information from the 1000 pieces of information in descending order of the historical browsing times of the information, or may select 10 pieces of information from each of the 10 information clusters in descending order of the historical browsing times of the information within that cluster, so as to generate a recall information set matched with the user browsing information sequence.
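Both selection variants in this example can be sketched with a frequency counter. The sketch assumes, as step 404 allows, that the historical browsing times of an item are represented by how often it occurs in the matched clusters; ties are broken alphabetically purely so the output is deterministic, and the function names are hypothetical.

```python
from collections import Counter

def top_items_overall(matched_clusters, second_number):
    """Pool all matched clusters and keep the most frequently occurring items."""
    counts = Counter()
    for cluster in matched_clusters:
        counts.update(cluster)  # occurrences stand in for historical browsing times
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return ranked[:second_number]

def top_items_per_cluster(matched_clusters, per_cluster):
    """Keep the most frequently occurring items of each matched cluster separately."""
    recalled = []
    for cluster in matched_clusters:
        counts = Counter(cluster)
        ranked = sorted(counts, key=lambda item: (-counts[item], item))
        recalled.extend(ranked[:per_cluster])
    return recalled
```

The pooled variant favors globally popular items, while the per-cluster variant guarantees that every matched cluster contributes to the recall information set.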
As can be seen from fig. 4, the flow 400 of the method for generating a recall information set in this embodiment embodies the step of selecting recall information from the plurality of matched information clusters in descending order of historical browsing times. Therefore, the scheme described in this embodiment can use the browsing data reflected in the historical data as a basis for the secondary selection of recall information, which effectively reduces the possibility of recalling a large number of long-tail commodities and improves the effect of the recall algorithm.
With further reference to FIG. 5, as an implementation of the method illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for generating a set of recall information, which corresponds to the method embodiment illustrated in FIG. 2 or FIG. 4, and which is particularly applicable in a variety of electronic devices.
As shown in fig. 5, the apparatus 500 for generating a recall information set provided in this embodiment includes an obtaining unit 501, a generating unit 502, a selecting unit 503, and a recall unit 504. Wherein, the obtaining unit 501 is configured to obtain a sequence of user browsing information, where the user browsing information includes the identifications of browsed information arranged according to the user browsing time; a generating unit 502 configured to input a user browsing information sequence to a pre-trained information association model, and generate an information association vector corresponding to the user browsing information sequence; a selecting unit 503 configured to select a first number of matched information clusters matched with the information association vector from a preset information cluster set, where the information clusters in the information cluster set are indexed by a vector consistent with the information association vector form; a recall unit 504 configured to select a second number of information from the first number of matched clusters of information, generating a set of recall information matching the sequence of user browsing information.
In this embodiment, in the apparatus 500 for generating a recall information set: the specific processing of the obtaining unit 501, the generating unit 502, the selecting unit 503 and the recall unit 504 and the technical effects thereof may refer to the descriptions related to step 201, step 202, step 203 and step 204 in the corresponding embodiment of fig. 2, and are not repeated here.
In some optional implementations of this embodiment, the selecting unit 503 may include an acquisition module (not shown in the figure), a selection module (not shown in the figure), and a determination module (not shown in the figure). The acquisition module may be configured to obtain the index vector corresponding to the centroid of each information cluster in the information cluster set. The selection module may be configured to select the first number of index vectors according to the similarity between the index vectors and the information association vector. The determination module may be configured to determine the information clusters corresponding to the selected first number of index vectors as the matched information clusters.
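The three modules of the selecting unit amount to a top-k search over centroid index vectors. The following sketch assumes cosine similarity as the similarity measure, which the source does not specify, and uses a brute-force scan; the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_matched_clusters(index_vectors, query_vector, first_number):
    """Return ids of the `first_number` clusters whose index vector best matches the query."""
    ranked = sorted(
        range(len(index_vectors)),
        key=lambda i: cosine_similarity(index_vectors[i], query_vector),
        reverse=True,
    )
    return ranked[:first_number]
```

Because only one index vector per cluster is compared, the cost of this step grows with the number of clusters rather than with the number of candidate items.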
In some optional implementations of this embodiment, the information association model may be obtained through the following training steps: acquiring a training sample set, wherein each training sample comprises a user history browsing information sequence, the user history browsing information comprises identifications of browsed information arranged according to user browsing time, the user history browsing information sequence comprises sample tags and corresponding sample sequences, the sample tags comprise non-initial elements in the user history browsing information sequence, and the sample sequences comprise subsequences formed by elements positioned before the sample tags in the user history browsing information sequence; and taking a sample sequence of a training sample as input of the information association model, taking the sample tag corresponding to the input sample sequence as expected output of the information association model, and training to obtain the information association model.
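The sample construction described above can be illustrated as follows. This sketch takes every non-initial element as a sample tag and the full prefix before it as the sample sequence; the source only requires a subsequence of the preceding elements, so a fixed-length window would be an equally valid reading. The function name is hypothetical.

```python
def make_training_samples(history_sequence):
    """Return (sample_sequence, sample_tag) pairs from one browsing-history sequence."""
    samples = []
    for position in range(1, len(history_sequence)):
        tag = history_sequence[position]      # a non-initial element
        prefix = history_sequence[:position]  # all elements browsed before the tag
        samples.append((prefix, tag))
    return samples
```

For a history of three item identifiers this produces two training pairs, so a single user sequence of length n contributes n - 1 samples.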
In some optional implementations of this embodiment, the foregoing preset information cluster set may be obtained by: acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time; inputting a user historical browsing information sequence in the user historical browsing information sequence set into a pre-trained information association model, and generating a historical information association vector set and a historical association information set corresponding to the user historical browsing information sequence set; clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as an information cluster set.
In some optional implementations of this embodiment, the recall unit 504 may include a statistics module (not shown in the figure) and a generation module (not shown in the figure). The statistics module may be configured to count the historical browsing times of the information in each information cluster for the selected first number of matched information clusters. The generation module may be configured to select a second number of pieces of information in descending order of the historical browsing times of the information, generating a recall information set matched with the user browsing information sequence.
According to the device provided by the embodiment of the disclosure, the generating unit 502 is used for obtaining the information association vector corresponding to the user browsing information sequence, and the selecting unit 503 is used for selecting the matched information cluster from the candidate information consisting of a plurality of information clusters according to the matching among the vectors, so that the coverage rate of recall information is effectively improved. In addition, recall information is generated by performing secondary selection from the matched information clusters through the recall unit 504, so that a large amount of long-tail information is prevented from being recalled due to a single similarity dimension, and the accuracy and the conversion rate of information recommendation are improved. Thereby improving the application effect of the recall model as a whole.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., server in fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, etc., and a fixed terminal such as a digital TV, a desktop computer, etc. The server illustrated in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure in any way.
As shown in fig. 6, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 6 may represent one device or a plurality of devices as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing means 601.
It should be noted that, the computer readable medium according to the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In an embodiment of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Whereas in embodiments of the present disclosure, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (Radio Frequency), and the like, or any suitable combination thereof.
The computer readable medium may be contained in the server; or may exist alone without being assembled into the server. The computer readable medium carries one or more programs which, when executed by the server, cause the server to: acquiring a user browsing information sequence, wherein the user browsing information comprises the identification of browsed information arranged according to the user browsing time; inputting the user browsing information sequence into a pre-trained information association model, and generating an information association vector corresponding to the user browsing information sequence; selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by vectors consistent with the information association vector form; and selecting a second number of information from the first number of matched information clusters, and generating a recall information set matched with the user browsing information sequence.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The described units may also be provided in a processor, for example, described as: a processor comprises an acquisition unit, a generation unit, a selection unit and a recall unit. The names of these units do not constitute a limitation on the unit itself in some cases, and for example, the acquisition unit may also be described as "a unit that acquires a sequence of user browsing information including an identification of browsed information arranged in accordance with the user browsing time".
The foregoing description covers only the preferred embodiments of the present disclosure and the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the invention, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present disclosure.
Claims (10)
1. A method for generating a set of recall information, comprising:
Acquiring a user browsing information sequence, wherein the user browsing information comprises identifiers of browsed information arranged according to user browsing time, and the user browsing information sequence comprises an identifier sequence of browsed information of a user in a preset time period;
Inputting the user browsing information sequence into a pre-trained information association model to generate an information association vector corresponding to the user browsing information sequence;
Selecting a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by vectors consistent with the information association vector form;
And selecting a second number of information from the first number of matched information clusters to generate a recall information set matched with the user browsing information sequence.
2. The method of claim 1, wherein the selecting, from a preset set of clusters, a first number of clusters matching the information association vector comprises:
Acquiring index vectors corresponding to centroids of all information clusters in the information cluster set;
Selecting a first number of index vectors according to the similarity between the index vectors and the information association vector;
And determining the information clusters corresponding to the selected first number of index vectors as the matched information clusters.
3. The method according to claim 1 or 2, wherein the information association model is trained by:
Acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises identifications of browsed information arranged according to user browsing time, the user history browsing information sequence comprises sample tags and corresponding sample sequences, the sample tags comprise non-initial elements in the user history browsing information sequence, and the sample sequences comprise subsequences formed by elements positioned before the sample tags in the user history browsing information sequence;
and taking a sample sequence of a training sample as input of the information association model, taking the sample tag corresponding to the input sample sequence as expected output of the information association model, and training to obtain the information association model.
4. A method according to claim 3, wherein the set of preset clusters of information is obtained by:
acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time;
Inputting a user history browsing information sequence in the user history browsing information sequence set into the pre-trained information association model to generate a history information association vector set and a history association information set corresponding to the user history browsing information sequence set;
clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as the information cluster set.
5. The method of claim 4, wherein the selecting a second number of information from the first number of matching clusters of information to generate a set of recall information matching the sequence of user-browsed information comprises:
counting the historical browsing times of the information in each information cluster for the selected first number of matched information clusters;
and selecting a second number of pieces of information in descending order of the historical browsing times of the information, and generating a recall information set matched with the user browsing information sequence.
6. An apparatus for generating a set of recall information, comprising:
An acquisition unit configured to acquire a user browsing information sequence, wherein the user browsing information comprises identifications of browsed information arranged according to user browsing time, and the user browsing information sequence comprises an identification sequence of browsed information of a user in a preset time period;
a generation unit configured to input the user browsing information sequence to a pre-trained information association model, and generate an information association vector corresponding to the user browsing information sequence;
A selecting unit configured to select a first number of matched information clusters matched with the information association vector from a preset information cluster set, wherein the information clusters in the information cluster set are indexed by a vector consistent with the information association vector form;
and the recall unit is configured to select a second number of information from the first number of matched information clusters and generate a recall information set matched with the user browsing information sequence.
7. The apparatus of claim 6, wherein the information association model is trained by:
Acquiring a training sample set, wherein the training sample comprises a user history browsing information sequence, the user history browsing information comprises identifications of browsed information arranged according to user browsing time, the user history browsing information sequence comprises sample tags and corresponding sample sequences, the sample tags comprise non-initial elements in the user history browsing information sequence, and the sample sequences comprise subsequences formed by elements positioned before the sample tags in the user history browsing information sequence;
and taking a sample sequence of a training sample as input of the information association model, taking the sample tag corresponding to the input sample sequence as expected output of the information association model, and training to obtain the information association model.
8. The apparatus of claim 7, wherein the set of preset clusters of information is obtained by:
acquiring a user historical browsing information sequence set, wherein the user historical browsing information comprises identifiers of historical browsed information arranged according to user browsing time;
Inputting a user history browsing information sequence in the user history browsing information sequence set into the pre-trained information association model to generate a history information association vector set and a history association information set corresponding to the user history browsing information sequence set;
clustering the generated historical information association vector sets, and determining the historical association information set corresponding to the clustered historical information association vector sets as the information cluster set.
9. A server, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
10. A computer readable medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113704596A (en) | 2021-11-26 |
CN113704596B (en) | 2024-08-20 |
Family
ID=78646040
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010434711.5A Active CN113704596B (en) | 2020-05-21 | 2020-05-21 | Method and apparatus for generating recall information sets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704596B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114880580A (en) * | 2022-06-15 | 2022-08-09 | 北京百度网讯科技有限公司 | Information recommendation method and device, electronic equipment and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460519A (en) * | 2018-12-28 | 2019-03-12 | 上海晶赞融宣科技有限公司 | Browse object recommendation method and device, storage medium, server |
CN110704739A (en) * | 2019-09-30 | 2020-01-17 | 汉海信息技术(上海)有限公司 | Resource recommendation method and device and computer storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018112696A1 (en) * | 2016-12-19 | 2018-06-28 | 深圳大学 | Content pushing method and content pushing system |
CN107220094B (en) * | 2017-06-27 | 2019-06-28 | 北京金山安全软件有限公司 | Page loading method and device and electronic equipment |
CN107577737A (en) * | 2017-08-25 | 2018-01-12 | 北京百度网讯科技有限公司 | Method and apparatus for pushed information |
CN110008375A (en) * | 2019-03-22 | 2019-07-12 | 广州新视展投资咨询有限公司 | Video is recommended to recall method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN113704596A (en) | 2021-11-26 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109460514B (en) | Method and device for pushing information | |
CN109471978B (en) | Electronic resource recommendation method and device | |
CN110020162B (en) | User identification method and device | |
CN110059172B (en) | Method and device for recommending answers based on natural language understanding | |
CN111339406A (en) | Personalized recommendation method, device, equipment and storage medium | |
CN109933217A (en) | Method and apparatus for pushing sentence | |
CN112650841A (en) | Information processing method and device and electronic equipment | |
CN113781149B (en) | Information recommendation method and device, computer readable storage medium and electronic equipment | |
CN112052297B (en) | Information generation method, apparatus, electronic device and computer readable medium | |
CN111104599A (en) | Method and apparatus for outputting information | |
CN113807926A (en) | Recommendation information generation method and device, electronic equipment and computer readable medium | |
CN113378067B (en) | Message recommendation method, device and medium based on user mining | |
WO2022001887A1 (en) | Method and apparatus for training item coding model | |
CN111353103A (en) | Method and apparatus for determining user community information | |
CN112801053B (en) | Video data processing method and device | |
CN113033707B (en) | Video classification method and device, readable medium and electronic equipment | |
CN113704596B (en) | Method and apparatus for generating recall information sets | |
CN108509442B (en) | Search method and apparatus, server, and computer-readable storage medium | |
CN113450172B (en) | Commodity recommendation method and device | |
CN116955817A (en) | Content recommendation method, device, electronic equipment and storage medium | |
CN111382365A (en) | Method and apparatus for outputting information | |
CN112148865A (en) | Information pushing method and device | |
CN113516524B (en) | Method and device for pushing information | |
CN113743973B (en) | Method and device for analyzing market hotspot trend | |
CN116205686A (en) | Method, device, equipment and storage medium for recommending multimedia resources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||