CN113111268A - Training method of user feature extraction model, content recommendation method and device - Google Patents

Training method of user feature extraction model, content recommendation method and device

Info

Publication number
CN113111268A
CN113111268A (application number CN202110487930.4A; granted as CN113111268B)
Authority
CN
China
Prior art keywords
user
content
determining
identifier
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110487930.4A
Other languages
Chinese (zh)
Other versions
CN113111268B (en)
Inventor
应文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110487930.4A priority Critical patent/CN113111268B/en
Publication of CN113111268A publication Critical patent/CN113111268A/en
Application granted granted Critical
Publication of CN113111268B publication Critical patent/CN113111268B/en
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering


Abstract

The invention discloses a training method for a user feature extraction model, together with a content recommendation method, apparatus, device, medium, and program product, relating to fields such as deep learning and intelligent recommendation. The training method comprises: for each historical content in a plurality of historical contents, acquiring a user identifier and a content identifier for that historical content; determining a plurality of data sequences based on the association relationships between the user identifiers and the content identifiers, wherein an association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence comprises at least two user identifiers, and any two adjacent user identifiers in a data sequence are associated with each other based on the same content identifier; and training the user feature extraction model using the plurality of data sequences.

Description

Training method of user feature extraction model, content recommendation method and device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, particularly to the field of deep learning, intelligent recommendation, and the like, and more particularly, to a training method for a user feature extraction model, a content recommendation method, an apparatus, an electronic device, a medium, and a program product.
Background
Various users provide content on a network. Related technologies generally recommend the content provided by a user to other users who are interested in it, and those interested users typically follow the user who provides the content. For example, if user A provides content and user B follows user A, the content provided by user A may be recommended to user B. However, when the number of users B following user A is small, the recommendation effect for the content provided by user A is poor, and the content struggles to gain the attention of more users.
Disclosure of Invention
The present disclosure provides a training method of a user feature extraction model, a content recommendation method, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a training method for a user feature extraction model, including: for each historical content in a plurality of historical contents, acquiring a user identifier and a content identifier for that historical content; determining a plurality of data sequences based on the association relationships between the user identifiers and the content identifiers, wherein an association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence comprises at least two user identifiers, and any two adjacent user identifiers of the at least two user identifiers are associated with each other based on the same content identifier; and training the user feature extraction model using the plurality of data sequences.
According to another aspect of the present disclosure, there is provided a content recommendation method, including: determining a first user associated with a target user; determining a target second user from at least one second user based on the user features of the first user, wherein the similarity between the user features of the target second user and the user features of the first user satisfies a preset similarity condition, and the user features of the first user and of the at least one second user are obtained using the user feature extraction model described above; and recommending, to the target user, the content to be recommended for the target second user.
According to another aspect of the present disclosure, there is provided a training apparatus for a user feature extraction model, including a first obtaining module, a first determining module, and a training module. The first obtaining module is configured to acquire, for each historical content in a plurality of historical contents, a user identifier and a content identifier for that historical content. The first determining module is configured to determine a plurality of data sequences based on the association relationships between the user identifiers and the content identifiers, wherein an association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence comprises at least two user identifiers, and any two adjacent user identifiers of the at least two user identifiers are associated with each other based on the same content identifier. The training module is configured to train the user feature extraction model using the plurality of data sequences.
According to another aspect of the present disclosure, there is provided a content recommendation apparatus, including a sixth determining module, a seventh determining module, and a recommending module. The sixth determining module is configured to determine a first user associated with a target user. The seventh determining module is configured to determine a target second user from at least one second user based on the user features of the first user, wherein the similarity between the user features of the target second user and the user features of the first user satisfies a preset similarity condition, and the user features of the first user and of the at least one second user are obtained using the user feature extraction model described above. The recommending module is configured to recommend, to the target user, the content to be recommended for the target second user.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a user feature extraction model as described above.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the content recommendation method as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the training method of the user feature extraction model as described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the content recommendation method as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method of training a user feature extraction model as described above.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a content recommendation method as described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically illustrates an application scenario of a training method and a content recommendation method of a user feature extraction model according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of training a user feature extraction model according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of building a relationship graph according to an embodiment of the present disclosure;
FIG. 4 schematically shows a diagram of constructing an association relationship according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of determining a data sequence based on an association relationship according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a diagram of obtaining historical content, according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a diagram of obtaining historical content, according to another embodiment of the present disclosure;
FIG. 8 schematically illustrates a diagram of obtaining historical content, according to another embodiment of the present disclosure;
FIG. 9 schematically shows a flow diagram of content recommendation according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a block diagram of a training apparatus for a user feature extraction model according to an embodiment of the present disclosure;
FIG. 11 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure; and
FIG. 12 is a block diagram of an electronic device for implementing a training method of a user feature extraction model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and A, B and C together, etc.).
The embodiment of the disclosure provides a training method of a user feature extraction model, which comprises the following steps: for each of the plurality of history contents, a user identification and a content identification for each history content are acquired. Then, a plurality of data sequences are determined based on the association relationship between the user identifications and the content identifications, wherein the association relationship indicates that the user identifications and the content identifications for the same historical content are associated, each data sequence comprises at least two user identifications, and any two adjacent user identifications of the at least two user identifications are associated with each other based on the same content identification. Next, a user feature extraction model is trained using the plurality of data sequences.
An embodiment of the present disclosure further provides a content recommendation method, including: a first user associated with the target user is determined. And then, determining a target second user from at least one second user based on the user characteristics of the first user, wherein the similarity between the user characteristics of the target second user and the user characteristics of the first user meets a preset similarity condition, and the user characteristics of the first user and the user characteristics of the at least one second user are obtained by using a user characteristic extraction model. And then recommending the content to be recommended for the target second user to the target user.
Fig. 1 schematically illustrates an application scenario of a training method and a content recommendation method of a user feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 of an embodiment of the present disclosure includes, for example, a target user 111, first users 121, 122, second users 131, 132, 133.
In the embodiment of the present disclosure, the first users 121, 122 are, for example, users providing related information, including but not limited to articles, pictures, documents, and products. The second users 131, 132, 133 are, for example, users providing content, likewise including but not limited to articles, pictures, documents, and products. In one example, the first user is an information provider on the network and the second user is a merchant residing on the network platform.
For the content to be recommended by the second users 131, 132, 133, such content is generally recommended to users who are interested in it, for example to other users who follow the second users 131, 132, 133. However, when the number of such followers is small, the recommendation effect is poor and the content struggles to gain the attention of more users.
Therefore, embodiments of the present disclosure may determine, as the target user 111, other users who do not follow the second users 131, 132, 133 but are interested in the content they provide, so as to recommend the content to be recommended for the second users to the target user 111, thereby improving the recommendation effect so that the content receives more user attention.
For example, the first users 121, 122 associated with the target user 111 may be determined; the target user 111 has, for example, followed the first users 121, 122. Then, based on the similarity between the user features of each first user and those of each second user, target second users similar to a first user are determined from the plurality of second users; the user features of a target second user have a high similarity to those of the corresponding first user. For example, if the user features of second user 131 are similar to those of first user 121, and the user features of second user 133 are similar to those of first user 122, then second user 131 and second user 133 may be the target second users.
Since the user features of a target second user are similar to those of a first user, the related information provided by the first user is similar to the content provided by the target second user. When the target user 111 follows the first user, the target user 111 is likely interested in the related information provided by the first user, and it can therefore be inferred with high probability that the target user 111 is also interested in the content provided by the target second user. Accordingly, after the target second user is determined, the content to be recommended for the target second user can be recommended to the target user 111, improving the recommendation effect for the second user's content.
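Under the assumption that user features are vectors compared by cosine similarity (the patent leaves the similarity measure as a "preset similarity condition"), the selection of target second users can be sketched as follows; all names, feature values, and the threshold are illustrative, not from the patent:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy user features, standing in for the output of the feature extraction model.
first_feats = {"first 121": [1.0, 0.0], "first 122": [0.0, 1.0]}
second_feats = {"second 131": [0.9, 0.1],
                "second 132": [0.5, 0.5],
                "second 133": [0.1, 0.9]}

def pick_target_second_users(first_feats, second_feats, threshold=0.95):
    """A second user is a target second user when its similarity to some
    first user meets the preset condition (here: cosine >= threshold)."""
    targets = set()
    for sid, sv in second_feats.items():
        if any(cosine(fv, sv) >= threshold for fv in first_feats.values()):
            targets.add(sid)
    return targets
```

With these toy features, second users 131 and 133 are selected as targets (matching first users 121 and 122 respectively), while second user 132 is not.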
Illustratively, the user features of each first user and each second user are obtained, for example, using the user feature extraction model 140.
The embodiments of the present disclosure provide a training method for a user feature extraction model and a content recommendation method; these are described below with reference to fig. 2 to 9 in combination with the application scenario of fig. 1. The first user and the second user in fig. 1 may be, for example, the users mentioned in connection with figs. 2 to 8, and the related information provided by the first user and the content provided by the second user may be, for example, the historical content described below.
FIG. 2 schematically shows a flow chart of a training method of a user feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the user feature extraction model according to the embodiment of the present disclosure may include operations S210 to S230, for example.
In operation S210, for each of a plurality of history contents, a user identification and a content identification for each of the history contents are acquired.
In operation S220, a plurality of data sequences are determined based on the association relationship between the user identifier and the content identifier.
In operation S230, a user feature extraction model is trained using a plurality of data sequences.
Illustratively, the association relationship indicates that the user identifiers and the content identifiers for the same historical content are associated, each data sequence comprises at least two user identifiers, and any two adjacent user identifiers in the at least two user identifiers are associated with each other based on the same content identifier.
Illustratively, taking the example of multiple users providing multiple historical contents, each user may provide one or more historical contents. The following description will be given by taking user 1, user 2, and user 3 as examples. For the history content provided by the user 1, the user identifier of the history content is "user 1", for example, and the content identifiers are "decoration" and "home appliance", for example. For the history content provided by the user 2, the user identification of the history content is "user 2", for example, and the content identification is "decoration" and "maintenance", for example. For the history content provided by the user 3, the user identification of the history content is "user 3", for example, and the content identification is "maintenance", for example.
The user identification "user 1" and the content identification "decoration" are, for example, for the same history content, and therefore, "user 1" and "decoration" are associated with each other. The user identification "user 1" and the content identification "home appliance" are for example directed to the same history content, and therefore, the "user 1" and the "home appliance" are associated with each other. Similarly, "user 2" and "fitment" are associated with each other, and "user 2" and "service" are associated with each other. "user 3" and "service" are associated with each other.
The determination of one data sequence is described as an example. Based on the association relationships between the user identifiers and the content identifiers, a data sequence is determined, for example [user 1, user 2, user 3]. The data sequence includes a plurality of user identifiers with an arrangement order: user 1, then user 2, then user 3.
For any two adjacent user identifications in the data sequence, the two adjacent user identifications are associated with each other based on the same content identification. For example, two adjacent user identifications are "user 1" and "user 2", and the "user 1" and "user 2" are associated based on the content identification "decoration". Two adjacent user identifications are "user 2" and "user 3", and the "user 2" and "user 3" are associated based on the content identification "service".
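The association construction in this example can be sketched in Python (function and variable names are illustrative, not from the patent):

```python
from collections import defaultdict

# Toy historical contents from the example above: (user identifier, content identifiers).
history = [
    ("user 1", ["decoration", "home appliance"]),
    ("user 2", ["decoration", "maintenance"]),
    ("user 3", ["maintenance"]),
]

def build_associations(history):
    """Map each content identifier to the user identifiers associated with it.
    Two users sharing a content identifier may be adjacent in a data sequence."""
    content_to_users = defaultdict(list)
    for user_id, content_ids in history:
        for content_id in content_ids:
            content_to_users[content_id].append(user_id)
    return content_to_users

assoc = build_associations(history)
```

Here "user 1" and "user 2" may be adjacent via "decoration", and "user 2" and "user 3" via "maintenance", yielding the data sequence [user 1, user 2, user 3].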
After acquiring the plurality of data sequences, the user feature extraction model may be trained based on them. Because the association relationships among users are represented in the data sequences, once the user feature extraction model has been trained on the data sequences, the user features it extracts for a plurality of associated users will be relatively similar. In one example, a user feature may be a feature vector for the user.
According to the embodiment of the disclosure, data sequences are determined based on the association relationships between the user identifiers and the content identifiers in the historical content, and the user feature extraction model is then trained using those data sequences. Since the data sequences represent the association relationships among users, a model trained on them extracts similar user features for associated users, which improves the feature extraction accuracy of the user feature extraction model.
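The patent text does not fix a particular model architecture; treating each data sequence like a sentence and training a word2vec-style skip-gram over user identifiers is one common realization. As a dependency-free stand-in, the sketch below derives a user feature from sequence co-occurrence counts, so that users appearing adjacently in the data sequences receive similar features (all names here are illustrative):

```python
from collections import defaultdict
import math

# Toy data sequences, e.g. produced by the method of operations S210-S220.
sequences = [["user 1", "user 2", "user 3"],
             ["user 1", "user 2"],
             ["user 2", "user 3"]]

def cooccurrence_features(sequences, window=1):
    """Feature of a user = vector of counts of its neighbors within `window`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, user in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[user][seq[j]] += 1
    vocab = sorted({u for seq in sequences for u in seq})
    return {u: [counts[u][v] for v in vocab] for u in vocab}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

feats = cooccurrence_features(sequences)
```

In this toy data, "user 1" and "user 3" share the same neighbor ("user 2") and therefore end up with highly similar features, illustrating how association in the sequences translates into feature similarity.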
FIG. 3 schematically shows a schematic diagram of building a relationship graph according to an embodiment of the disclosure.
As in fig. 3, a relationship graph for each user is constructed based on the historical content of each user. For example, the historical content for user 1 includes historical contents B1, B2, and B3, and the user identifier for each of them is "user 1". The content identifier for a historical content is, for example, a tag of the content: the content identifier for historical content B1 is "tag C1", for historical content B2 is "tag C2", and for historical content B3 is "tag C1". Based on the user identifier, historical contents, and content identifiers of user 1, a relationship graph 300A for user 1 is constructed. Similarly, a relationship graph 300B for user 2 and a relationship graph 300C for user 3 may be constructed.
FIG. 4 schematically shows a schematic diagram of constructing an association relationship according to an embodiment of the present disclosure.
As shown in fig. 4, an association relationship 400 between a plurality of user identifications and a plurality of content identifications is constructed based on a relationship diagram of each user.
For example, user identifier "user 1" and content identifier "tag C1" are associated based on historical content B1 (or historical content B3), and user identifier "user 1" and content identifier "tag C2" are associated based on historical content B2.
For example, user identifier "user 2" and content identifier "tag C1" are associated based on historical content B4, and user identifier "user 2" and content identifier "tag C3" are associated based on historical content B4.
For example, user identifier "user 3" and content identifier "tag C3" are associated based on historical content B5.
The association relationship 400 takes the form of, for example, a bipartite graph, and includes the user identifiers, the content identifiers, and the associations between the user identifiers and the content identifiers.
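The bipartite association of figs. 3 and 4 can be sketched as an edge list; identifiers B1-B5 and the tag assignments follow the example above, while variable names are illustrative:

```python
from collections import defaultdict

# Edges of the bipartite graph: (user identifier, content identifier, historical content).
edges = [
    ("user 1", "tag C1", "B1"), ("user 1", "tag C2", "B2"),
    ("user 1", "tag C1", "B3"),
    ("user 2", "tag C1", "B4"), ("user 2", "tag C3", "B4"),
    ("user 3", "tag C3", "B5"),
]

# Adjacency in both directions, as a bipartite graph implies.
user_to_tags = defaultdict(set)
tag_to_users = defaultdict(set)
for user, tag, _content in edges:
    user_to_tags[user].add(tag)
    tag_to_users[tag].add(user)
```

The two mappings together let a traversal alternate between the user side and the content side of the graph, which is what the data-set construction below relies on.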
Fig. 5 schematically shows a schematic diagram of determining a data sequence based on an association relationship according to an embodiment of the present disclosure.
As shown in fig. 5, a plurality of data sets 510, 520, 530, 540 are determined in turn based on the association relationship 500. Each data set includes a user identifier and a content identifier associated with that user identifier. For example, in data set 510 the user identifier "user 1" is associated with the content identifier "tag C1", i.e., "user 1" and "tag C1" are associated based on historical content B1 (or historical content B3).
Then, a plurality of target data sets are determined from the data sets 510, 520, 530, and 540, and user identifiers are extracted from the target data sets in order, yielding user identifiers in one-to-one correspondence with the target data sets; these user identifiers form the data sequence.
Illustratively, for any two adjacent data sets, one of the two may be deleted based on their user identifiers, with the remaining data sets taken as the target data sets. For example, when the user identifiers in two adjacent data sets are the same, either one of the two is deleted.
Illustratively, in any two adjacent data sets of the plurality of data sets, either the content identifiers are the same or the user identifiers are the same. For example, in the two adjacent data sets 510, 520 the content identifiers are the same, both being "tag C1". In the two adjacent data sets 520, 530 the user identifiers are the same, both being "user 2".
For any two adjacent data sets, if their user identifiers are the same, either one of the two is deleted, leaving the remaining data sets. For example, the two data sets 520, 530 have the same user identifier "user 2", so data set 520 or data set 530 is deleted; if data set 530 is deleted, the remaining data sets are data set 510, data set 520, and data set 540, and these remaining data sets are taken as the target data sets.
Next, user identifiers are extracted from the target data sets in order, yielding user identifiers in one-to-one correspondence with them: "user 1" from data set 510, "user 2" from data set 520, and "user 3" from data set 540. These user identifiers ("user 1", "user 2", "user 3") form data sequence 550, i.e., data sequence 550 is [user 1, user 2, user 3].
Any two adjacent user identifiers in data sequence 550 are associated with each other based on the same content identifier. For example, user identifier "user 1" and user identifier "user 2" are associated based on content identifier "tag C1", and user identifier "user 2" and user identifier "user 3" are associated based on content identifier "tag C3".
In another example, data sequence 550 may also be represented as [user 1, tag C1, user 2, tag C3, user 3], where a content identifier sits between every two user identifiers in the data sequence 550 to associate them.
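The deletion rule above (drop one of two adjacent data sets that share a user identifier, then read off the user identifiers) can be sketched as follows; the function name and the tuple encoding of a data set are illustrative assumptions:

```python
def walk_to_sequence(groups):
    """groups: ordered (user identifier, content identifier) data sets from a
    traversal of the association relationship. When two adjacent data sets have
    the same user identifier, one of them is dropped; the user identifiers of
    the remaining (target) data sets form the data sequence."""
    kept = []
    for group in groups:
        if kept and kept[-1][0] == group[0]:  # same user id as previous data set
            continue                          # delete this data set
        kept.append(group)
    return [user for user, _tag in kept]

# Data sets 510, 520, 530, 540 from fig. 5.
groups = [("user 1", "tag C1"), ("user 2", "tag C1"),
          ("user 2", "tag C3"), ("user 3", "tag C3")]
```

Applying `walk_to_sequence` to these four data sets drops data set 530 (duplicate "user 2") and produces the sequence [user 1, user 2, user 3], matching data sequence 550.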
Illustratively, the data sets include first category data sets, each of which is determined based on a user identifier.
For example, taking data set 510 as an example, the data set is a first category data set. A current user identifier "user 1" is determined; then, starting from the current user identifier "user 1", at least one content identifier associated with "user 1" ("tag C1" and "tag C2") is determined based on the association relationship 500, each content identifier having description information 560.
Then, one content identifier is selected from the at least one content identifier ("tag C1" and "tag C2"), and the current user identifier ("user 1") together with the selected content identifier is taken as a first category data set.
Identify "tag C" with content1For example, the description information of "tag 1" indicates that there is "tag C" in the history content for user 11"the number of the history contents is 2, and the history contents for the user 2 have a" label C1"has a history content number of 1.
For each of the at least one content identifier, a probability corresponding to the content identifier is determined. For example, the total number of history contents for user 1 is P (P is, for example, 3). For the current user identifier "user 1", the associated content identifiers include "tag C1" and "tag C2", and each content identifier has a probability with respect to the user. For example, the probability corresponding to "tag C1" is determined to be 2/P (2/3); this probability indicates that 2 of the P history contents of user 1 carry the content identifier "tag C1", i.e., it characterizes the number of current history contents as 2. In the same way, the probability corresponding to "tag C2" is determined to be 1/P, indicating that 1 of the P history contents for user 1 carries the content identifier "tag C2".
Then, one content identifier is selected from the at least one content identifier based on the probabilities; for example, "tag C1" is selected from "tag C1" and "tag C2" with probability 2/P, and "tag C2" is selected with probability 1/P. For example, when the selected content identifier is "tag C1", the current user identifier "user 1" and the selected content identifier "tag C1" are used as a first category data set (data set 510).
In the embodiment of the disclosure, the content identifier is selected based on its probability, so that the relevance between the user identifiers in the obtained data sequence is stronger, which improves the training accuracy of the user feature extraction model.
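The probability-based selection of a content identifier can be sketched as a weighted draw using the counts from the example above; the `select_content_id` helper and the dict layout are assumptions for illustration, with `random.choices` performing the proportional sampling.

```python
import random

def select_content_id(content_counts, rng=random):
    """content_counts: dict mapping content identifier -> number of history
    contents carrying it. Selects one identifier with probability
    proportional to its count (count / P, where P is the total)."""
    ids = list(content_counts)
    weights = list(content_counts.values())
    return rng.choices(ids, weights=weights, k=1)[0]

# P = 3 history contents for user 1: "tag C1" appears twice, "tag C2" once,
# so "tag C1" is chosen with probability 2/3 and "tag C2" with 1/3.
counts = {"tag C1": 2, "tag C2": 1}
print(select_content_id(counts))
```

A uniform draw (the second category data groups) would simply omit the `weights` argument.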
Illustratively, the data sets further include second category data sets, each of which is determined based on a content identifier.
For example, taking data set 520 as an example, the data set is a second category data set. A current content identifier "tag C1" is determined for data set 520, and according to the current content identifier "tag C1", at least one user identifier ("user 1" and "user 2") associated with "tag C1" is determined based on the association relationship 500.
Then, one user identifier is selected from the at least one user identifier, where each of the at least one user identifier has, for example, the same probability of being selected. For example, when the selected user identifier is "user 2", the current content identifier "tag C1" and the selected user identifier "user 2" are used as a second category data set (data set 520).
In an embodiment of the present disclosure, a plurality of data sequences may be determined based on the association relationship 500. For example, Q data sequences may be determined based on each user identification, in one example, Q is 100. Taking the user identification "user 1" as an example, 100 data sequences are determined based on "user 1", for example.
For example, the process of determining a first data sequence based on "user 1" is: "tag C1" is determined based on "user 1", "user 2" is determined based on "tag C1", "tag C1" is determined based on "user 2", and "user 1" is determined based on "tag C1", so that the resulting data sequence is [user 1, user 2, user 1]. It will be appreciated that a user identifier may occur more than once in the same data sequence.
For example, the process of determining a second data sequence based on "user 1" is: "tag C1" is determined based on "user 1", "user 2" is determined based on "tag C1", "tag C3" is determined based on "user 2", and "user 3" is determined based on "tag C3", so that the resulting data sequence is [user 1, user 2, user 3].
By repeating this process, Q data sequences can be determined based on "user 1". Likewise, Q data sequences may be determined based on "user 2", Q data sequences may be determined based on "user 3", and so on. All of the determined data sequences may then be used as the plurality of data sequences for training the user feature extraction model.
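Under the assumption that the association relationship is stored as two adjacency maps, the repeated user→content→user determination above can be sketched as a random walk: content identifiers are drawn with probability proportional to their history-content counts (first category data groups) and user identifiers uniformly (second category data groups). The `metapath_walk` helper and the data layout are illustrative, not from the patent.

```python
import random

def metapath_walk(start_user, user_to_tags, tag_to_users, length, rng=random):
    """Walk user -> content identifier -> user until `length` user
    identifiers have been collected."""
    sequence = [start_user]
    user = start_user
    while len(sequence) < length:
        # First category step: weighted draw of a content identifier.
        tags, weights = zip(*user_to_tags[user].items())
        tag = rng.choices(tags, weights=weights, k=1)[0]
        # Second category step: uniform draw of an associated user.
        user = rng.choice(tag_to_users[tag])
        sequence.append(user)
    return sequence

user_to_tags = {"user 1": {"tag C1": 2},
                "user 2": {"tag C1": 1, "tag C3": 1},
                "user 3": {"tag C3": 2}}
tag_to_users = {"tag C1": ["user 1", "user 2"],
                "tag C3": ["user 2", "user 3"]}
walk = metapath_walk("user 1", user_to_tags, tag_to_users, length=3)
print(walk)
```

Calling `metapath_walk` Q times (e.g. Q = 100) per user, with length L (e.g. L = 100), would yield the plurality of data sequences described above.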
For example, for each data sequence, a length L of each data sequence may be defined, for example, the length L of the data sequence [ user 1, user 2, user 3] is 3. It is understood that the length L of the data sequence may be set according to a specific application, and in an example, the length L is 100.
After obtaining the plurality of data sequences, the plurality of data sequences may be input into the user feature extraction model to be trained, so that model parameters of the user feature extraction model to be trained are adjusted based on the plurality of data sequences to obtain a trained user feature extraction model. Wherein the model parameters are associated with the user characteristics. In an example, when the user features are feature vectors, the model parameters may be feature vectors.
For example, for each data sequence, after the data sequence is input into the model, the model samples the data sequence to obtain a plurality of positive sample sub-data and a plurality of negative sample sub-data. The initial parameters of the model are adjusted based on the positive sample sub-data and the negative sample sub-data, wherein the initial parameters include initial feature vectors of a plurality of users, and the adjusted model parameters are the required feature vectors of the plurality of users. The feature vector length may be set on a case-by-case basis; in one example, the vector length is 64.
For example, the following description takes determining a plurality of positive sample sub-data and a plurality of negative sample sub-data for one user identifier from one data sequence as an example.
One data sequence is, for example, [user 1, user 2, user 3, ……, user n], and the positive sample sub-data for "user 3" includes, for example, [user 1, user 3] and [user 2, user 3]; the distance (to the left or to the right) between the user identifier in each positive sample sub-data and "user 3" in the data sequence is less than or equal to a preset distance, for example, 2. For example, the distance between "user 1" and "user 3" in the positive sample sub-data [user 1, user 3] is 2, and the distance between "user 2" and "user 3" in [user 2, user 3] is 1. It will be appreciated that the user identifier "user 3" may occur multiple times in a data sequence.
In addition, all user identifiers whose distance from "user 3" exceeds 2 can be determined from the data sequence [user 1, user 2, user 3, ……, user n]; these user identifiers have, for example, a low association with "user 3". Then, a plurality of user identifiers are randomly selected from the determined user identifiers, for example 3, and each selected user identifier together with "user 3" constitutes one piece of negative sample sub-data.
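The window-based positive sampling and the random negative sampling described above can be sketched as follows; the `sample_pairs` helper and the list-of-pairs layout are assumptions for illustration.

```python
import random

def sample_pairs(sequence, center_index, window=2, num_negative=3, rng=random):
    """Return (positive, negative) sub-data for the identifier at
    center_index: identifiers within `window` positions form positive
    pairs, and up to `num_negative` identifiers farther away are drawn
    at random as negative pairs."""
    center = sequence[center_index]
    positives = [[sequence[i], center]
                 for i in range(max(0, center_index - window),
                                min(len(sequence), center_index + window + 1))
                 if i != center_index]
    far = [u for i, u in enumerate(sequence) if abs(i - center_index) > window]
    negatives = [[u, center]
                 for u in rng.sample(far, min(num_negative, len(far)))]
    return positives, negatives

seq = ["user 1", "user 2", "user 3", "user 4", "user 5", "user 6"]
pos, neg = sample_pairs(seq, center_index=2)
print(pos)  # [['user 1', 'user 3'], ['user 2', 'user 3'], ['user 4', 'user 3'], ['user 5', 'user 3']]
```

With window 2, "user 4" and "user 5" also form positive pairs with "user 3", consistent with "distance to the left or to the right" in the text above.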
In the embodiment of the present disclosure, the user feature extraction model is, for example, a metapath2vec model, which is a vertex embedding model for a Heterogeneous Information Network (HIN).
How to acquire a plurality of history contents will be described below with reference to fig. 6 to 8.
FIG. 6 schematically shows a schematic diagram of obtaining history content according to an embodiment of the present disclosure.
As shown in fig. 6, a plurality of initial contents 610 are first obtained, and then the plurality of initial contents 610 are processed to obtain a plurality of historical contents 630.
In an example, each of the plurality of initial contents 610 includes classification information, which includes, for example, "movie", "entertainment", "star periphery", "military", "social", "home", "travel", and the like. After the plurality of initial contents 610 are acquired, based on the classification information of each initial content, those initial contents whose classification information is the first preset classification information are deleted from the plurality of initial contents 610, and the remaining initial contents are used as the plurality of history contents 630. For example, the first preset classification information is classification information with low relevance to commercial products, and includes, for example, "movie", "entertainment", "star periphery", "military", "social", and the like.
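A minimal sketch of this first filtering step, assuming each initial content is represented as a dict with a `classification` field (an illustrative layout, not from the patent):

```python
# First preset classification information: classifications with low
# relevance to commercial products (the exact set is application-specific).
FIRST_PRESET = {"movie", "entertainment", "star periphery", "military",
                "social"}

def filter_by_classification(initial_contents):
    """Keep only initial contents whose classification information is not
    in the first preset classification set."""
    return [c for c in initial_contents
            if c["classification"] not in FIRST_PRESET]

contents = [{"id": 1, "classification": "movie"},
            {"id": 2, "classification": "home"},
            {"id": 3, "classification": "travel"}]
print(filter_by_classification(contents))  # keeps ids 2 and 3
```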
In an embodiment of the present disclosure, after the user features are extracted using the user feature extraction model, the associated user may be determined based on the user features so as to recommend the content to be recommended of the associated user. Since the content to be recommended is generally related to the commercial product, in order to ensure the accuracy of model training, the initial content with low relevance to the commercial product in the initial content may be deleted.
FIG. 7 schematically shows a schematic diagram of obtaining history content according to another embodiment of the present disclosure.
As shown in fig. 7, a plurality of initial contents 710 are first obtained, and for example, the initial contents of the plurality of initial contents 710 do not include the first preset classification information. Then, the plurality of initial contents 710 are processed to obtain a plurality of historical contents 730.
For example, after obtaining the plurality of initial content 710, an initial user identification and classification information for each initial content is determined, resulting in a plurality of initial user identifications. It will be appreciated that one initial user identification may correspond to multiple initial content, i.e. the multiple initial content is provided by the same user. For example, the plurality of initial user identifications includes "user 1" and "user 2".
For each of a plurality of initial user identifications, at least one initial content for the initial user identification is determined from a plurality of initial content. Taking the initial user identification as "user 1" as an example, each of the at least one initial content 721 for "user 1" is a content for user 1.
The classification information of the at least one initial content 721 includes, for example, K pieces of classification information, K being an integer greater than or equal to 1, including, for example, "home", "travel", and the like. Target initial content is determined from the at least one initial content based on the classification information of each initial content in the at least one initial content 721, the classification information of the target initial content being second preset classification information, the second preset classification information including, for example, k pieces of classification information, k being an integer greater than or equal to 1 and less than K. That is, the k pieces of classification information corresponding to the largest numbers of initial contents are determined from the K pieces of classification information.
The at least one initial content 721 is deleted if the ratio between the number of target initial contents corresponding to the k pieces of classification information and the total number of the at least one initial content 721 is less than a preset threshold, for example, 50%. Similarly, it is determined whether to delete at least one initial content 722 for "user 2" based on a similar manner, and if it is determined not to delete at least one initial content 722, the remaining at least one initial content 722 is treated as a plurality of history contents 730.
In the embodiment of the disclosure, for each user, the number of the user's main history contents is determined. If the ratio of this number to the total number of the user's history contents is low, the user's specialty is low, and the history contents of such low-specialty users are removed. This improves the quality of the history contents used for training, so that when the model is trained using these history contents, the accuracy of model training can be improved.
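This specialty filter can be sketched as follows, assuming the initial contents are grouped by initial user identifier; the `filter_by_specialty` helper is hypothetical, with `k` = 1 and the 50% threshold taken from the example above.

```python
from collections import Counter

def filter_by_specialty(contents_by_user, k=1, threshold=0.5):
    """Keep a user's contents only if the user's top-k classifications
    (the second preset classification information) cover at least
    `threshold` of that user's contents."""
    history = []
    for user, contents in contents_by_user.items():
        counts = Counter(c["classification"] for c in contents)
        top_k_total = sum(n for _, n in counts.most_common(k))
        if top_k_total / len(contents) >= threshold:
            history.extend(contents)
    return history

by_user = {
    # Top-1 classification covers 1/3 < 50%: low specialty, deleted.
    "user 1": [{"classification": "home"}, {"classification": "travel"},
               {"classification": "food"}],
    # Top-1 classification covers 2/3 >= 50%: kept.
    "user 2": [{"classification": "home"}, {"classification": "home"},
               {"classification": "travel"}],
}
print(len(filter_by_specialty(by_user)))  # 3 (only user 2's contents remain)
```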
FIG. 8 schematically shows a schematic diagram of obtaining history content according to another embodiment of the present disclosure.
As shown in fig. 8, after processing a plurality of initial contents based on first preset classification information and second preset classification information to obtain a plurality of historical contents 830, at least one user identification for the plurality of historical contents 830 is determined, the at least one user identification including, for example, "user 1" and "user 2".
Taking the user identifier "user 1" as an example, N pieces of history content 841 for the user identifier "user 1" are determined from the plurality of history contents 830, N being an integer greater than or equal to 1.
Then, M content identifiers 851 for the N pieces of history content 841 are determined, M being an integer greater than or equal to 1. One content identifier at a time is selected from the M content identifiers 851 as a target content identifier 851A.
For the target content identifier 851A, P pieces of history content 841A for the target content identifier 851A are determined from the N pieces of history content 841, P being an integer greater than or equal to 1 and less than or equal to N. When P is less than a preset number, the target content identifier 851A is deleted; illustratively, the preset number is, for example, 2.
In the embodiment of the disclosure, for a plurality of content identifiers of each user, the content identifier with a small occurrence number in the plurality of content identifiers is deleted, so that when the historical content obtained after the content identifier is deleted is used for training a model, the model training precision is higher.
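A sketch of this per-user pruning step, assuming each history content carries a single content identifier in a `tag` field. The patent deletes the rare identifier itself; dropping the contents that carry only that identifier, as done here, is an illustrative simplification.

```python
from collections import Counter

def prune_rare_tags(history_contents, preset_number=2):
    """For one user's N history contents, remove those whose content
    identifier occurs fewer than `preset_number` times (P < preset)."""
    counts = Counter(c["tag"] for c in history_contents)
    return [c for c in history_contents if counts[c["tag"]] >= preset_number]

contents = [{"tag": "tag C1"}, {"tag": "tag C1"}, {"tag": "tag C2"}]
print(prune_rare_tags(contents))  # "tag C2" occurs once (P = 1 < 2), removed
```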
FIG. 9 schematically shows a flow diagram of content recommendation according to an embodiment of the present disclosure.
As shown in fig. 9, the content recommendation method 900 of the embodiment of the present disclosure may include, for example, operations S910 to S930.
In operation S910, a first user associated with a target user is determined.
In operation S920, a target second user is determined from at least one second user based on the user characteristics of the first user.
In operation S930, content to be recommended for the target second user is recommended to the target user.
For example, the similarity between the user feature of the target second user and the user feature of the first user satisfies the preset similarity condition, and the user feature of the first user and the user feature of the at least one second user are obtained by using the user feature extraction model above, for example. When the user feature includes a feature vector, the preset similarity condition includes, for example, that a vector distance between two feature vectors is smaller than a preset distance.
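When the user features are feature vectors, the preset similarity condition can be sketched as a vector-distance check (Euclidean distance here; the vectors, threshold, and helper name are illustrative):

```python
import math

def satisfies_similarity(vec_a, vec_b, preset_distance=1.0):
    """Preset similarity condition: the vector distance between the two
    feature vectors is smaller than the preset distance."""
    return math.dist(vec_a, vec_b) < preset_distance

# Feature vector of the first user and of two candidate second users.
first_user = [0.1, 0.9, 0.3]
second_users = {"user A": [0.1, 0.8, 0.3], "user B": [0.9, 0.1, 0.7]}

# Target second users: those satisfying the preset similarity condition.
targets = [u for u, v in second_users.items()
           if satisfies_similarity(first_user, v)]
print(targets)  # ['user A']
```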
In an embodiment of the disclosure, the target user is associated with the first user, indicating that the target user is relatively interested in the content recommended by the first user. Because the similarity between the user features of the target second user and those of the first user is high, the target user is likely to also be interested in the content provided by the target second user. Recommending the content to be recommended for the target second user to the target user therefore improves the recommendation effect, so that the recommended content meets the needs of the target user.
In another embodiment, a user that the target user follows may be taken as the first user. Alternatively, according to the history browsing content of the target user, the user providing the history browsing content may be taken as the first user.
For example, the embodiments of the present disclosure may recommend content to be recommended to a target user when the target user initiates a content acquisition request. Or, the content to be recommended may be actively recommended for the target user.
In the related art, the content of the second user is recommended only to other users who follow the second user, and the purchase rate for the content of the second user is low. With the content recommendation method of the embodiment of the present disclosure, the purchase rate for the content of the second user is greatly increased.
FIG. 10 schematically shows a block diagram of a training apparatus for a user feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 10, the training apparatus 1000 for a user feature extraction model according to an embodiment of the present disclosure includes, for example, a first obtaining module 1010, a first determining module 1020, and a training module 1030.
The first obtaining module 1010 may be configured to obtain, for each of a plurality of historical contents, a user identification and a content identification for each of the historical contents. According to an embodiment of the present disclosure, the first obtaining module 1010 may perform, for example, the operation S210 described above with reference to fig. 2, which is not described herein again.
The first determining module 1020 may determine a plurality of data sequences based on an association relationship between the user identifier and the content identifier, where the association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence includes at least two user identifiers, and any two adjacent user identifiers of the at least two user identifiers are associated with each other based on the same content identifier. According to an embodiment of the present disclosure, the first determining module 1020 may perform, for example, operation S220 described above with reference to fig. 2, which is not described herein again.
Training module 1030 may be configured to train a user feature extraction model using a plurality of data sequences. According to an embodiment of the present disclosure, the training module 1030 may perform, for example, the operation S230 described above with reference to fig. 2, which is not described herein again.
According to an embodiment of the present disclosure, the first determining module 1020 includes: a first determination submodule, a second determination submodule, an extraction submodule and a third determination submodule. The first determining submodule is used for sequentially determining a plurality of data groups based on the association relationship, wherein each data group comprises a user identifier and a content identifier associated with the user identifier. A second determining sub-module for determining a plurality of target data sets from the plurality of data sets. And the extraction submodule is used for sequentially extracting the user identifications from the target data groups to obtain a plurality of user identifications corresponding to the target data groups one by one. And the third determining submodule is used for taking a plurality of user identifications as data sequences.
According to an embodiment of the present disclosure, the second determination submodule includes a deletion unit and a first determination unit. And the deleting unit is used for deleting any one of the two data groups based on the user identifications in any two adjacent data groups aiming at any two adjacent data groups in the plurality of data groups. A first determination unit configured to take the remaining data groups as a plurality of target data groups.
According to the embodiment of the present disclosure, the content identifications in any two adjacent data groups of the plurality of data groups are the same, or the user identifications in any two adjacent data groups of the plurality of data groups are the same. And the deleting unit is used for responding to the same user identification in any two adjacent data groups and deleting any one of the two data groups.
According to an embodiment of the present disclosure, the data sets include a first category data set; the first determination submodule includes: the device comprises a second determining unit, a third determining unit, a first selecting unit and a fourth determining unit. And the second determining unit is used for determining the current user identification. And the third determining unit is used for determining at least one content identifier associated with the current user identifier based on the association relation according to the current user identifier. A first selection unit, configured to select a content identifier from the at least one content identifier. And the fourth determining unit is used for taking the current user identification and the selected content identification as the first category data group.
According to an embodiment of the present disclosure, the first selection unit includes a determination subunit and a selection subunit. And the determining subunit is used for determining the probability corresponding to the content identifier for each content identifier in the at least one content identifier, wherein the probability represents the number of the current historical content, and the content identifier and the current user identifier are for the current historical content. A selecting subunit, configured to select one content identifier from the at least one content identifier based on the probability.
According to an embodiment of the present disclosure, the data sets include a second category data set; the first determination submodule includes: a fifth determining unit, a sixth determining unit, a second selecting unit, and a seventh determining unit. And the fifth determining unit is used for determining the current content identification. And a sixth determining unit, configured to determine, according to the current content identifier, at least one user identifier associated with the current content identifier based on the association relationship. A second selection unit for selecting one user identifier from the at least one user identifier. And the seventh determining unit is used for taking the current content identification and the selected user identification as a second category data group.
According to an embodiment of the present disclosure, the apparatus 1000 may further include: the device comprises a second acquisition module, a first deletion module and a second determination module. And the second acquisition module is used for acquiring a plurality of initial contents. And the first deleting module is used for deleting the initial content of which the classification information is the first preset classification information in the plurality of initial contents based on the classification information aiming at the classification information of each initial content. And the second determining module is used for taking the remaining initial content as a plurality of historical contents.
According to an embodiment of the present disclosure, the apparatus 1000 may further include: the device comprises a third obtaining module, a third determining module and a fourth determining module. And the third acquisition module is used for acquiring a plurality of initial contents. A third determining module to determine initial user identification and classification information for each initial content. A fourth determining module for, for each initial user identity: the method comprises the steps of determining at least one initial content for initial user identification from a plurality of initial contents, determining target initial contents from the at least one initial content based on classification information of each initial content in the at least one initial content, wherein the classification information of the target initial contents is second preset classification information, deleting the at least one initial content in response to the fact that the ratio of the number of the target initial contents to the number of the at least one initial content is smaller than a preset threshold value, and taking the remaining initial contents in the plurality of initial contents as a plurality of historical contents.
According to an embodiment of the present disclosure, the apparatus 1000 may further include: a fifth determining module and a second deleting module. A fifth determining module to determine at least one user identification for a plurality of historical content. A second deletion module to, for each of the at least one user identification: determining N pieces of historical content aiming at a user identification from a plurality of historical contents, wherein N is an integer which is greater than or equal to 1, determining M content identifications aiming at the N pieces of historical content, M is an integer which is greater than or equal to 1, determining P pieces of historical content aiming at a target content identification from the N pieces of historical content aiming at the target content identification, P is an integer which is greater than or equal to 1, and P is an integer which is less than or equal to N, and deleting the target content identification in response to P being less than a preset number.
According to an embodiment of the present disclosure, training module 1030 includes an input sub-module and an adjustment sub-module. And the input submodule is used for inputting the plurality of data sequences into the user feature extraction model to be trained. And the adjusting submodule is used for adjusting the model parameters of the user feature extraction model to be trained on the basis of the plurality of data sequences to obtain the trained user feature extraction model, wherein the model parameters are associated with the user features.
Fig. 11 schematically shows a block diagram of a content recommendation device according to an embodiment of the present disclosure.
As shown in fig. 11, the content recommendation apparatus 1100 of the embodiment of the present disclosure includes, for example, a sixth determination module 1110, a seventh determination module 1120, and a recommendation module 1130.
The sixth determination module 1110 may be for determining a first user associated with a target user. According to an embodiment of the present disclosure, the sixth determining module 1110 may perform, for example, operation S910 described above with reference to fig. 9, which is not described herein again.
The seventh determining module 1120 may be configured to determine a target second user from the at least one second user based on the user characteristics of the first user, where a similarity between the user characteristics of the target second user and the user characteristics of the first user satisfies a preset similarity condition, and the user characteristics of the first user and the user characteristics of the at least one second user are obtained by using a user characteristic extraction model. According to the embodiment of the present disclosure, the seventh determining module 1120 may, for example, perform operation S920 described above with reference to fig. 9, which is not described herein again.
The recommending module 1130 may be used for recommending the content to be recommended for the target second user to the target user. According to an embodiment of the present disclosure, the recommending module 1130 may perform, for example, operation S930 described above with reference to fig. 9, which is not described herein again.
According to an embodiment of the present disclosure, the sixth determination module 1110 includes at least one of a fourth determination submodule and a fifth determination submodule. And the fourth determination submodule is used for taking the user concerned by the target user as the first user. And the fifth determining submodule is used for determining the historical browsing content of the target user and taking the user providing the historical browsing content as the first user.
In the technical scheme of the disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 12 is a block diagram of an electronic device for implementing a training method of a user feature extraction model according to an embodiment of the present disclosure.
FIG. 12 illustrates a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. The electronic device 1200 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the device 1200 includes a computing unit 1201, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 performs various methods and processes described above, such as a training method of a user feature extraction model. For example, in some embodiments, the training method of the user feature extraction model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the training method of the user feature extraction model described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method of the user feature extraction model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server that incorporates a blockchain.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
The electronic device may be configured to perform a content recommendation method. The electronic device may comprise, for example, a computing unit, a ROM, a RAM, an I/O interface, an input unit, an output unit, a storage unit, and a communication unit. These components have the same or similar functions as the corresponding components of the device 1200 shown in Fig. 12, and are not described again here.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (32)

1. A training method of a user feature extraction model comprises the following steps:
acquiring, for each historical content in a plurality of historical contents, a user identifier and a content identifier for the historical content;
determining a plurality of data sequences based on an association relationship between the user identifiers and the content identifiers, wherein the association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence comprises at least two user identifiers, and any two adjacent user identifiers of the at least two user identifiers are associated with each other based on the same content identifier; and
training the user feature extraction model by using the plurality of data sequences.
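By way of a non-limiting illustration only, the sequence construction of claim 1 resembles a random walk over a user-content bipartite graph: hopping user → content → user yields sequences in which adjacent user identifiers always share a content identifier. All names below (build_data_sequences, interactions, walk_length) are illustrative assumptions by the editor and do not appear in the claims.

```python
import random
from collections import defaultdict

def build_data_sequences(interactions, walk_length=5, num_walks=1, seed=0):
    """Build data sequences of user identifiers in which any two adjacent
    user identifiers are associated through the same content identifier,
    by alternately hopping user -> content -> user over the records."""
    rng = random.Random(seed)
    user_to_contents = defaultdict(list)
    content_to_users = defaultdict(list)
    for user_id, content_id in interactions:
        user_to_contents[user_id].append(content_id)
        content_to_users[content_id].append(user_id)

    sequences = []
    for _ in range(num_walks):
        for start_user in user_to_contents:
            walk = [start_user]
            current = start_user
            # cap the number of hops so a user whose contents have no
            # other viewers cannot make the walk loop forever
            for _ in range(10 * walk_length):
                if len(walk) >= walk_length:
                    break
                content_id = rng.choice(user_to_contents[current])
                next_user = rng.choice(content_to_users[content_id])
                # keep only transitions to a different user, mirroring the
                # deletion of adjacent duplicate identifiers in claims 3-4
                if next_user != walk[-1]:
                    walk.append(next_user)
                current = next_user
            if len(walk) >= 2:
                sequences.append(walk)
    return sequences
```

Each returned walk is one "data sequence" in the sense of claim 1; the resulting corpus can then be fed to the feature extraction model.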
2. The method of claim 1, wherein the determining a plurality of data sequences based on the association relationship between the user identifier and the content identifier comprises:
sequentially determining a plurality of data groups based on the association relationship, wherein each data group comprises a user identifier and a content identifier associated with the user identifier;
determining a plurality of target data sets from the plurality of data sets;
sequentially extracting user identifiers from the target data groups to obtain a plurality of user identifiers corresponding to the target data groups one by one; and
using the plurality of user identifiers as the data sequence.
3. The method of claim 2, wherein said determining a plurality of target data sets from said plurality of data sets comprises:
for any two adjacent data groups in the plurality of data groups, deleting one of the two data groups based on the user identifiers in the two adjacent data groups; and
taking the remaining data groups as the plurality of target data groups.
4. The method of claim 3, wherein the content identifications in any two adjacent data groups of the plurality of data groups are the same, or the user identifications in any two adjacent data groups of the plurality of data groups are the same;
wherein the deleting one of the two data groups based on the user identifiers in the two data groups comprises:
in response to the user identifiers in any two adjacent data groups being the same, deleting one of the two data groups.
5. The method of any of claims 2-4, wherein the data groups comprise a first category data group; the sequentially determining a plurality of data groups based on the association relationship comprises:
determining a current user identifier;
determining at least one content identifier associated with the current user identifier based on the association relation according to the current user identifier;
selecting a content identifier from the at least one content identifier; and
taking the current user identifier and the selected content identifier as the first category data group.
6. The method of claim 5, wherein said selecting one of the at least one content identifier comprises:
for each content identifier of the at least one content identifier, determining a probability corresponding to the content identifier, wherein the probability characterizes a number of current historical contents that are associated with both the content identifier and the current user identifier; and
selecting one content identifier from the at least one content identifier based on the probability.
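One plausible reading of claim 6 is count-weighted sampling: a content identifier is drawn with probability proportional to how many of the current user's historical contents carry it. The sketch below is an editorial illustration under that assumption; the function and parameter names are not from the patent.

```python
import random
from collections import Counter

def select_content_for_user(interactions, current_user_id, rng=None):
    """Select one content identifier for the current user identifier,
    with probability proportional to the number of the user's historical
    contents that carry that content identifier (claim 6)."""
    rng = rng or random.Random(0)
    counts = Counter(content_id for user_id, content_id in interactions
                     if user_id == current_user_id)
    content_ids = list(counts)
    weights = [counts[c] for c in content_ids]
    # random.choices performs weighted sampling with replacement
    return rng.choices(content_ids, weights=weights, k=1)[0]
```

Frequently co-viewed content identifiers are thus favored when extending a walk, which biases the data sequences toward stronger user-content associations.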
7. The method of any of claims 2-4, wherein the data groups comprise a second category data group; the sequentially determining a plurality of data groups based on the association relationship comprises:
determining a current content identifier;
determining at least one user identifier associated with the current content identifier based on the association relation according to the current content identifier;
selecting a user identity from the at least one user identity; and
taking the current content identifier and the selected user identifier as the second category data group.
8. The method of any of claims 1-7, further comprising:
acquiring a plurality of initial contents;
for classification information of each initial content, deleting, based on the classification information, initial content whose classification information is the first preset classification information from the plurality of initial contents; and
taking the remaining initial content as the plurality of historical contents.
9. The method of any of claims 1-8, further comprising:
acquiring a plurality of initial contents;
determining initial user identification and classification information for each initial content;
for each initial user identification:
determining at least one initial content identified for the initial user from the plurality of initial contents;
determining target initial content from the at least one initial content based on the classification information of each initial content in the at least one initial content, wherein the classification information of the target initial content is second preset classification information; and
deleting the at least one initial content in response to a ratio between the number of the target initial content and the number of the at least one initial content being less than a preset threshold;
taking the remaining initial contents of the plurality of initial contents as the plurality of historical contents.
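The per-user filtering of claim 9 can be sketched as follows: for each initial user identifier, the share of that user's initial contents carrying the second preset classification is computed, and all of the user's contents are dropped when the ratio falls below the threshold. Record layout and names here are editorial assumptions.

```python
from collections import defaultdict

def filter_users_by_label_ratio(records, target_label, threshold):
    """records: (initial_user_id, content, label) triples. For each
    initial user identifier, count how many of that user's initial
    contents carry the second preset classification (`target_label`);
    when the ratio falls below `threshold`, drop all of that user's
    initial contents. The surviving records are the historical contents."""
    by_user = defaultdict(list)
    for user_id, content, label in records:
        by_user[user_id].append((content, label))
    historical = []
    for user_id, items in by_user.items():
        target_count = sum(1 for _, label in items if label == target_label)
        if target_count / len(items) >= threshold:
            historical.extend((user_id, c, l) for c, l in items)
    return historical
```

This prunes users whose behavior is dominated by unwanted categories before the training sequences are built.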
10. The method of any of claims 1-9, further comprising:
determining at least one user identification for the plurality of historical content; and
for each of the at least one user identity:
determining N historical contents for the user identification from the plurality of historical contents, wherein N is an integer greater than or equal to 1;
determining M content identifications for the N historical contents, wherein M is an integer greater than or equal to 1;
for a target content identifier in the M content identifiers, determining P historical contents for the target content identifier from the N historical contents, wherein P is an integer greater than or equal to 1 and less than or equal to N; and
in response to P being less than a preset number, deleting the target content identifier.
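Claim 10 amounts to pruning, within one user's history, content identifiers backed by too few historical contents. A minimal sketch (names are illustrative assumptions):

```python
from collections import Counter

def prune_rare_content_identifiers(content_ids, preset_number):
    """content_ids: the content identifiers of the N historical contents
    for one user identifier. A content identifier backed by fewer than
    `preset_number` of those contents (P < preset number) is deleted."""
    counts = Counter(content_ids)
    return [c for c in content_ids if counts[c] >= preset_number]
```

Applied per user identifier, this removes incidental one-off interactions that would otherwise create weak links between users in the data sequences.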
11. The method of any of claims 1-10, wherein said training the user feature extraction model using the plurality of data sequences comprises:
inputting the plurality of data sequences into a user feature extraction model to be trained; and
adjusting model parameters of the user feature extraction model to be trained based on the plurality of data sequences to obtain the trained user feature extraction model, wherein the model parameters are associated with the user features.
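Claim 11 leaves the model architecture open. One common choice for sequences of identifiers is a skip-gram-style embedding model, whose parameters (the embedding vectors) directly serve as the user features. The dependency-free sketch below is an editorial illustration under that assumption; all names and hyperparameters are the editor's, not the patent's.

```python
import math
import random

def train_user_embeddings(sequences, dim=8, window=1, lr=0.05, epochs=100, seed=0):
    """Minimal skip-gram-style sketch: nudge the embeddings of user
    identifiers that co-occur in a data sequence toward each other, and
    push a random negative sample away. The embedding table is the
    'model parameters associated with the user features' of claim 11."""
    rng = random.Random(seed)
    users = sorted({u for seq in sequences for u in seq})
    emb = {u: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for u in users}

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for seq in sequences:
            for i, center in enumerate(seq):
                for j in range(max(0, i - window), min(len(seq), i + window + 1)):
                    if j == i:
                        continue
                    context = seq[j]
                    # positive pair: pull the center embedding toward the context
                    dot = sum(a * b for a, b in zip(emb[center], emb[context]))
                    g = lr * (1.0 - sigmoid(dot))
                    emb[center] = [a + g * b for a, b in zip(emb[center], emb[context])]
                    # one random negative sample: push the center away
                    neg = rng.choice(users)
                    if neg != center and neg != context:
                        dot = sum(a * b for a, b in zip(emb[center], emb[neg]))
                        g = lr * sigmoid(dot)
                        emb[center] = [a - g * b for a, b in zip(emb[center], emb[neg])]
    return emb
```

In practice a library implementation (e.g. a word2vec-style trainer) would replace this loop; the point is only that users sharing content end up with nearby vectors.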
12. A content recommendation method, comprising:
determining a first user associated with a target user;
determining a target second user from at least one second user based on the user characteristics of the first user, wherein the similarity between the user characteristics of the target second user and the user characteristics of the first user meets a preset similarity condition, and the user characteristics of the first user and the user characteristics of the at least one second user are obtained by using the user characteristic extraction model of any one of claims 1 to 8; and
recommending the content to be recommended aiming at the target second user to the target user.
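The "preset similarity condition" of claim 12 is unspecified; assuming cosine similarity over the extracted user features with a minimum threshold (an editorial assumption, as are all names below), target-second-user selection could look like:

```python
import math

def pick_target_second_user(first_user_feature, second_user_features, min_similarity=0.5):
    """Choose the second user whose feature vector is most similar
    (cosine similarity) to the first user's, provided the similarity
    meets the preset condition; return None if no candidate qualifies."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0

    best_user, best_sim = None, min_similarity
    for user_id, feature in second_user_features.items():
        sim = cosine(first_user_feature, feature)
        if sim >= best_sim:
            best_user, best_sim = user_id, sim
    return best_user
```

Content associated with the returned second user would then be recommended to the target user.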
13. The method of claim 12, wherein the determining a first user associated with a target user comprises at least one of:
taking a user that the target user follows as the first user; and
determining historical browsing content of the target user, and taking a user who provided the historical browsing content as the first user.
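Claim 13's two ways of determining the first user can be combined in one helper. The data structures (a follow map, a browse map, and a content-to-provider map) are illustrative assumptions for the sketch:

```python
def determine_first_users(target_user, follow_map, browse_map, content_provider):
    """Claim 13: a first user is either a user the target user follows,
    or a user who provided content the target user has browsed."""
    first_users = set(follow_map.get(target_user, []))
    for content in browse_map.get(target_user, []):
        first_users.add(content_provider[content])
    first_users.discard(target_user)  # a user is not their own first user
    return first_users
```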
14. A training apparatus for a user feature extraction model, comprising:
a first acquisition module, used for acquiring, for each historical content in a plurality of historical contents, a user identifier and a content identifier for the historical content;
a first determining module, configured to determine a plurality of data sequences based on an association relationship between the user identifier and the content identifier, where the association relationship indicates that the user identifier and the content identifier for the same historical content are associated, each data sequence includes at least two user identifiers, and any two adjacent user identifiers of the at least two user identifiers are associated with each other based on the same content identifier; and
a training module, used for training the user feature extraction model by using the plurality of data sequences.
15. The apparatus of claim 14, wherein the first determining module comprises:
the first determining submodule is used for sequentially determining a plurality of data groups based on the incidence relation, wherein each data group comprises a user identifier and a content identifier associated with the user identifier;
a second determining submodule for determining a plurality of target data groups from the plurality of data groups;
the extraction submodule is used for sequentially extracting the user identifications from the target data groups to obtain a plurality of user identifications corresponding to the target data groups one by one; and
a third determining submodule, configured to use the plurality of user identifications as the data sequence.
16. The apparatus of claim 15, wherein the second determination submodule comprises:
a deleting unit, configured to delete, for any two adjacent data groups of the plurality of data groups, one of the two data groups based on the user identifiers in the two adjacent data groups; and
a first determination unit configured to take the remaining data groups as the plurality of target data groups.
17. The apparatus of claim 16, wherein the content identifications in any two adjacent data groups of the plurality of data groups are the same, or the user identifications in any two adjacent data groups of the plurality of data groups are the same;
wherein the deleting unit is further configured to:
in response to the user identifiers in any two adjacent data groups being the same, deleting one of the two data groups.
18. The apparatus of any of claims 15-17, wherein the data groups comprise a first category data group; the first determination submodule comprises:
a second determining unit, configured to determine a current user identifier;
a third determining unit, configured to determine, according to the current user identifier, at least one content identifier associated with the current user identifier based on the association relationship;
a first selection unit, configured to select one content identifier from the at least one content identifier; and
and the fourth determining unit is used for taking the current user identifier and the selected content identifier as the first category data group.
19. The apparatus of claim 18, wherein the first selection unit comprises:
a determining subunit, configured to determine, for each of the at least one content identifier, a probability corresponding to the content identifier, wherein the probability characterizes a number of current historical contents that are associated with both the content identifier and the current user identifier;
a selecting subunit, configured to select one content identifier from the at least one content identifier based on the probability.
20. The apparatus of any of claims 15-17, wherein the data groups comprise a second category data group; the first determination submodule comprises:
a fifth determining unit, configured to determine a current content identifier;
a sixth determining unit, configured to determine, according to the current content identifier, at least one user identifier associated with the current content identifier based on the association relationship;
a second selection unit, configured to select one user identifier from the at least one user identifier; and
a seventh determining unit, configured to use the current content identifier and the selected user identifier as the second category data group.
21. The apparatus of any of claims 14-20, further comprising:
the second acquisition module is used for acquiring a plurality of initial contents;
a first deleting module, used for deleting, based on classification information of each initial content, initial content whose classification information is the first preset classification information from the plurality of initial contents; and
and the second determining module is used for taking the remaining initial content as the plurality of historical contents.
22. The apparatus of any of claims 14-21, further comprising:
a third obtaining module, configured to obtain a plurality of initial contents;
a third determining module for determining initial user identification and classification information for each initial content; and
a fourth determining module for, for each initial user identity:
determining at least one initial content identified for the initial user from the plurality of initial contents;
determining target initial content from the at least one initial content based on the classification information of each initial content in the at least one initial content, wherein the classification information of the target initial content is second preset classification information; and
deleting the at least one initial content in response to a ratio between the number of the target initial content and the number of the at least one initial content being less than a preset threshold;
taking the remaining initial contents of the plurality of initial contents as the plurality of historical contents.
23. The apparatus of any of claims 14-22, further comprising:
a fifth determining module for determining at least one user identity for the plurality of historical content; and
a second deletion module to, for each of the at least one user identification:
determining N historical contents for the user identification from the plurality of historical contents, wherein N is an integer greater than or equal to 1;
determining M content identifications for the N historical contents, wherein M is an integer greater than or equal to 1;
for a target content identifier in the M content identifiers, determining P historical contents for the target content identifier from the N historical contents, wherein P is an integer greater than or equal to 1 and less than or equal to N; and
in response to P being less than a preset number, deleting the target content identifier.
24. The apparatus of any of claims 14-23, wherein the training module comprises:
the input submodule is used for inputting the data sequences into a user feature extraction model to be trained; and
an adjusting submodule, used for adjusting model parameters of the user feature extraction model to be trained based on the plurality of data sequences to obtain the trained user feature extraction model, wherein the model parameters are associated with the user features.
25. A content recommendation apparatus comprising:
a sixth determining module for determining a first user associated with the target user;
a seventh determining module, configured to determine a target second user from at least one second user based on a user feature of the first user, where a similarity between the user feature of the target second user and the user feature of the first user meets a preset similarity condition, and the user feature of the first user and the user feature of the at least one second user are obtained by using the user feature extraction model according to any one of claims 1 to 8; and
and the recommending module is used for recommending the content to be recommended aiming at the target second user to the target user.
26. The apparatus of claim 25, wherein the sixth determining module comprises at least one of:
a fourth determining submodule, configured to take a user that the target user follows as the first user; and
a fifth determining submodule, configured to determine historical browsing content of the target user and take a user who provided the historical browsing content as the first user.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
28. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 12-13.
29. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-11.
30. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 12-13.
31. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.
32. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 12-13.
CN202110487930.4A 2021-04-30 2021-04-30 Training method of user feature extraction model, content recommendation method and device Active CN113111268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110487930.4A CN113111268B (en) 2021-04-30 2021-04-30 Training method of user feature extraction model, content recommendation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110487930.4A CN113111268B (en) 2021-04-30 2021-04-30 Training method of user feature extraction model, content recommendation method and device

Publications (2)

Publication Number Publication Date
CN113111268A true CN113111268A (en) 2021-07-13
CN113111268B CN113111268B (en) 2024-06-11

Family

ID=76721258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110487930.4A Active CN113111268B (en) 2021-04-30 2021-04-30 Training method of user feature extraction model, content recommendation method and device

Country Status (1)

Country Link
CN (1) CN113111268B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010262383A (en) * 2009-04-30 2010-11-18 Ntt Docomo Inc Recommendation information generation device and recommendation information generation method
CN101901450A (en) * 2010-07-14 2010-12-01 中兴通讯股份有限公司 Media content recommendation method and media content recommendation system
WO2020156389A1 (en) * 2019-01-30 2020-08-06 北京字节跳动网络技术有限公司 Information pushing method and device
CN111552884A (en) * 2020-05-13 2020-08-18 腾讯科技(深圳)有限公司 Method and apparatus for content recommendation
CN111737582A (en) * 2020-07-29 2020-10-02 腾讯科技(深圳)有限公司 Content recommendation method and device
CN112330382A (en) * 2020-05-28 2021-02-05 北京沃东天骏信息技术有限公司 Item recommendation method and device, computing equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU, Weixiang; ZHANG, Wen; YANG, Bo; LIU, Yi; ZHANG, Lin; ZHANG, Yangsen: "Research on Personalized Recommendation Algorithm for Microblog Users", Computer Engineering (计算机工程), no. 10, 31 December 2020 (2020-12-31) *
ZHANG, Hong; DU, Peng: "Design of a Tag-Based Product Recommendation System", Heilongjiang Science and Technology Information (黑龙江科技信息), no. 17, 15 June 2017 (2017-06-15) *

Also Published As

Publication number Publication date
CN113111268B (en) 2024-06-11

Similar Documents

Publication Publication Date Title
CN108520470B (en) Method and apparatus for generating user attribute information
US20170206361A1 (en) Application recommendation method and application recommendation apparatus
CN108595448B (en) Information pushing method and device
CN110992127B (en) Article recommendation method and device
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN112182370A (en) Method and device for pushing item category information, electronic equipment and medium
CN112765478A (en) Method, apparatus, device, medium, and program product for recommending content
CN113947701B (en) Training method, object recognition method, device, electronic equipment and storage medium
CN114139052A (en) Ranking model training method for intelligent recommendation, intelligent recommendation method and device
CN113761565B (en) Data desensitization method and device
CN113111268B (en) Training method of user feature extraction model, content recommendation method and device
CN114036397B (en) Data recommendation method, device, electronic equipment and medium
CN107679030B (en) Method and device for extracting synonyms based on user operation behavior data
CN113722593B (en) Event data processing method, device, electronic equipment and medium
CN113536087B (en) Method, device, equipment, storage medium and program product for identifying cheating sites
CN110929512A (en) Data enhancement method and device
CN114860411A (en) Multitask learning method and device, electronic equipment and storage medium
CN113127683A (en) Content recommendation method and device, electronic equipment and medium
CN113961797A (en) Resource recommendation method and device, electronic equipment and readable storage medium
CN110378714B (en) Method and device for processing access data
CN113360770B (en) Content recommendation method, device, equipment and storage medium
CN109949117B (en) Method and device for pushing information
CN114547417B (en) Media resource ordering method and electronic equipment
CN108536362B (en) Method and device for identifying operation and server
CN116881579A (en) Method, device, equipment and storage medium for constructing attention graph model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant