CN111552884B - Method and apparatus for content recommendation - Google Patents

Info

Publication number
CN111552884B
Authority
CN
China
Prior art keywords
content
user
click
attribute
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010402143.0A
Other languages
Chinese (zh)
Other versions
CN111552884A (en)
Inventor
张晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010402143.0A priority Critical patent/CN111552884B/en
Publication of CN111552884A publication Critical patent/CN111552884A/en
Application granted granted Critical
Publication of CN111552884B publication Critical patent/CN111552884B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A method for content recommendation is described, comprising: acquiring historical behavior data of a user and a user portrait; acquiring content data of a plurality of content files, wherein the content data of each content file comprises at least one content attribute of that content file; determining, based on the at least one content attribute of each content file, the user portrait, and the historical behavior data, feature values of a plurality of features characterizing the user's interest in each content file, the plurality of features including click behavior features; determining a score for each content file based at least on the feature values of the plurality of features; and selecting, based on the scores of the plurality of content files, a predetermined number of content files from the plurality of content files for recommendation to the user.

Description

Method and apparatus for content recommendation
Technical Field
The present disclosure relates to the technical field of personalized recommendations, in particular to a method and apparatus for content recommendation.
Background
With the development of internet technology, users can watch or listen to different types of content, such as video, audio, graphics, and atlases, on various websites. Meanwhile, a website server can mine the user's interests in depth and recommend content matching those interests in a personalized manner, so as to improve the user's click rate on the content.
Disclosure of Invention
In the related art, content is generally recommended to a user based on the user's historical interests over a past period of time. However, recommendation then tends to rely heavily on the user's long-term interests while under-representing short-term interests, so changes in the user's interests cannot be captured in time. For example, judged by long-term interests, the user has a stronger interest in "iron man" and a weaker interest in "Nezha", because the user has historically clicked more "iron man" content and less "Nezha" content. Judged by short-term interests, "iron man" content was recently presented to the user 10 times but clicked only once, whereas "Nezha" content was presented 3 times and clicked 3 times, because the recent Nezha movie "Miao Tong Jiang" has been popular. The user is clearly more interested in "Nezha" content in the short term, yet a large amount of "iron man" content and very little "Nezha" content is still recommended to the user, which leads to poor recommendation efficiency, low accuracy, and a poor user experience.
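The contrast in this example can be made concrete with a little arithmetic. The following is a minimal sketch that computes a short-term click rate from the presentation and click counts quoted above; the function name `click_through_rate` and the dictionary layout are illustrative assumptions and do not appear in the disclosure.

```python
def click_through_rate(clicks: int, presentations: int) -> float:
    """Clicks divided by presentations; 0.0 when nothing was presented."""
    return clicks / presentations if presentations else 0.0


# Recent figures from the example: attribute -> (clicks, presentations).
recent = {"iron man": (1, 10), "Nezha": (3, 3)}

short_term_ctr = {
    attribute: click_through_rate(clicks, shown)
    for attribute, (clicks, shown) in recent.items()
}
```

Under these figures the recent click rate for "Nezha" is ten times that for "iron man", which is the short-term signal a purely long-term profile would miss.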
In view of the above, the present disclosure provides methods and apparatus, computing devices, and computer-readable storage media for content recommendation, which are intended to overcome some or all of the above-mentioned drawbacks, as well as other possible drawbacks.
According to a first aspect of the present disclosure, there is provided a method for content recommendation, comprising: acquiring historical behavior data of a user and a user portrait, wherein the historical behavior data comprises data related to historical clicks of the user on content files, the user portrait comprises a plurality of interest categories of the user, and each interest category comprises content attributes of content files; acquiring content data of a plurality of content files, wherein the content data of each content file comprises at least one content attribute of said each content file; determining, based on the at least one content attribute of each content file, the user portrait, and the historical behavior data, feature values of a plurality of features characterizing the user's interest in said each content file, the plurality of features including a click behavior feature, the click behavior feature being related to the number of occurrences of each of the at least one content attribute in the user's historical clicks within a preset number of recent clicks window; determining a score for each content file based at least on the feature values of the plurality of features; and selecting, based on the scores of the plurality of content files, a predetermined number of content files from the plurality of content files for recommendation to the user.
In some embodiments, acquiring the historical behavior data of the user and the user portrait includes: acquiring the historical behavior data of the user and the user portrait in response to receiving a current content recommendation request for the user.
In some embodiments, acquiring the historical behavior data of the user and the user portrait includes: acquiring the historical behavior data of the user; and acquiring the user portrait of the user based on the historical behavior data of the user.
In some embodiments, each interest category further includes an interest level corresponding to a content attribute of the content file, and the click behavior feature includes: a respective combined feature of said each content attribute and the number of occurrences of said each content attribute in the user's historical clicks within each corresponding sub-window of at least one preset latest click number sub-window, wherein the at least one preset latest click number sub-window is a sub-window of the preset latest click number window; and a respective combined feature of said each content attribute, the rank of the interest level of said each content attribute in its corresponding interest category, and said number of occurrences of said each content attribute.
In some embodiments, the plurality of features further includes a click time feature related to a click time of the user's historical clicks on the content file having the each content attribute within the preset number of recent clicks window.
In some embodiments, the click time feature comprises: a time interval between the time of the current content recommendation request and the click time at which a content file having said each content attribute was clicked at a predetermined ordinal position within the preset latest click number window; and a combined feature of that time interval and said each content attribute.
In some embodiments, the click time feature further comprises: a request number interval between the current content recommendation request and the content recommendation request in which a content file having said each content attribute was clicked at a predetermined ordinal position within the preset latest click number window; and a combined feature of that request number interval and said each content attribute.
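The click time features described above can be sketched as follows. This is an illustrative reading, not the disclosed implementation: it assumes each click record carries a click time, the sequence number of the content recommendation request it belongs to, and the content attributes of the clicked file, and it interprets "predetermined order" as the k-th most recent click on a file with the given attribute. The names `ClickRecord` and `click_time_features` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class ClickRecord(NamedTuple):
    click_time: float      # e.g. seconds since the epoch (assumed unit)
    request_no: int        # sequence number of the recommendation request
    attributes: Set[str]   # content attributes of the clicked file


def click_time_features(
    clicks: List[ClickRecord],   # window of recent clicks, most recent first
    attribute: str,
    now: float,                  # time of the current recommendation request
    current_request_no: int,
    order: int = 1,              # 1 = most recent matching click (assumption)
) -> Optional[dict]:
    """Return the time and request-number intervals plus their combined features."""
    matching = [c for c in clicks if attribute in c.attributes]
    if len(matching) < order:
        return None  # the attribute was not clicked often enough in the window
    hit = matching[order - 1]
    time_interval = now - hit.click_time
    request_interval = current_request_no - hit.request_no
    return {
        "time_interval": time_interval,
        "request_interval": request_interval,
        # Combined (cross) features pair each interval with the attribute.
        "attribute_x_time_interval": (attribute, time_interval),
        "attribute_x_request_interval": (attribute, request_interval),
    }


window = [
    ClickRecord(990.0, 41, {"Nezha"}),
    ClickRecord(900.0, 40, {"iron man"}),
    ClickRecord(700.0, 38, {"Nezha"}),
]
latest = click_time_features(window, "Nezha", now=1000.0, current_request_no=42)
```

In practice the raw intervals would probably be bucketed before being used as categorical features; the disclosure does not specify this, so the sketch leaves them as raw numbers.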
In some embodiments, the historical behavior data further comprises data related to historical presentations of content files to the user, and the plurality of features further comprises a presentation time feature related to the presentation time at which content files having each of the at least one content attribute were presented to the user in the historical presentations of a preset recent period.
In some embodiments, the presentation time feature comprises: a time interval between the time of the current content recommendation request and the presentation time at which a content file having said each content attribute was presented at a predetermined ordinal position in the historical presentations of the preset recent period; and a combined feature of that time interval and said each content attribute.
In some embodiments, the presentation time feature further comprises: a request number interval between the current content recommendation request and the content recommendation request in which a content file having said each content attribute was presented at a predetermined ordinal position in the historical presentations of the preset recent period; and a combined feature of that request number interval and said each content attribute.
In some embodiments, obtaining content data for a plurality of content files includes: content data of the plurality of content files is acquired based on a user portrait of the user.
In some embodiments, determining feature values of a plurality of features characterizing the user's interest in each content file based on the at least one content attribute of said each content file, the user portrait, and the historical behavior data comprises: acquiring an original value of each of the plurality of features; and obtaining the feature value of each feature based on the original value of that feature and the corresponding feature name.
In some embodiments, obtaining the feature value of each feature based on the original value of each feature and the corresponding feature name includes: hashing the original value of each feature to obtain a first hash value of that feature; hashing the character string of the feature name of each feature to obtain a second hash value of that feature; and obtaining the feature value of each feature based on the first hash value and the second hash value of that feature.
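The two-hash scheme can be sketched as below. The disclosure does not say which hash function is used or how the two hash values are combined, so both choices here are assumptions: MD5 truncated to 64 bits serves as a stable hash, and the two values are combined by XOR followed by a modulo into a fixed number of buckets. The names `stable_hash64`, `feature_value`, and `num_buckets` are hypothetical.

```python
import hashlib


def stable_hash64(text: str) -> int:
    """Deterministic 64-bit hash of a string (MD5 is an assumed choice)."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def feature_value(feature_name: str, raw_value: object, num_buckets: int = 2**24) -> int:
    """Combine the hash of the original value with the hash of the feature name."""
    first_hash = stable_hash64(str(raw_value))   # hash of the original value
    second_hash = stable_hash64(feature_name)    # hash of the feature-name string
    # Assumed combination: XOR the two hashes, then fold into a bucket index.
    return (first_hash ^ second_hash) % num_buckets


value = feature_value("tag_click_count_last128", "Nezha|3")
```

Mixing in the feature name keeps equal raw values that belong to different features from being forced onto the same index, which is a common reason for hashing the name as well as the value.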
In some embodiments, determining the score for each content file based at least on the feature values of the plurality of features comprises: inputting at least the feature values of the plurality of features into a trained intelligent scoring model to obtain a score for each content file; the trained intelligent scoring model is obtained through training according to positive sample data and negative sample data; wherein, in presenting the plurality of content files to the user, feature values of a plurality of features characterizing the user's interest in a clicked content file of the presented plurality of content files are taken as positive sample data, and feature values of a plurality of features characterizing the user's interest in a non-clicked content file of the presented plurality of content files are taken as negative sample data.
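The sample construction described above can be sketched as follows. Only the labeling step is shown; the scoring model itself is not specified here and is left out. The function name `build_training_samples` and the data layout (a mapping from content identifier to feature values, plus a set of clicked identifiers) are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def build_training_samples(
    presented: Dict[str, List[int]],   # content id -> feature values at presentation
    clicked_ids: Set[str],             # ids of presented files the user clicked
) -> List[Tuple[List[int], int]]:
    """Label clicked presentations 1 (positive) and unclicked ones 0 (negative)."""
    return [
        (features, 1 if content_id in clicked_ids else 0)
        for content_id, features in presented.items()
    ]


samples = build_training_samples(
    {"video_a": [11, 42], "video_b": [7, 42], "video_c": [11, 5]},
    clicked_ids={"video_b"},
)
```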
In some embodiments, selecting a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores of the plurality of content files comprises: sorting the plurality of content files by score to obtain an ordered sequence of content files; and selecting a predetermined number of content files for recommendation to the user, starting from the first content file in the ordered sequence.
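The sort-and-select step can be sketched in a few lines. The sketch assumes that a higher score means a better candidate and therefore sorts in descending order; the function name `select_top_n` is hypothetical.

```python
from typing import Dict, List


def select_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Sort content ids by score (highest first) and keep the first n."""
    ranked = sorted(scores, key=lambda content_id: scores[content_id], reverse=True)
    return ranked[:n]


recommended = select_top_n({"a": 0.2, "b": 0.9, "c": 0.5}, n=2)
```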
According to a second aspect of the present disclosure, there is provided a method for content recommendation, comprising: acquiring a predetermined number of content files selected for recommendation to a user according to the method of the first aspect of the present disclosure; presenting the predetermined number of content files.
According to a third aspect of the present disclosure, there is provided an apparatus for content recommendation, comprising: a first acquisition module configured to acquire historical behavior data of a user and a user portrait, the historical behavior data including data related to historical clicks of the user on content files, the user portrait including a plurality of interest categories of the user, each interest category including content attributes of content files; a second acquisition module configured to acquire content data of a plurality of content files, the content data of each content file including at least one content attribute of said each content file; a first determination module configured to determine, based on the at least one content attribute of each content file, the user portrait, and the historical behavior data, feature values of a plurality of features characterizing the user's interest in said each content file, the plurality of features including click behavior features that relate to the number of occurrences of each of the at least one content attribute in the user's historical clicks within a preset number of recent clicks window; a second determination module configured to determine a score for each content file based at least on the feature values of the plurality of features; and a selection module configured to select a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores of the plurality of content files.
According to a fourth aspect of the present disclosure, there is provided an apparatus for content recommendation, comprising: a content file acquisition module configured to acquire a predetermined number of content files for recommendation to a user from the apparatus according to the third aspect of the present disclosure; a presentation module configured to present the predetermined number of content files.
According to a fifth aspect of the present disclosure, there is provided a computing device comprising a processor; and a memory configured to store computer-executable instructions thereon that, when executed by the processor, perform any of the methods described above.
According to a sixth aspect of the present disclosure, there is provided a computer readable storage medium storing computer executable instructions which, when executed, perform any of the methods as described above.
In the method and apparatus for content recommendation claimed in the present disclosure, content recommendation uses click behavior features related to the number of occurrences of each content attribute in the user's historical clicks within the preset latest click number window. As a result, the user's short-term interests are fully represented and changes in the user's interests are reflected quickly, so that the accuracy of content recommendation can be greatly improved, and key indicators such as the click rate and click volume of the recommended content are improved.
These and other advantages of the present disclosure will become apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Embodiments of the present disclosure will now be described in more detail and with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary application scenario in which a technical solution according to an embodiment of the present disclosure may be implemented;
FIG. 2 shows a schematic flow chart of a method for content recommendation according to one embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of interest classification of a user's hierarchical structure, according to one embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of determining click behavior characteristics in accordance with one embodiment of the present disclosure;
FIG. 5 illustrates an example graph of determining click behavior characteristics in accordance with one embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of determining click time features and presentation time features according to one embodiment of the present disclosure;
FIG. 7 shows a schematic flow chart diagram of a method for content recommendation according to another embodiment of the present disclosure;
FIG. 8 illustrates an architectural diagram of a method for content recommendation according to one embodiment of the present disclosure;
FIG. 9 illustrates an architectural diagram for ordering video according to one embodiment of the present disclosure;
FIG. 10 illustrates an overall flow of click rate estimation according to one embodiment of the present disclosure;
FIG. 11 illustrates an architectural diagram of model training according to one embodiment of the present disclosure;
FIG. 12 shows a schematic diagram of presenting content files recommended by a method for content recommendation according to an embodiment of the disclosure;
FIG. 13 illustrates an exemplary block diagram of an apparatus for content recommendation in accordance with one embodiment of the present disclosure;
FIG. 14 illustrates an exemplary block diagram of an apparatus for content recommendation in accordance with another embodiment of the present disclosure; and
FIG. 15 illustrates an example system including an example computing device that represents one or more systems and/or devices that can implement the various techniques described herein.
Detailed Description
The following description provides specific details of various embodiments of the disclosure so that those skilled in the art may fully understand and practice the various embodiments of the disclosure. It should be understood that the technical solutions of the present disclosure may be practiced without some of these details. In some instances, well-known structures or functions have not been shown or described in detail to avoid obscuring the description of embodiments of the present disclosure with such unnecessary description. The terminology used in the present disclosure should be understood in its broadest reasonable manner, even though it is being used in conjunction with a particular embodiment of the present disclosure.
First, some terms involved in the embodiments of the present application will be described so as to be easily understood by those skilled in the art:
Content: any information or data that can be viewed, listened to, or otherwise perceived by a user, for example video, audio, graphics, atlases, etc.; accordingly, a content file is a carrier of the content, such as a video file, an audio file, a web page, etc.;
Combined feature: a cross feature formed by combining individual features;
User portrait: a user model built on a series of attribute data. It may include multiple interest categories of the user, which may, for example, be abstracted from the user's historical behavior data. Each interest category includes content attributes of content files (as an example, content attributes of content files that the user has historically clicked on), and optionally includes an interest level corresponding to each content attribute; a schematic diagram is shown in fig. 3. It should be noted that the user portrait may be determined in various other ways, for example from interest categories entered by the user, and this is not limiting.
Fig. 1 illustrates an exemplary application scenario 100 in which a technical solution according to an embodiment of the present disclosure may be implemented. As shown in fig. 1, the application scenario 100 includes a server 110, terminals 120, 130, and a network 140. Terminals 120, 130 are communicatively coupled with server 110 via network 140. The user may view content, which may be, for example, video, audio, graphics, etc., through an application or client on the terminal 120, 130. The server can deeply mine the interests of the user and recommend the content meeting the interests of the user to the user in a personalized manner through an application program or a client side of the terminal.
For convenience of description, the following takes as an example a user opening or logging in to the corresponding client on the terminal 120 to view content; it should be understood that a user opening or logging in to the corresponding client on the terminal 130 has the same effect. As an example, the terminal 120 may send a content recommendation request for the user to the server 110 through the network 140 when the user opens or logs in to the corresponding client on the terminal 120 to view content. After receiving the content recommendation request, the server 110 may acquire historical behavior data of the user and a user portrait, and acquire content data of a plurality of content files, where the content data of each content file includes at least one content attribute of that content file. Based on the at least one content attribute of each content file, the user portrait, and the historical behavior data, the server 110 determines feature values of a plurality of features characterizing the user's interest in each content file. The plurality of features may include, for example, click behavior features that relate to the number of occurrences of each of the at least one content attribute in the user's historical clicks within a preset number of recent clicks window. The server 110 may then determine a score for each content file based at least on the feature values of the plurality of features, select a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores, and send the predetermined number of content files over the network 140 to the client on the terminal 120 for presentation to the user, thereby recommending content to the user.
Alternatively, server 110 may be a content server of a content provider, a device associated with a content server, a system-on-chip, and/or any other suitable computing device or computing system. The server 110 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDNs), and big data and artificial intelligence platforms. The terminals 120 and 130 may be, but are not limited to, smart phones, tablet computers, notebook computers, desktop computers, smart speakers, smart watches, etc. The terminals 120, 130 and the server 110 may be connected directly or indirectly through wired or wireless communication, which is not limited herein. The network 140 may be, for example, a Wide Area Network (WAN), a Local Area Network (LAN), a wireless network, a public telephone network, an intranet, or any other type of network known to those skilled in the art. It should also be noted that the above-described scenario is merely one example in which embodiments of the present disclosure may be implemented, and is not limiting.
Fig. 2 illustrates a schematic flow diagram of a method 200 for content recommendation according to one embodiment of the present disclosure. The method 200 may be implemented, for example, on the server 110 shown in fig. 1. As shown in fig. 2, the method 200 may include the following steps.
In step 201, historical behavior data of a user and a user portrait are acquired. The historical behavior data of the user includes data related to the user's historical clicks on content files, and may optionally include data related to historical presentations of content files to the user. Historical behavior data is typically recorded or saved in the form of logs. The data related to the user's historical clicks on content files is, for example, data related to the user's clicks on content files over a predetermined past period (e.g., over the past 3 days), data related to the user's clicks on content files within a preset number of recent clicks window (e.g., over the last 128 clicks), etc. The data related to historical presentations of content files to the user is, for example, data related to content files presented to the user over a predetermined past period (e.g., over the past 3 days), data related to content files presented to the user over a preset number of recent clicks (e.g., over the last 128 clicks), and the like. Typically, data related to the user's historical clicks on content files is saved per click, each record storing the identifier of the content file, the click time, etc.; the history of presenting content files to the user (also referred to as historical presentation) is saved per content recommendation request, each record storing the identifiers of the content files, the presentation time, etc., where each content recommendation request may, for example, request a plurality of content files. Each content file has at least one content attribute; for example, the content attributes of a content file may be queried using the identifier of the content file. The user portrait includes a plurality of interest categories of the user, each interest category including content attributes of content files.
Each interest category may also include a degree of interest corresponding to a content attribute of the content file. The interest level corresponding to the content attribute may be determined according to the number of clicks or click rate of the user on the content file having the content attribute. The higher the number of clicks or click rate of a user on a content file having a certain content attribute, the higher the interest level corresponding to the content attribute.
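One simple way to turn click counts into interest levels is sketched below. The text only requires that more clicks (or a higher click rate) give a higher interest level; normalizing each attribute's clicks by the category total is an assumption made here for illustration, as are the function name `interest_levels` and the click counts.

```python
from typing import Dict


def interest_levels(click_counts: Dict[str, int]) -> Dict[str, float]:
    """Map each content attribute to its share of the category's clicks."""
    total = sum(click_counts.values())
    if total == 0:
        return {attribute: 0.0 for attribute in click_counts}
    return {attribute: count / total for attribute, count in click_counts.items()}


# Hypothetical click counts for two first-level content attributes.
levels = interest_levels({"entertainment": 6, "sports": 4})
```

Any monotonically increasing mapping from clicks or click rate to interest level would satisfy the description equally well.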
It should be noted that "clicking" as described herein may refer to a conventional click of the content file by the user (e.g., via a mouse, etc.), may also refer to various forms of input (e.g., via voice input, gesture input, visual input) that represent a click or confirmation of the content file by the user, and so forth.
In some embodiments, the multiple interest categories of the user have a hierarchical structure which, from the upper layer to the lower layer, may include, for example, a first-level category, a second-level category, a tag category, and the like. By way of example, FIG. 3 illustrates a schematic diagram of a user's hierarchically structured interest categories. As shown in FIG. 3, the user has two content attributes in the first-level category, namely "entertainment" and "sports"; the numbers in brackets are their corresponding interest levels, 0.6 and 0.4 respectively. Under the first-level content attribute "entertainment" there are two second-level content attributes, "movie" and "variety"; under the second-level content attribute "movie" there are two tag-category content attributes, "iron man" and "Nezha". The numbers in brackets are likewise their corresponding interest levels. It should be noted that the number of layers of the hierarchical structure is not limiting; for example, there may also be a third-level category, a fourth-level category, etc. between the second-level category and the tag category.
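A hierarchical portrait of this kind can be held in a small tree. The following sketch uses nested dictionaries and a helper that returns the path from the first-level category down to a given content attribute. The layout and the name `find_path` are illustrative assumptions; interest levels not stated in the text are left as `None` rather than invented.

```python
from typing import Dict, List

# Each node: {"interest": float or None, "children": {attribute: node}}.
portrait: Dict[str, dict] = {
    "entertainment": {
        "interest": 0.6,
        "children": {
            "movie": {
                "interest": None,  # value shown in FIG. 3 but not given in the text
                "children": {
                    "iron man": {"interest": None, "children": {}},
                    "Nezha": {"interest": None, "children": {}},
                },
            },
            "variety": {"interest": None, "children": {}},
        },
    },
    "sports": {"interest": 0.4, "children": {}},
}


def find_path(tree: Dict[str, dict], attribute: str) -> List[str]:
    """Return the chain of attributes leading to `attribute`, or [] if absent."""
    for name, node in tree.items():
        if name == attribute:
            return [name]
        below = find_path(node.get("children", {}), attribute)
        if below:
            return [name] + below
    return []


path = find_path(portrait, "Nezha")
```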
In some embodiments, the current content recommendation request for the user may be received before the historical behavior data of the user and the user representation are obtained, namely: historical behavioral data of the user and the user portraits are obtained in response to receiving a current content recommendation request for the user. As described above, the current content recommendation request for the user may be transmitted when the user opens or logs in to the corresponding client to view the content, or when the user wants to view new content, although this is not limitative.
User portraits are typically determined based on historical behavioral data of the user. The user representation may be determined in advance from all of the historical behavior data of the user, and then the historical behavior data of the user (which may be part of all of the historical behavior data of the user, for example) and the user representation required in the content recommendation may be obtained. In some embodiments, in acquiring the historical behavior data of the user and the user representation, the historical behavior data of the user may be acquired first, and then the user representation of the user may be acquired based on the acquired historical behavior data of the user, which is not limiting.
In step 202, content data of a plurality of content files is obtained, the content data of each content file comprising at least one content attribute of said each content file. The content attributes may be used to characterize particular aspects of the content file, which may be content attributes in the interest categories as described above, e.g., primary category content attributes, secondary category content attributes, tag category content attributes, and so forth.
In some embodiments, content data for a plurality of content files may be acquired based on the user portrait. For example, if the primary classification included in the user portrait has a "sports" content attribute, the content data of content files related to "sports" may be obtained, and those content files are then used as candidate content files for recommendation to the user. This narrows the range of content files obtained during content recommendation, saves processing resources, and improves the efficiency of content recommendation.
At step 203, feature values for a plurality of features characterizing the user's interest in each content file are determined based on the at least one content attribute of each content file, the user portraits, and historical behavior data. The plurality of features includes a click behavior feature, which is a series of features related to the number of occurrences of each of the at least one content attribute in a history of clicks of the user within a preset number of recent clicks window. The preset number of recent clicks window may be set as desired, for example, to the last 128 clicks. That is, the click behavior feature may be, for example, a feature related to the number of occurrences of each of the at least one content attribute in the last 128 clicks by the user.
In some embodiments, the click behavior feature may include: a combined feature of each content attribute and the number of occurrences of that content attribute in the user's historical clicks within each of at least one preset number of recent clicks sub-window, where the at least one preset number of recent clicks sub-window is a sub-window of the preset number of recent clicks window; and a combined feature of the ranking of the interestingness of each content attribute within its corresponding interest classification and the number of occurrences of that content attribute in the user's historical clicks within each of the at least one preset number of recent clicks sub-window. As an example, the content attributes in an interest classification may be sorted by their corresponding interestingness from high to low, and the ranking of the interestingness of each content attribute (i.e., the position of its interestingness in that order) is derived therefrom.
Taking the case where the preset number of recent clicks window is the last 128 clicks as an example, the at least one preset number of recent clicks sub-window may be, for example, 4 sub-windows: the last 5 clicks, the last 20 clicks, the last 50 clicks, and the last 100 clicks. It should be noted that neither the number of sub-windows nor the number of recent clicks included in each sub-window is limiting; e.g., there may be a sub-window of the last 30 clicks.
Table 1 below shows click behavior characteristics in the case where there are 4 preset number of last clicks sub-windows (last 5 clicks, last 20 clicks, last 50 clicks, last 100 clicks, respectively) and three content attributes (one primary classified content attribute, one secondary classified content attribute, one tab classified content attribute) are present.
TABLE 1
As shown in table 1, taking the sub-window of the last 5 clicks and a content attribute of the primary classification (for example, "sports" as shown in fig. 3) as an example, the click behavior feature includes: a combined feature of the content attribute of the primary classification and the number of occurrences of that content attribute in the user's last 5 clicks; and a combined feature of the ranking of the interestingness of that content attribute within its corresponding interest classification (i.e., the primary classification) and that number of occurrences.
As an example, as shown in fig. 4, the tag classification content attributes involved in the user's last 5 clicks are, from farthest to nearest, "Ding Junhui", "Nezha", "Sun Gonglei", "Singer", and "Nezha", where the "Nezha" content attribute has an interestingness of 0.001 and a ranking of 7 in the tag classification. As can be seen from fig. 4, the "Nezha" content attribute appears 2 times in the last 5 clicks of the user. Thus, in determining a feature value for a feature characterizing the user's interest in a content file that includes the "Nezha" content attribute, the click behavior features may include: the combined feature of the "Nezha" content attribute and the number of times, 2, that it occurred in the last 5 clicks; and the combined feature of the ranking, 7, of the interestingness of the "Nezha" content attribute in its corresponding interest classification (i.e., the tag classification) and the number of times, 2, that it occurred in the last 5 clicks.
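The counting described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function name, the feature-name strings, and the representation of the click history as a list of tag attributes (oldest first) are all assumptions made for illustration. The data mirrors the fig. 4 example, in which "Nezha" occurs twice in the last 5 clicks and has an interestingness ranking of 7.

```python
from collections import Counter

# Sub-window sizes taken from the example in the text (last 5/20/50/100 clicks).
SUB_WINDOWS = (5, 20, 50, 100)

def click_behavior_features(attribute, rank, click_history, sub_windows=SUB_WINDOWS):
    """Return (feature_name, input_values) pairs for one content attribute.

    click_history: attributes of the user's clicks, oldest first (hypothetical layout).
    rank: ranking of the attribute's interestingness in its interest classification.
    """
    features = []
    for size in sub_windows:
        recent = click_history[-size:]        # the most recent `size` clicks
        count = Counter(recent)[attribute]    # occurrences inside this sub-window
        # Combined feature: content attribute x occurrence count.
        features.append((f"attr_x_count_last{size}", (attribute, count)))
        # Combined feature: interestingness ranking x occurrence count.
        features.append((f"rank_x_count_last{size}", (rank, count)))
    return features

history = ["Ding Junhui", "Nezha", "Sun Gonglei", "Singer", "Nezha"]
features = click_behavior_features("Nezha", rank=7, click_history=history,
                                   sub_windows=(5,))
print(features)
```

Each pair of input values would still have to be turned into an original value and a hashed feature value before being fed to a scoring model.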
By using click behavior features related to the number of occurrences of each of the at least one content attribute in the user's historical clicks within the preset number of recent clicks window, the user's most recent clicks are used to characterize the user's short-term interests and to quickly capture changes in those interests. This can improve the accuracy of content recommendation and thereby key indexes such as the click rate and click volume of the recommended content. In particular, the preset number of recent clicks sub-windows can optionally be used to characterize the user's short-term interests at a finer granularity and to capture changes in the user's interests more quickly, further improving the accuracy of content recommendation.
In some embodiments, the plurality of features further includes a click time feature. The click time feature is for example a series of features related to the click time of the user on a content file having said each content attribute within said preset number of last clicks window. The preset number of recent clicks window may be set as desired, for example, as well as to the last 128 clicks, although this is not limiting. In this case, the click time feature may be, for example, a feature related to the click time of the user's history of clicks on the content file having the each content attribute in the last 128 clicks.
Consider the example shown in fig. 5. If the user clicks on a video whose tag classification content attribute is "Nezha" at 12 noon on Monday and then requests no recommended videos until some later time, his latest click at the time of that later request is still the click at 12 noon on Monday. By contrast, if the user clicks on a video whose tag classification content attribute is "Nezha" at noon on Sunday and requests recommended videos again 10 minutes later, his latest click is the click from 10 minutes earlier. From the standpoint of the click behavior feature described above, these two clicks have the same effect even though the time elapsed since each click differs. In this embodiment, the click time feature makes it possible to represent the difference between the click times of the two clicks, further improving the accuracy with which the user's short-term interest is characterized.
In some embodiments, the click time feature comprises: the time interval between the click time at which a content file having each content attribute was clicked at a predetermined order position within the preset number of recent clicks window and the time of the current content recommendation request; and a combined feature of that time interval and each content attribute.
As an example, the predetermined order may be specified as needed, for example, it may be the nearest 1 st, the nearest 2 nd, the farthest 1 st, or the like. Table 2 below shows click time characteristics in the case where the preset number of recent clicks window is 128 most recent clicks, the predetermined order is 1 most recent and 1 most distant (i.e., 1 st and 128 th) as an example, and three content attributes (one content attribute of one primary classification, one content attribute of one secondary classification, one content attribute of one tag classification) exist.
TABLE 2
As shown in table 2, taking the content attribute of the first class as an example, the click time feature includes: a time interval between a click time of the content file having the content attribute of the first class at the latest 1 click and a time of the current content recommendation request; a combination feature of a time interval of a click time of the content file having the content attribute of the first class at the latest 1 click and a time of a current content recommendation request, and the content attribute; a time interval between a click time of the farthest 1 click of the content file with the content attribute of the first class and a time of the current content recommendation request; the time interval between the click time of the farthest 1 click of the content file with the content attribute of the first class and the time of the current content recommendation request, and the combination characteristic of the content attribute.
In some embodiments, the click time feature may further comprise: the request number interval between the content recommendation request in which a content file having each content attribute was clicked at a predetermined order position within the preset number of recent clicks window and the current content recommendation request; and a combined feature of that request number interval and each content attribute. Using such click time features may reduce the bias that could result from click time features that depend on time intervals alone.
As an example, the predetermined order may be specified as needed, as described above; for example, it may be the nearest 1st, the nearest 2nd, the farthest 1st, and so on. Table 3 below shows the click time features in the case where the preset number of recent clicks window is the most recent 128 clicks, the predetermined order positions are, as an example, the nearest 1st and the farthest 1st (i.e., the 1st and the 128th), and there are three content attributes (one content attribute of the primary classification, one of the secondary classification, and one of the tag classification).
TABLE 3
As shown in table 3, taking the content attribute of the primary classification as an example, in addition to the features shown in table 2 (not repeated in table 3 for brevity), the click time feature may further include: the request number interval between the content recommendation request in which the content file having the content attribute of the primary classification was most recently clicked and the current content recommendation request; a combined feature of that request number interval and the content attribute; the request number interval between the content recommendation request in which such a content file was clicked farthest back and the current content recommendation request; and a combined feature of that request number interval and the content attribute.
In some embodiments, the historical behavior data further includes data related to historical presentation of content files to the user. In this case, the plurality of features may include a presentation time feature in addition to the click time feature. The presentation time feature is a series of features related to the presentation time at which a content file having each of the at least one content attribute was presented to the user in the historical presentations of a preset latest period. The preset latest period may be set as desired, for example to the last 3 days, although this is not limiting. In this case, the presentation time feature may be, for example, a feature related to the presentation time at which a content file having each of the at least one content attribute was presented to the user in the historical presentations of the last 3 days. Using the presentation time feature on top of the click time feature makes full use of both the positive signal of a click and the negative signal of a presentation that was not clicked, which further improves the accuracy of content recommendation, improves key indexes such as the click rate and click volume of the recommended content, and enhances the user's experience when viewing content.
In some embodiments, the presentation time feature comprises: a time interval between a presentation time at which the content file having the each content attribute is presented in a predetermined order in the history presentation of the preset latest period and a time of a current content recommendation request; a time interval between a presentation time at which the content file having the each content attribute is presented in a predetermined order in the history presentation of the preset latest period and a time of the current content recommendation request, and a combined characteristic of both the each content attribute.
As an example, the predetermined order may be specified as needed, as described above; for example, it may be the nearest 1st, the nearest 2nd, the farthest 1st, and so on. Table 4 below shows the presentation time features in the case where the preset latest period is the last 3 days, the predetermined order positions are the nearest 1st and the farthest 1st, and there are three content attributes (one content attribute of the primary classification, one of the secondary classification, and one of the tag classification).
TABLE 4
As shown in table 4, taking the content attribute of the primary classification as an example, the presentation time feature may include: the time interval between the presentation time at which the content file having the content attribute of the primary classification was most recently presented within the presentations of the last 3 days and the time of the current content recommendation request; a combined feature of that time interval and the content attribute of the primary classification; the time interval between the presentation time at which such a content file was presented farthest back within the presentations of the last 3 days and the time of the current content recommendation request; and a combined feature of that time interval and the content attribute of the primary classification.
In some embodiments, the presentation time feature may further comprise: the request number interval between the content recommendation request in which a content file having each content attribute was presented at a predetermined order position in the historical presentations of the preset latest period and the current content recommendation request; and a combined feature of that request number interval and each content attribute.
As an example, the predetermined order may be specified as needed, as described above; for example, it may be the nearest 1st, the nearest 2nd, the farthest 1st, and so on. Table 5 below shows the presentation time features in the case where the preset latest period is the last 3 days, the predetermined order positions are the nearest 1st and the farthest 1st, and there are three content attributes (one content attribute of the primary classification, one of the secondary classification, and one of the tag classification).
TABLE 5
As shown in table 5, taking the content attribute of the primary classification as an example, the presentation time feature may include: the request number interval between the content recommendation request in which the content file having the content attribute of the primary classification was most recently presented within the presentations of the last 3 days and the current content recommendation request; a combined feature of that request number interval and the content attribute of the primary classification; the request number interval between the content recommendation request in which such a content file was presented farthest back within the presentations of the last 3 days and the current content recommendation request; and a combined feature of that request number interval and the content attribute of the primary classification.
As an example, fig. 6 shows a schematic diagram of determining click time features and presentation time features according to one embodiment of the present disclosure. As shown in fig. 6, the current content recommendation request is made at 18:00 and is denoted as the 0th content recommendation request; the most recent click on a content file having the tag classification content attribute "Nezha" occurred at 16:13, in the 4th most recent content recommendation request; the farthest click occurred at 13:07, in the 16th most recent content recommendation request. Therefore, the time interval between the most recent click and the current time is 1:47 and the request number interval is 4, where the time interval may be discretized in units of half an hour, giving 3; the time interval between the farthest click and the current time is 4:53 and the request number interval is 16, where the time interval discretized in units of half an hour gives 9. The presentation time features in fig. 6 are determined in a similar way and are not described in detail here.
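The half-hour discretization in this example can be checked with a short Python sketch. The function name and the calendar date are arbitrary assumptions; only the clock times and the half-hour unit come from the fig. 6 example, and rounding down to whole units is inferred from the values 3 and 9 given there.

```python
from datetime import datetime, timedelta

def discretize_interval(event_time, request_time, unit=timedelta(minutes=30)):
    """Number of whole `unit` periods elapsed between an event and the current request."""
    return (request_time - event_time) // unit

# The date is arbitrary; only the clock times come from the fig. 6 example.
now = datetime(2020, 1, 1, 18, 0)
latest_click = datetime(2020, 1, 1, 16, 13)
farthest_click = datetime(2020, 1, 1, 13, 7)

latest_bucket = discretize_interval(latest_click, now)      # 1:47 elapsed -> 3
farthest_bucket = discretize_interval(farthest_click, now)  # 4:53 elapsed -> 9
print(latest_bucket, farthest_bucket)
```

The request number intervals (4 and 16 in the example) need no discretization, since they are already small integers.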
In some embodiments, when determining the feature values of the plurality of features for characterizing the user's interest in each content file, the original value of each of the plurality of features may first be obtained, and the feature value of each feature may then be obtained based on its original value and its corresponding feature name. The original value of a feature may be the value of the feature itself (i.e., the input value) or may be obtained by indexing the input value of the feature as described below. As indicated above, embodiments of the present disclosure involve combined features and single features (i.e., features that are not combined features). A single feature has only one input value, such as the discretized value 3 of the time interval between the last click and the current time described above, the value 4 of the request number interval, and so on. The input value is typically of uint64 (64-bit unsigned integer) type or float (floating point) type. Input values such as a content attribute, a request time interval, or a request number interval are generally of uint64 type, and the original value of the single feature is the input value; an input value such as a click rate is typically of float type, and the original value of the single feature is 10000.
For a combined feature, there are multiple input values, since it is a combination of multiple single features. For example, among the click time features described above, the combined feature of the content attribute of the primary classification and the request number interval between the content recommendation request of the most recent click on a content file having that content attribute and the current content recommendation request has the input values of two single features: the request number interval and the content attribute. In this case, the original values of the single features are obtained separately and denoted $x_1$ and $x_2$, and the original value $y$ of the combined feature is then obtained by prime-number successive multiplication, namely: $y = x_1 \times 13131 + x_2$. This way of calculating original values extends to combined features with three or more input values. For example, for three input values $x_1$, $x_2$, $x_3$: $y = 13131 \times (x_1 \times 13131 + x_2) + x_3$.
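The successive multiplication above follows the same pattern for any number of input values, which a minimal Python sketch can make explicit. The function name is hypothetical, and wrapping the result to 64 bits is an assumption made here to mimic uint64 arithmetic; the source does not state how overflow is handled.

```python
MULTIPLIER = 13131            # multiplier given in the text
UINT64_MASK = (1 << 64) - 1   # assumption: results wrap like a 64-bit unsigned integer

def combine_original_values(values, multiplier=MULTIPLIER):
    """Fold the original values of single features into one combined original value."""
    y = 0
    for x in values:
        # After the first step y == x1; afterwards y == previous * multiplier + x.
        y = (y * multiplier + x) & UINT64_MASK
    return y

two = combine_original_values([4, 7])       # x1 * 13131 + x2
three = combine_original_values([1, 2, 3])  # 13131 * (x1 * 13131 + x2) + x3
print(two, three)
```

Because the order of the inputs changes the result, the single features of a combined feature would need to be supplied in a fixed order.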
As an example, when the feature value of each feature is obtained based on the original value of each feature and the corresponding feature name, the original value of each feature may be hashed to obtain the first hash value of each feature; hashing the character string of the feature name of each feature to obtain a second hash value of each feature; and obtaining the characteristic value of each characteristic based on the first hash value and the second hash value of each characteristic.
For example, to increase the distinguishability of features while also taking online performance into account, embodiments of the present disclosure may map feature values to a 64-bit hash space. The high 16 bits of the 64-bit space reflect feature type information, obtained by hashing the feature name string (feature_name), taking the low 16 bits of the result, and shifting them left by 48 bits; the low 48 bits reflect feature index information, obtained by hashing the original value of the feature (feature_value_o) and taking the low 48 bits of the result. That is, the feature value Y of the feature can be calculated according to the following formula:
Y = ((hash(feature_name) & 0xFFFF) << 48) + (hash(feature_value_o) & 0xFFFFFFFFFFFF).
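The bit layout of this formula can be sketched in Python as follows. The source does not name the hash function, so `hash64` below is a stand-in built on BLAKE2b from the standard library, and encoding the original value as a decimal string before hashing is likewise an assumption made for illustration.

```python
import hashlib

def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption: the source does not specify the hash function)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def feature_value(feature_name: str, original_value: int) -> int:
    """Pack feature type (high 16 bits) and feature index (low 48 bits) into 64 bits."""
    type_bits = hash64(feature_name.encode("utf-8")) & 0xFFFF
    # Assumption: the original value is hashed via its decimal string form.
    index_bits = hash64(str(original_value).encode("utf-8")) & 0xFFFFFFFFFFFF
    # The two fields do not overlap, so OR gives the same result as the "+" in the formula.
    return (type_bits << 48) | index_bits

a = feature_value("click_interval_latest", 3)
b = feature_value("click_interval_latest", 9)
print(hex(a), hex(b))
```

Two values of the same feature share their high 16 bits, so features with different names are unlikely to collide even when their original values coincide.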
In step 204, a score for each content file is determined based at least on the feature values of the plurality of features. In some embodiments, at least the feature values of the plurality of features may be input into a trained intelligent scoring model to obtain the score for each content file. The score for each content file may be used to represent a predicted probability of clicking by the user on the each content file.
In practical applications, the trained intelligent scoring model may be selected according to actual needs, for example: a logistic regression (LR) model, a deep factorization machine (DeepFM) model, and the like. Taking scoring with the logistic regression model as an example, the feature values of the plurality of features are input into a trained logistic regression model (a classification model) to obtain the score of each content file. The formula used may be: $z = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n$, where $x_n$ is the $n$-th feature value of the content file, $w_n$ is the coefficient of $x_n$, and $z$ is the score of the content file, $z \in [0, 1]$.
In practical applications, the trained intelligent scoring model may also be, for example, a deep learning model common in the field of artificial intelligence. Artificial intelligence (AI) is the theory, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
As an example, the trained intelligent scoring model may be trained from positive sample data and negative sample data; wherein, when the plurality of content files are presented to the user, the feature values of the plurality of features characterizing the user's interest in a clicked content file among the presented content files are taken as positive sample data, and the feature values of the plurality of features characterizing the user's interest in a non-clicked content file among the presented content files are taken as negative sample data.
It should be noted that, in addition to the feature values of the plurality of features, the score of each content file may be determined in combination with feature values of environmental features, user features, content features, and the like. The environmental features characterize properties associated with the content recommendation request, such as the requesting region, the requesting device, the network used, etc. User features mainly include demographic features of the user, such as the user's gender, age, address, etc., and long-term portrait features of the user, such as the user's deep interests. Content features mainly include the topic, quality, and the like of the content, which are generally labeled when the content is put into the library, as well as favorites, comments, and the like on the content, which are typically obtained through offline statistics after the content has been presented.
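As a minimal sketch of logistic regression scoring, the Python snippet below computes the weighted sum and then applies the standard logistic (sigmoid) function so that the result can be read as a click probability. Applying the sigmoid is the usual logistic regression convention and is an assumption here, since only the weighted sum appears in the text; the function name and the example weights are hypothetical.

```python
import math

def logistic_score(weights, bias, features):
    """Weighted sum of feature values followed by a sigmoid (assumed, standard LR)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical trained coefficients w1..w3, bias w0, and one content file's features.
score = logistic_score(weights=[0.8, -0.3, 0.5], bias=-0.2, features=[1.0, 0.0, 1.0])
print(score)
```

A higher score indicates a higher predicted probability that the user clicks the content file.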
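The sample construction described above can be sketched in Python as follows. The data layout, a list of (content identifier, feature values) pairs per presentation plus a set of clicked identifiers, is a hypothetical choice for illustration and is not specified in the source.

```python
def build_training_samples(presented, clicked_ids):
    """Label presented content files: 1 for clicked (positive), 0 for not clicked (negative).

    presented: list of (content_id, feature_values) pairs shown to the user (assumed layout).
    clicked_ids: set of content_ids the user clicked among those presented.
    """
    return [(feature_values, 1 if content_id in clicked_ids else 0)
            for content_id, feature_values in presented]

presented = [("v1", [3, 4, 9]), ("v2", [1, 16, 2]), ("v3", [0, 7, 5])]
samples = build_training_samples(presented, clicked_ids={"v2"})
print(samples)
```

The resulting labeled pairs could then be passed to whichever training routine matches the chosen scoring model.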
At step 205, a predetermined number of content files from the plurality of content files are selected for recommendation to the user based on the scores of the plurality of content files. Additionally, the predetermined number of content files may be sent to a user's client for presentation.
In some embodiments, the selection of the predetermined number of content files may be performed by: based on the scores of the plurality of content files, sorting the plurality of content files according to the scores to obtain an ordered sequence of the content files; a predetermined number of content files are selected for recommendation to the user starting from a first content file in the ordered sequence of content files.
As an example, consider the scores of 10 content files and a predetermined number of 3. The scores of the 10 content files lie between 0 and 1. The 10 scores are sorted from high to low, for example 0.9, 0.85, 0.83, 0.8, 0.75, 0.7, 0.66, 0.64, 0.6, and 0.58, and the content files corresponding to the scores 0.9, 0.85, and 0.83 are selected for recommendation to the user.
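The sort-and-select step in this example can be sketched in a few lines of Python. The content identifiers are hypothetical placeholders; the scores are the ten values from the example.

```python
def select_top(scores, k):
    """Sort content files by score, highest first, and keep the first k identifiers."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [content_id for content_id, _ in ranked[:k]]

# Hypothetical identifiers paired with the ten scores from the example.
scores = {"a": 0.66, "b": 0.9, "c": 0.58, "d": 0.83, "e": 0.75,
          "f": 0.85, "g": 0.6, "h": 0.8, "i": 0.7, "j": 0.64}
recommended = select_top(scores, k=3)
print(recommended)  # ['b', 'f', 'd'], i.e. the files scored 0.9, 0.85, 0.83
```

For large candidate sets, `heapq.nlargest` from the standard library would avoid sorting the full list, though that is an optimization the source does not discuss.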
In the above embodiment of the present disclosure, by using the click behavior feature related to the occurrence times of each content attribute of the user in the preset latest click number window during content recommendation, the short-term interest of the user can be fully reflected and the change of the interest of the user can be rapidly reflected during content recommendation, so that the accuracy of content recommendation can be greatly improved, and key indexes such as the click rate, the click volume, and the like of the recommended content can be improved. In particular, a preset number of recent clicks sub-window can optionally be utilized to achieve the effect of more finely characterizing the user's short-term interests and capturing changes in the user's interests more quickly. In addition, the click time feature and the presentation time feature can be utilized to reflect the time difference during content recommendation, so that the accuracy of content recommendation is further improved, and key indexes such as click rate, click quantity and the like of recommended content are improved.
Fig. 7 illustrates a schematic flow diagram of a method 700 for content recommendation according to one embodiment of the present disclosure. The method 700 may be implemented, for example, in an associated client on the terminal 120 or 130 shown in fig. 1. As shown in fig. 7, the method 700 may include the following steps.
In step 701, a predetermined number of content files for recommendation to a user is acquired. The predetermined number of content files is selected, for example, by the method 200 described with reference to fig. 2. In some embodiments, a content recommendation request may first be sent, for example, to a server, and then a predetermined number of content files sent, for example, by the server for recommendation to the user may be obtained.
At step 702, the predetermined number of content files is presented. This may be presented, for example, in an associated client on the terminal 120 or 130 shown in fig. 1. The predetermined number of content files may be presented in various ways, such as by video, audio, and so forth.
By using the method for recommending the content, disclosed by the embodiment of the invention, the recommended content can fully embody the short-term interests of the user and quickly embody the change of the interests of the user, so that the accuracy of content recommendation can be greatly improved, and key indexes such as click rate, click quantity and the like of the recommended content are improved.
Next, a method for content recommendation according to an embodiment of the present disclosure is described, taking as an example the case where the content file is a video and the client on the terminal is Tencent Kandian. FIG. 8 is a schematic diagram of the architecture of a method for content recommendation according to an embodiment of the present disclosure. Referring to FIG. 8, the method mainly includes an offline portion and an online portion. The offline portion is mainly used for calculating user portraits from the historical behavior data of users and for training the intelligent scoring model, where the user portraits mainly comprise portraits of different dimensions such as the primary classification, the secondary classification, and so on. The online portion mainly comprises recall of candidate videos, ranking and scoring of videos, diversified presentation of videos, and the like.
For the offline portion, the user portrait is a long-term accumulation of user interests and may have a hierarchical structure, as shown in fig. 3, consisting, from the top layer down, of a primary classification, a secondary classification, and a tag classification. The intelligent scoring model may then be trained based on the historical behavior data of the user and the user portrait.
For the online portion, taking the "Kerater" content attribute in the tag class as an example, the "Kerater" related videos in the video library may be recalled as candidate videos. In practical implementation, when a user starts a recommendation service, the server may perform user portrait calculation on the user based on a user identifier carried in a request sent by the client, so as to recall related videos. The recalled videos may then be scored and ranked and a predetermined number of videos selected for recommendation to the user based on the scoring and ranking.
Fig. 9 is a schematic diagram of an architecture for ranking videos provided by an embodiment of the present disclosure, mainly including resource adaptation, feature extraction, and score ranking sections. As shown in fig. 9, in the resource adaptation section, user portrait adaptation and user behavior adaptation are first performed, i.e., the user's historical behavior data and user portrait are acquired as described above.
Feature extraction is described next. Feature extraction mainly involves three parts, namely feature design, feature indexing, and feature encoding. Feature design mainly uses the video data of a video to design various features that characterize the user's interest in the video, to facilitate subsequent scoring. In particular, click behavior features, click time features, presentation time features, etc. may be designed as described above.
As described above, a feature typically has one or more input values. Feature indexing uniformly indexes these inputs to obtain the original value of each feature, mainly to simplify computing the feature values. A single feature typically has one input value, which is typically of type uint64 (64-bit unsigned integer) or float (floating point). A combined feature, being a combination of multiple single features, has multiple input values. In that case, the original values of the single features are obtained separately, and the original value of the combined feature is then obtained by successive prime-number multiplication. Similarly, this way of calculating the original value can be extended to combined features with three or more input values.
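The text does not spell out the exact prime-multiplication scheme. As a minimal sketch of one plausible reading, the original values of the single features can be folded into one value by multiplying a running accumulator by a prime before adding each next value. The prime `PRIME`, the function name, and the wrap-around to 64 bits are assumptions for illustration only.

```python
MASK64 = (1 << 64) - 1   # keep results in the uint64 range (assumption)
PRIME = 1_000_003        # hypothetical prime multiplier, not given in the source


def combined_original_value(single_values):
    """Fold the original values of single features into one uint64 value.

    The accumulator is multiplied by PRIME before each value is added, so
    the result depends on both the values and their order.
    """
    acc = 0
    for value in single_values:
        acc = (acc * PRIME + value) & MASK64
    return acc
```

Because the same loop handles any number of inputs, the sketch covers combined features with two, three, or more single features.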
Feature encoding mainly encodes the original value of a feature to obtain a feature value. To increase the distinguishability of features while taking online performance into account, embodiments of the present disclosure may map feature values to a 64-bit hash space. The high 16 bits of the 64-bit space reflect feature type information, obtained by hashing the feature name string (feature_name), taking the low 16 bits of the hash, and shifting them left by 48 bits. The low 48 bits reflect feature index information, obtained by hashing the original value of the feature (feature_value_o) and taking the low 48 bits of the hash. That is, the feature value Y of the feature can be calculated according to the following formula:
Y = ((hash(feature_name) & 0xFFFF) << 48) + (hash(feature_value_o) & 0xFFFFFFFFFFFF)
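The encoding can be sketched as follows. The hash function is not named in the text, so `hash64` below (built on BLAKE2b) is only a stand-in, and the function names are hypothetical; the bit layout follows the formula above.

```python
import hashlib


def hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption: the source does not name its hash)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def encode_feature(feature_name: str, feature_value_o: int) -> int:
    """Pack feature type (high 16 bits) and feature index (low 48 bits)."""
    type_bits = hash64(feature_name.encode("utf-8")) & 0xFFFF
    # feature_value_o is assumed to be an unsigned 64-bit original value
    index_bits = hash64(feature_value_o.to_bytes(8, "big")) & 0xFFFFFFFFFFFF
    return (type_bits << 48) + index_bits
```

With this layout, two values of the same feature share their high 16 bits, while the low 48 bits distinguish the individual values.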
Next, the feature values of the plurality of extracted features may be input into a trained LR (logistic regression) model as described above, and the score corresponding to the video may be calculated. In practice, the model parameters may be accessed through the unordered_map container of the STL (Standard Template Library), but the lookup time is then too high; alternatively, they may be accessed through Google's dense_map container, which reduces the lookup time by about 2/3. After the score of each video is obtained, the videos are ranked by score, and a preset number of videos is selected for recommendation based on the ranking result.
The flow of using the intelligent scoring model described above is further described below. The intelligent scoring model essentially uses CTR (click-through rate) prediction to estimate the probability that a content file will be clicked by the user after it is recommended, which is a very important link in an industrial-grade recommendation system. FIG. 10 illustrates an overall flow of click-through rate estimation in accordance with an embodiment of the present disclosure. As shown in fig. 10, the flow mainly covers data, features, the model, and online serving. The data aspect mainly consists of obtaining raw data, which mainly comprises the click logs and presentation logs of users. The feature aspect is mainly feature engineering, which mainly covers three categories: obtaining the various user portraits, obtaining the content attributes, and calculating the various features as described above. The model aspect is mainly the intelligent scoring model, which may be any of various linear and nonlinear models. The online aspect is mainly the online service, involving the extraction of features as described above, the calculation of scores, and the final ranking and recommendation. The invention is mainly directed to extending the feature engineering module.
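As a minimal sketch of the scoring and ranking step, the model parameters can be held in a hash map from encoded feature value to weight, which mirrors the container lookup discussed above. The function names, the bias term, and the treatment of each feature value as a one-hot input are assumptions for illustration.

```python
import math


def lr_score(weights, bias, feature_values):
    """Logistic-regression score: sigmoid of the summed weights of the
    active feature values (each treated as a one-hot input)."""
    z = bias + sum(weights.get(f, 0.0) for f in feature_values)
    return 1.0 / (1.0 + math.exp(-z))


def rank_videos(weights, bias, candidates, k):
    """Score every candidate and return the ids of the k best.

    candidates maps a video id to its list of encoded feature values.
    """
    scored = [(lr_score(weights, bias, feats), vid)
              for vid, feats in candidates.items()]
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:k]]
```

For example, with weights `{1: 2.0, 2: -1.0}` and candidates `{"a": [1], "b": [2], "c": [1, 2]}`, the top two videos are `"a"` and `"c"`.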
Fig. 11 is a schematic diagram of a model training architecture provided in an embodiment of the present invention. Referring to fig. 11, model training mainly includes three parts, namely log merging, feature extraction, and model training, each of which is described below.
Log merging essentially aggregates all information of a content recommendation request based on the click log (which holds data related to the user's historical clicks on content files) and the presentation log (which holds data related to the historical presentation of content files to the user). Because a click may lag its presentation by a relatively long delay, there is a time-window problem. For example, a time window of 15 minutes may be adopted in the embodiment of the present invention: a click on a presented content file is considered to have occurred only if it happens within 15 minutes of the presentation, and if that time is exceeded, the file is considered not clicked. For each content file in each request, it is determined whether the file was clicked, and the corresponding click and presentation data are gathered. The merged log data is written to Kafka, which is a distributed, partitioned, multi-replica, multi-subscriber log system.
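The merge can be sketched as a join of presentation records with click records under the 15-minute window. The record layouts (tuples keyed by a request id and a content id, with timestamps in seconds) and the function name are assumptions for illustration; writing the result to Kafka is omitted.

```python
WINDOW_SECONDS = 15 * 60  # the 15-minute click window from the text


def merge_logs(presentations, clicks):
    """Join presentation and click logs into one record per presented item.

    presentations: list of (request_id, content_id, present_ts)
    clicks:        list of (request_id, content_id, click_ts)
    """
    click_index = {}
    for request_id, content_id, click_ts in clicks:
        click_index.setdefault((request_id, content_id), []).append(click_ts)

    merged = []
    for request_id, content_id, present_ts in presentations:
        times = click_index.get((request_id, content_id), [])
        # A click counts only if it falls inside the window after presentation
        clicked = any(0 <= t - present_ts <= WINDOW_SECONDS for t in times)
        merged.append({
            "request_id": request_id,
            "content_id": content_id,
            "present_ts": present_ts,
            "clicked": clicked,
        })
    return merged
```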
Feature extraction is performed on the merged log data: for each content file presented to the user in the log, the feature values of the corresponding features are extracted to construct the positive and negative samples for model training. The feature values of the plurality of features corresponding to a clicked content file among the content files presented to the user are used as positive sample data, and the feature values of the plurality of features corresponding to a non-clicked content file among the presented content files are used as negative sample data. Feature extraction relies on forward-index data of the content file (e.g., for querying content attributes), the user portrait, and historical statistics of the content file (e.g., for querying the user's click history and presentation history), where the forward-index data and the historical statistics are updated hourly and the user portraits are updated daily. In the embodiment of the invention, 99% of the sample data is randomly drawn as training samples and the remaining 1% serves as test samples; the training samples and the test samples may be written to two separate Kafka topics for use in model training.
The invention may use all training samples to train the model, using an online-learning sparse algorithm to train a large-scale sparse logistic regression model. The logistic regression model trained offline may be exported once every 30 minutes and pushed to the online environment.
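A minimal sketch of sample construction and the 99%/1% split follows. The record layout (a dict with `features` and `clicked` keys), the function names, and the fixed random seed are assumptions for illustration; writing the two sets to Kafka topics is omitted.

```python
import random


def build_samples(merged_records):
    """Label each presented item: 1 if clicked (positive), 0 otherwise (negative)."""
    return [(rec["features"], 1 if rec["clicked"] else 0)
            for rec in merged_records]


def split_samples(samples, test_fraction=0.01, seed=0):
    """Randomly hold out a fraction of the samples as the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```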
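The text does not name the online-learning sparse algorithm. One widely used algorithm of this kind is FTRL-Proximal, and the following is a minimal sketch of it for logistic regression over binary (one-hot) features; the choice of FTRL-Proximal, the class name, and the hyperparameter values are assumptions, not the method confirmed by the source.

```python
import math


class FtrlLogisticRegression:
    """Per-coordinate FTRL-Proximal for logistic regression (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = {}  # per-feature accumulated adjusted gradients
        self.n = {}  # per-feature accumulated squared gradients

    def weight(self, f):
        z = self.z.get(f, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight exactly zero
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(f, 0.0)
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        s = sum(self.weight(f) for f in set(features))
        s = max(min(s, 35.0), -35.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, features, label):
        """One online step on a sample whose active features all have value 1."""
        g = self.predict(features) - label  # log-loss gradient per active feature
        for f in set(features):
            n = self.n.get(f, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[f] = self.z.get(f, 0.0) + g - sigma * self.weight(f)
            self.n[f] = n + g * g
```

Only features with enough accumulated evidence receive a non-zero weight, which is what keeps a large-scale model sparse.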
Fig. 12 shows a schematic diagram of presenting content files recommended by a method for content recommendation according to an embodiment of the present disclosure, for example in a client on a user's terminal. As shown in fig. 12, since the content attribute "football" exists in the secondary classification of the user's user portrait and, among the last 100 clicks, this content attribute has the largest click count, football-related videos are recommended to the user in the main recommendation 1201. Optionally, after the user clicks on a football-related video, a one-to-three scene 1202 may be entered, presenting a series of videos related to the video in the main recommendation 1201.
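The count used in this example, i.e., how often each content attribute occurs within the most recent clicks, can be sketched as follows. The representation of the click history as a list of attribute lists (oldest first) and the function name are assumptions for illustration.

```python
from collections import Counter


def attribute_click_counts(click_history, window=100):
    """Count how many of the most recent `window` clicks carry each attribute.

    click_history: list of attribute lists, one per clicked content file,
    ordered from oldest to newest.
    """
    counts = Counter()
    for attributes in click_history[-window:]:
        counts.update(set(attributes))  # count each attribute once per click
    return counts
```

Calling `attribute_click_counts(history).most_common(1)` then yields the attribute with the largest click count in the window, such as "football" in the example above.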
Fig. 13 illustrates an exemplary block diagram of an apparatus 1300 for content recommendation according to one embodiment of the present disclosure. As shown in fig. 13, the apparatus 1300 for content recommendation includes a first acquiring module 1301, a second acquiring module 1302, a first determining module 1303, a second determining module 1304, and a selecting module 1305.
The first acquisition module 1301 is configured to acquire historical behavior data of a user, which includes data related to historical clicks of the content file by the user, and a user portrait, which includes a plurality of interest categories of the user, each interest category including content attributes of the content file. In some embodiments, each interest category may further include an interest level corresponding to a content attribute of the content file. In some embodiments, the first acquisition module 1301 may be configured to acquire historical behavior data of the user and a user portrait in response to receiving a current content recommendation request for the user.
The second acquisition module 1302 is configured to acquire content data of a plurality of content files, the content data of each content file including at least one content attribute of said each content file. In some embodiments, the second acquisition module 1302 may be configured to acquire content data for a plurality of content files based on the user portrait.
The first determining module 1303 is configured to determine, based on the at least one content attribute of each content file, the user portrait, and the historical behavior data, feature values of a plurality of features characterizing the user's interest in said each content file, the plurality of features including a click behavior feature related to the number of occurrences of each of the at least one content attribute in the user's historical clicks within a window of a preset number of recent clicks. In some embodiments, the plurality of features may further include a click time feature related to the click time of the user's historical clicks, within the window of the preset number of recent clicks, on content files having said each content attribute. In some embodiments, the historical behavior data further includes data related to historical presentations of content files to the user, and the plurality of features further includes a presentation time feature related to the presentation time, in the historical presentations within a preset recent period, of content files having each of the at least one content attribute.
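As a minimal sketch of one click time feature, the time interval between the current request and an earlier click on content carrying a given attribute can be computed as below. Interpreting "order" as the n-th most recent matching click, the record layout, and the function name are assumptions for illustration.

```python
def click_time_interval(clicks, attribute, request_ts, order=1, window=100):
    """Seconds between the current request and the `order`-th most recent
    click, within the last `window` clicks, on content having `attribute`.

    clicks: list of (click_ts, attributes), ordered from oldest to newest.
    Returns None when fewer than `order` matching clicks exist.
    """
    matching = [ts for ts, attrs in clicks[-window:] if attribute in attrs]
    if len(matching) < order:
        return None
    return request_ts - matching[-order]
```

A presentation time feature could be computed the same way over presentation records instead of click records.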
The second determination module 1304 is configured to determine a score for the each content file based at least on the feature values of the plurality of features. In some embodiments, the second determination module 1304 may be configured to input at least the feature values of the plurality of features into a trained intelligent scoring model to obtain the score for each content file. The score for each content file may be used to represent a predicted probability of clicking by the user on the each content file.
The selection module 1305 is configured to select a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores of the plurality of content files.
FIG. 14 illustrates an exemplary block diagram of an apparatus 1400 for content recommendation in accordance with one embodiment of the present disclosure. As shown in fig. 14, the apparatus 1400 for content recommendation includes a file acquisition module 1401 and a presentation module 1402.
The file acquisition module 1401 is configured to acquire a predetermined number of content files for recommendation to a user from the apparatus 1300 for content recommendation. In some embodiments, the file acquisition module 1401 may first send a content recommendation request and then acquire a predetermined number of content files for recommendation to the user as a response to the content recommendation request.
The presentation module 1402 is configured to present the predetermined number of content files. As an example, the presentation module 1402 may be configured to present the predetermined number of content files in various manners, such as by video, audio, and so forth.
Fig. 15 illustrates an example system 1500 that includes an example computing device 1510 that represents one or more systems and/or devices that can implement the various techniques described herein. Computing device 1510 may be, for example, a server of a service provider, a device associated with a server, a system-on-chip, and/or any other suitable computing device or computing system. Either of the apparatuses 1300 and 1400 for content recommendation described above with reference to fig. 13 and 14, respectively, may take the form of a computing device 1510. Alternatively, either of the apparatus 1300 for content recommendation and the apparatus 1400 for content recommendation may be implemented as a computer program in the form of the content recommendation application 1516.
The example computing device 1510 as illustrated includes a processing system 1511, one or more computer-readable media 1512, and one or more I/O interfaces 1513 communicatively coupled to each other. Although not shown, the computing device 1510 may also include a system bus or other data and command transfer system that couples the various components to one another. A system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.
Processing system 1511 represents functionality that performs one or more operations using hardware. Thus, the processing system 1511 is illustrated as including hardware elements 1514 that may be configured as processors, functional blocks, and the like. This may include implementation in hardware as application-specific integrated circuits or other logic devices formed using one or more semiconductors. The hardware element 1514 is not limited by the material from which it is formed or the processing mechanism employed therein. For example, the processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, the processor-executable instructions may be electronically executable instructions.
The computer-readable medium 1512 is illustrated as including memory/storage 1515. Memory/storage 1515 represents memory/storage capacity associated with one or more computer-readable media. Memory/storage 1515 may include volatile media (such as Random Access Memory (RAM)) and/or nonvolatile media (such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth). Memory/storage 1515 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) and removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer readable medium 1512 may be configured in a variety of other ways as described further below.
One or more I/O interfaces 1513 represent functionality that allows a user to input commands and information to the computing device 1510 using various input devices, and optionally also allows information to be presented to the user and/or other components or devices using various output devices. Examples of input devices include keyboards, cursor control devices (e.g., mice), microphones (e.g., for voice input), scanners, touch functions (e.g., capacitive or other sensors configured to detect physical touches), cameras (e.g., motion that does not involve touches may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), and so forth. Examples of output devices include a display device (e.g., a display or projector), speakers, a printer, a network card, a haptic response device, and so forth. Accordingly, the computing device 1510 may be configured in a variety of ways as described further below to support user interaction.
The computing device 1510 also includes a content recommendation application 1516. The content recommendation application 1516 may be, for example, a software instance of either of the apparatus 1300 for content recommendation and the apparatus 1400 for content recommendation, and may implement the techniques described herein in combination with other elements in the computing device 1510.
Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer readable media can include a variety of media that are accessible by computing device 1510. By way of example, and not limitation, computer readable media may comprise "computer readable storage media" and "computer readable signal media".
"Computer-readable storage medium" refers to a medium and/or device that can permanently store information and/or a tangible storage device, as opposed to a mere signal transmission, carrier wave, or signal itself. Thus, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in methods or techniques suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of a computer-readable storage medium may include, but are not limited to RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, hard disk, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture adapted to store the desired information and which may be accessed by a computer.
"Computer-readable signal media" refers to signal bearing media configured to hardware, such as send instructions to computing device 1510 via a network. Signal media may typically be embodied in computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
As previously described, the hardware elements 1514 and computer-readable media 1512 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that, in some embodiments, may be used to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or components of a system on a chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or other hardware devices. In this context, the hardware elements may be implemented as processing devices that perform program tasks defined by instructions, modules, and/or logic embodied by the hardware elements, as well as hardware devices that store instructions for execution, such as the previously described computer-readable storage media.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, or program modules, and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer readable storage medium and/or by one or more hardware elements 1514. The computing device 1510 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, for example, by using the computer-readable storage medium of the processing system and/or the hardware elements 1514, a module may be implemented at least in part in hardware as a module executable by the computing device 1510 as software. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 1510 and/or processing systems 1511) to implement the techniques, modules, and examples described herein.
In various implementations, the computing device 1510 may take on a variety of different configurations. For example, the computing device 1510 may be implemented as a computer-like device including a personal computer, desktop computer, multi-screen computer, laptop computer, netbook, and the like. Computing device 1510 may also be implemented as a mobile appliance-like device including a mobile device such as a mobile phone, portable music player, portable gaming device, tablet computer, multi-screen computer, or the like. The computing device 1510 may also be implemented as a television-like device that includes devices having or connected to generally larger screens in casual viewing environments. Such devices include televisions, set-top boxes, gaming machines, and the like.
The techniques described herein may be supported by these various configurations of computing device 1510 and are not limited to the specific examples of techniques described herein. The functionality may also be implemented in whole or in part on the "cloud" 1520 using a distributed system, such as through the platform 1522 as described below.
The cloud 1520 includes and/or is representative of a platform 1522 for resources 1524. The platform 1522 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 1520. The resources 1524 may include applications and/or data that may be used when executing computer processing on servers remote from the computing device 1510. The resources 1524 may also include services provided over the internet and/or over subscriber networks such as cellular or Wi-Fi networks.
The platform 1522 may abstract resources and functionality to connect the computing device 1510 with other computing devices. The platform 1522 may also be used to abstract the scaling of resources, so as to provide a level of scale corresponding to the encountered demand for the resources 1524 implemented via the platform 1522. Thus, in an interconnected device embodiment, implementation of the functionality described herein may be distributed throughout system 1500. For example, the functionality may be implemented in part on the computing device 1510 and in part by the platform 1522 that abstracts the functionality of the cloud 1520.
It should be understood that for clarity, embodiments of the present disclosure have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in a plurality of units or as part of other functional units without departing from the present disclosure. For example, functionality illustrated to be performed by a single unit may be performed by multiple different units. Thus, references to specific functional units are only to be seen as references to suitable units for providing the described functionality rather than indicative of a strict logical or physical structure or organization. Thus, the present disclosure may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or sections, these devices, elements, components or sections should not be limited by these terms. These terms are only used to distinguish one device, element, component, or section from another device, element, component, or section.
When the embodiments of the present application are applied, the collection and processing of relevant data (such as historical behavior data and user portraits) should strictly comply with the requirements of relevant national laws and regulations: the informed consent or separate consent of the personal information subject should be obtained, or another necessary legal basis should exist, and subsequent data use and processing should be carried out within the scope authorized by the laws and regulations and by the personal information subject.
Although the present disclosure has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present disclosure is limited only by the appended claims. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The order of features in the claims does not imply any specific order in which the features must be worked. Furthermore, in the claims, the word "comprising" does not exclude other elements, and the word "a" or "an" does not exclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims (13)

1. A method for content recommendation, comprising:
Acquiring historical behavior data of a user and a user portrait, wherein the historical behavior data comprises data related to historical clicks of the user on a content file, the user portrait comprises a plurality of interest categories of the user, and each interest category comprises content attributes of the content file;
Acquiring content data of a plurality of content files based on a user portrait of the user, wherein the content data of each content file comprises at least one content attribute of each content file;
Determining, based on the at least one content attribute of each content file, the user portrait and the historical behavior data, feature values for a plurality of features characterizing the user's interest in the each content file, the plurality of features including a click behavior feature, the click behavior feature being related to a number of occurrences of each of the at least one content attribute in the historical clicks of the user within a preset number of recent clicks window;
determining a score for each content file based at least on the feature values of the plurality of features;
selecting a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores of the plurality of content files;
wherein the plurality of features further includes a click time feature associated with a click time of the user's historical clicks on the content file having the each content attribute within the preset number of recent clicks window.
2. The method of claim 1, wherein acquiring historical behavior data of the user and the user portrait comprises:
in response to receiving a current content recommendation request for a user, historical behavioral data of the user and a user portrait are obtained.
3. The method of claim 1, wherein each interest classification further comprises an interest level corresponding to a content attribute of the content file, and the click behavior feature comprises:
a combined feature of each content attribute and the number of occurrences of that content attribute in the historical clicks of the user within each of at least one preset latest click number sub-window, wherein the at least one preset latest click number sub-window is a sub-window of the preset latest click number window; and
a combined feature of the rank of the interest level of each content attribute within its corresponding interest category and the number of occurrences of that content attribute.
4. The method of claim 1, wherein the click time feature comprises:
a time interval between a click time when the content file having the each content attribute is clicked in a predetermined order within the preset latest click number window and a time of a current content recommendation request; and
a combined feature of the each content attribute and the time interval between the click time at which the content file having the each content attribute is clicked in a predetermined order within the preset latest click number window and the time of the current content recommendation request.
5. The method of claim 4, wherein the click time feature further comprises:
a request number interval between the content recommendation request in which the content file having the each content attribute was clicked, in a predetermined order within the preset latest click number window, and the current content recommendation request; and
a combined feature of the each content attribute and the request number interval between the content recommendation request in which the content file having the each content attribute was clicked, in a predetermined order within the preset latest click number window, and the current content recommendation request.
6. The method of claim 1, wherein the historical behavior data further comprises data related to a historical presentation of content files to the user, and the plurality of features further comprises a presentation time feature related to a presentation time of content files having each of the at least one content attribute to the user in a historical presentation of a preset recent period of time.
7. The method of claim 6, wherein the presentation time feature comprises:
a time interval between a presentation time at which the content file having the each content attribute is presented in a predetermined order in the history presentation of the preset latest period and a time of a current content recommendation request; and
a combined feature of the each content attribute and the time interval between the presentation time at which the content file having the each content attribute is presented in a predetermined order in the history presentation of the preset latest period and the time of the current content recommendation request.
8. The method of claim 7, wherein the presentation time feature further comprises:
a request number interval between the content recommendation request in which the content file having each content attribute was located when it was presented, in a predetermined order in the historical presentation of the preset recent period of time, and the current content recommendation request; and
a combination feature of both (i) that request number interval and (ii) that content attribute.
9. The method of claim 1, wherein determining feature values for a plurality of features characterizing the user's interest in each content file based on at least one content attribute of each content file, the user representation, and the historical behavior data comprises:
acquiring an original value of each of the plurality of features; and
obtaining the feature value of each feature based on the original value of that feature and its corresponding feature name.
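One way to derive a feature value from a feature name and its original value is feature hashing. The sketch below assumes that technique; claim 9 only requires that the value depend on both inputs, so the hash function, the `name=value` key format, and the bucket count are hypothetical choices.

```python
import hashlib


def feature_value(name: str, raw: object, num_buckets: int = 2 ** 20) -> int:
    """Map a (feature name, original value) pair to an integer feature id.

    Assumption: feature hashing is used here purely as an illustration.
    """
    key = f"{name}={raw}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_buckets
```

Including the feature name in the hashed key keeps identical original values from different features (for example, an interval of `3` hours and a count of `3` clicks) from sharing one id, apart from rare hash collisions.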
10. The method of claim 1, wherein determining the score for each content file based at least on the feature values of the plurality of features comprises:
inputting at least the feature values of the plurality of features into a trained intelligent scoring model to obtain a score for each content file, wherein the trained intelligent scoring model is obtained through training on positive sample data and negative sample data, and wherein, when a plurality of content files are presented to the user, the feature values of the plurality of features characterizing the user's interest in a clicked content file among the presented content files are taken as positive sample data, and the feature values of the plurality of features characterizing the user's interest in a non-clicked content file among the presented content files are taken as negative sample data.
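The sample construction in claim 10 can be sketched as follows. The claim does not name a model type, so the logistic regression trained by plain stochastic gradient descent below is an assumption chosen for brevity, and every function name and hyperparameter is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Impression = Tuple[List[int], bool]  # (feature ids of a presented file, was it clicked?)


def build_samples(impressions: List[Impression]) -> List[Tuple[List[int], int]]:
    """Clicked presentations become positive samples (1), the rest negative (0)."""
    return [(ids, 1 if clicked else 0) for ids, clicked in impressions]


def score(weights: Dict[int, float], ids: List[int]) -> float:
    """Predicted click probability for one set of feature ids."""
    z = sum(weights.get(i, 0.0) for i in ids)
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[List[int], int]], epochs: int = 50,
          lr: float = 0.1) -> Dict[int, float]:
    """Fit sparse logistic regression weights with per-sample gradient steps."""
    weights: Dict[int, float] = {}
    for _ in range(epochs):
        for ids, label in samples:
            grad = score(weights, ids) - label  # derivative of log loss w.r.t. z
            for i in ids:
                weights[i] = weights.get(i, 0.0) - lr * grad
    return weights
```

After training, `score` returns a value in (0, 1) that can serve as the per-file score used for ranking.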
11. An apparatus for content recommendation, comprising:
a first acquisition module configured to acquire historical behavior data of a user and a user representation, the historical behavior data including data related to historical clicks of content files by the user, the user representation including a plurality of interest categories of the user, each interest category including content attributes of content files;
a second acquisition module configured to acquire content data of a plurality of content files based on the user representation of the user, the content data of each content file including at least one content attribute of that content file;
a first determination module configured to determine, based on the at least one content attribute of each content file, the user representation, and the historical behavior data, feature values for a plurality of features characterizing the user's interest in that content file, the plurality of features including click behavior features related to the number of occurrences of each of the at least one content attribute in the user's historical clicks within a preset number of recent clicks window;
a second determination module configured to determine a score for each content file based at least on the feature values of the plurality of features; and
a selection module configured to select a predetermined number of content files from the plurality of content files for recommendation to the user based on the scores of the plurality of content files;
wherein the plurality of features further includes a click time feature associated with the click times of the user's historical clicks on content files having each content attribute within the preset number of recent clicks window.
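As a rough illustration of the click behavior feature used by the first determination module and of the final selection step, the following sketch counts attribute occurrences in the most recent clicks and picks the highest-scoring files. The data shapes and function names are hypothetical, and the scores are assumed to come from whatever scoring model is in use.

```python
import heapq
from collections import Counter
from typing import Dict, List


def click_count_features(click_history: List[List[str]],
                         content_attributes: List[str],
                         window: int = 20) -> Dict[str, int]:
    """Count how often each attribute of a candidate file appears in recent clicks.

    click_history is ordered oldest first; each entry lists the content
    attributes of one clicked file. window must be a positive integer.
    """
    recent = click_history[-window:]
    counts = Counter(attr for attrs in recent for attr in attrs)
    return {attr: counts.get(attr, 0) for attr in content_attributes}


def select_top(scores: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k highest-scoring content files, best first."""
    return heapq.nlargest(k, scores, key=scores.get)
```

For example, with a window of 2 over the history `[["sports", "nba"], ["news"], ["sports"]]`, the attribute `"sports"` is counted once because only the last two clicks fall inside the window.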
12. A computing device, comprising:
a memory configured to store computer-executable instructions; and
a processor configured to perform the method of any one of claims 1-10 when the computer-executable instructions are executed by the processor.
13. A computer readable storage medium storing computer executable instructions which, when executed, perform the method of any one of claims 1-10.
CN202010402143.0A 2020-05-13 2020-05-13 Method and apparatus for content recommendation Active CN111552884B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010402143.0A CN111552884B (en) 2020-05-13 2020-05-13 Method and apparatus for content recommendation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010402143.0A CN111552884B (en) 2020-05-13 2020-05-13 Method and apparatus for content recommendation

Publications (2)

Publication Number Publication Date
CN111552884A CN111552884A (en) 2020-08-18
CN111552884B CN111552884B (en) 2024-05-14

Family

ID=72006290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010402143.0A Active CN111552884B (en) 2020-05-13 2020-05-13 Method and apparatus for content recommendation

Country Status (1)

Country Link
CN (1) CN111552884B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230773B (en) * 2020-10-15 2021-06-22 同济大学 Intelligent scene pushing method and system for assisting enteroscopy and enteroscopy device
CN112650931B (en) * 2021-01-04 2023-05-30 杭州情咖网络技术有限公司 Content recommendation method
CN113111268A (en) * 2021-04-30 2021-07-13 百度在线网络技术(北京)有限公司 Training method of user feature extraction model, content recommendation method and device
CN114329201B (en) * 2021-12-27 2023-08-11 北京百度网讯科技有限公司 Training method of deep learning model, content recommendation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326413A (en) * 2016-08-23 2017-01-11 达而观信息科技(上海)有限公司 Personalized video recommending system and method
CN109829116A (en) * 2019-02-14 2019-05-31 北京达佳互联信息技术有限公司 A kind of content recommendation method, device, server and computer readable storage medium
CN110008375A (en) * 2019-03-22 2019-07-12 广州新视展投资咨询有限公司 Video is recommended to recall method and apparatus
CN110489639A (en) * 2019-07-15 2019-11-22 北京奇艺世纪科技有限公司 A kind of content recommendation method and device
CN110781321A (en) * 2019-08-28 2020-02-11 腾讯科技(深圳)有限公司 Multimedia content recommendation method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10977322B2 (en) * 2015-11-09 2021-04-13 WP Company, LLC Systems and methods for recommending temporally relevant news content using implicit feedback data
CN109087135B (en) * 2018-07-25 2020-08-28 百度在线网络技术(北京)有限公司 Mining method and device for user intention, computer equipment and readable medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Personal Recommendation Engine of User Behavior Pattern and Analysis on Social Networks;Cheng-Hung Tsai et al.;《2015 International Conference on Computational Science and Computational Intelligence (CSCI)》;20160303;full text *
国内电子商务个性化推荐研究进展:核心技术;孙雨生;张晨;任洁;朱礼军;;现代情报(04);151-157 *
基于用户点击的线性回归在内容推荐中的应用研究;石方夏;;现代电子技术;20170901(17);full text *

Also Published As

Publication number Publication date
CN111552884A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111552884B (en) Method and apparatus for content recommendation
US11263217B2 (en) Method of and system for determining user-specific proportions of content for recommendation
US11907240B2 (en) Method and system for presenting a search result in a search result card
CN109819284B (en) Short video recommendation method and device, computer equipment and storage medium
US10061820B2 (en) Generating a user-specific ranking model on a user electronic device
US20150262069A1 (en) Automatic topic and interest based content recommendation system for mobile devices
TWI636416B (en) Method and system for multi-phase ranking for content personalization
US8589434B2 (en) Recommendations based on topic clusters
JP5436665B2 (en) Classification of simultaneously selected images
US20190018900A1 (en) Method and Apparatus for Displaying Search Results
US11288333B2 (en) Method and system for estimating user-item interaction data based on stored interaction data by using multiple models
US11294911B2 (en) Methods and systems for client side search ranking improvements
US20160048754A1 (en) Classifying resources using a deep network
US10929409B2 (en) Identifying local experts for local search
US20140189525A1 (en) User behavior models based on source domain
CN107704560B (en) Information recommendation method, device and equipment
US9916384B2 (en) Related entities
US20200004827A1 (en) Generalized linear mixed models for generating recommendations
US10674215B2 (en) Method and system for determining a relevancy parameter for content item
US20130173568A1 (en) Method or system for identifying website link suggestions
US11507735B2 (en) Modifying a document content section of a document object of a graphical user interface (GUI)
CN112749296A (en) Video recommendation method and device, server and storage medium
CN108319622B (en) Media content recommendation method and device
US10445326B2 (en) Searching based on application usage
US11838597B1 (en) Systems and methods for content discovery by automatic organization of collections or rails

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40027875; Country of ref document: HK)

SE01 Entry into force of request for substantive examination
GR01 Patent grant