WO2023078226A1 - Recommendation method, device, server, and computer-readable storage medium - Google Patents

Recommendation method, device, server, and computer-readable storage medium

Info

Publication number
WO2023078226A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
content
offline
online
data
Prior art date
Application number
PCT/CN2022/128878
Other languages
English (en)
French (fr)
Inventor
王林翰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023078226A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/906 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • the embodiments of the present application relate to the field of communication technologies, and in particular to a recommendation method, device, server, and computer-readable storage medium.
  • the personalized recommendation system has the problems of low calculation efficiency and slow recommendation response speed when faced with massive data.
  • An embodiment of the present application provides a recommendation method, including: when it is determined that a user is online, obtaining from an offline database a user tag constructed offline for the user; performing tag matching according to the user tag and the content tags constructed offline and stored in the offline database for each content to be recommended, and generating a recommendation list for the user online according to the matching result; and pushing the recommendation list to a terminal corresponding to the user.
  • An embodiment of the present application also provides a recommendation device, including: an offline user tag acquisition module, configured to obtain from an offline database a user tag constructed offline for the user when it is determined that the user is online; an online matching module, configured to perform tag matching according to the user tag and the content tags constructed offline and stored in the offline database for each content to be recommended, and to generate a recommendation list for the user online according to the matching result; and an online push module, configured to push the recommendation list to the terminal corresponding to the user.
  • An embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to execute the above recommendation method.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program, and the computer program implements the above recommendation method when executed by a processor.
  • Fig. 1 is a flowchart of the recommendation method according to an embodiment of the present application.
  • Fig. 2 is an architecture diagram of a system implementing the recommendation method according to an embodiment of the present application.
  • Fig. 3 is a schematic diagram of the composition of an application layer according to an embodiment of the present application.
  • Fig. 4 is a schematic diagram of the online and offline processes of the recommendation method according to an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a recommendation device according to an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
  • An embodiment of the present application provides a recommendation method applied to a server, where the server may be a server in a content delivery network (Content Delivery Network, CDN).
  • This embodiment provides recommended content for users in the CDN, for example recommending content to watch after the user enters an electronic program guide (EPG) page, or recommending content to watch after the user finishes watching a movie.
  • The recommendation method may include the following steps 101 to 103.
  • In step 101, when it is determined that the user is online, a user tag constructed offline for the user is acquired from an offline database.
  • In step 102, tag matching is performed according to the user tag and the offline content tags stored in the offline database for each content to be recommended, and a recommendation list for the user is generated online according to the matching result.
  • In step 103, the recommendation list is pushed to a terminal corresponding to the user.
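  • Steps 101 to 103 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: `OFFLINE_DB`, `recommend_on_login`, and the `push` callback are hypothetical names, the offline database is replaced by an in-memory dictionary, and plain tag overlap stands in for the actual tag matching.

```python
from typing import Callable, List

# Hypothetical in-memory stand-in for the offline database.
# Both kinds of tags are assumed to have been built offline beforehand.
OFFLINE_DB = {
    "user_tags": {"u1": {"drama", "evening"}},
    "content_tags": {
        "c1": {"drama", "evening"},
        "c2": {"sports"},
        "c3": {"drama"},
    },
}


def recommend_on_login(
    user_id: str,
    push: Callable[[str, List[str]], None],
    top_n: int = 2,
) -> List[str]:
    # Step 101: read the user tag that was constructed offline.
    user_tag = OFFLINE_DB["user_tags"].get(user_id, set())

    # Step 102: match the user tag against every offline content tag.
    # Tag overlap is a placeholder score (an assumption of this sketch).
    scores = {
        content_id: len(user_tag & tags)
        for content_id, tags in OFFLINE_DB["content_tags"].items()
    }
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    recommendation_list = ranked[:top_n]

    # Step 103: push the list to the terminal corresponding to the user.
    push(user_id, recommendation_list)
    return recommendation_list


# Example run: the "terminal" is simulated by a dictionary.
pushed = {}
result = recommend_on_login("u1", lambda uid, items: pushed.update({uid: items}))
```

  • Note that the online path only reads prebuilt tags and ranks content; no tag construction happens while the user is waiting.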
  • The inventors of the present application found that current personalized recommendation systems suffer from low calculation efficiency and slow recommendation response when faced with massive data because most of them use online real-time recommendation, in which the entire calculation process is completed online.
  • As a result, the whole machine learning process, which requires a lot of computing power, is executed online.
  • Machine learning algorithms are slow to execute and demand a lot of computing power, and the entire recommendation system inherits these shortcomings, resulting in heavy pressure on online computing and slow recommendation response.
  • The recommendation method is divided into an offline part and an online part. User tags and content tags are constructed in the offline part, and tag matching and recommendation list pushing are performed in the online part, which helps reduce the server's online computation.
  • The embodiment of the present application is applicable to application environments where computation must be saved: splitting the method into an offline part and an online part saves computing power and speeds up the recommendation response.
  • The embodiment of the present application is also applicable to application environments where user response must be improved. In one recommendation pass, the online part only has to match the user against the content to be recommended, which requires very little computation and responds very quickly.
  • the server may acquire a user tag constructed offline for the user from an offline database.
  • The user tags of a large number of users can be stored in the offline database, and these users can include users who have watched audio and video online, so that when a user goes online, the server can obtain that user's user tag from the offline database.
  • the offline database can be a database in the server or a database outside the server. That is to say, the user tag can be constructed offline and stored in a preset database in the server, or stored in a preset database outside the server.
  • User tags can be used to represent feature data of the user in different dimensions, for example, the user's identity features, viewing record features, viewing preference features, and the like. Because the server can read the user tag constructed offline for the user directly from the offline database, it does not need to obtain user data online and then construct the user tag online from it, so the server can obtain the user's user tag quickly.
  • the server may perform tag matching according to the user tag and the offline content tags stored in the offline database for each content to be recommended, and generate a recommendation list for the user online according to the matching result.
  • the server may store a large number of content tags of the content to be recommended, and the content to be recommended may include video, audio and other objects that can be watched or listened to by the user.
  • Content tags can be used to characterize feature data of content in different dimensions.
  • Content tags can be constructed offline by the server based on feature data of content to be recommended.
  • Content features can include but are not limited to one of the following or any combination thereof: the director of the content, the actors in the content, the type of the content, the rating of the content, the viewing group of the content, and the time period in which the content is viewed.
  • The tag matching can be implemented as follows: using logistic regression (LR), the server performs regression classification on the user's user tag and the content tags of the content to be recommended stored in the offline database, obtains the user's satisfaction value for each content to be recommended, and generates a recommendation list for the user according to these satisfaction values. For example, the content to be recommended is sorted from high to low by the user's satisfaction value, and the top N items are used as the recommendation list for the user.
  • the value of N may be set according to actual needs, for example, it may be 6, 7, 8, etc. However, this embodiment of the present application does not specifically limit it.
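  • The LR-based matching and top-N selection described above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the two match features, the `WEIGHTS` and `BIAS` values (which in practice would be learned from viewing records), and the function names `satisfaction` and `top_n`.

```python
import math

# Hypothetical LR parameters; in practice they would be learned offline.
WEIGHTS = {"genre_match": 2.0, "period_match": 1.0}
BIAS = -1.5


def satisfaction(user_tag: dict, content_tag: dict) -> float:
    """Predicted satisfaction in (0, 1) for one user/content pair."""
    features = {
        "genre_match": 1.0 if user_tag["genre"] == content_tag["genre"] else 0.0,
        "period_match": 1.0 if user_tag["period"] == content_tag["period"] else 0.0,
    }
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function


def top_n(user_tag: dict, content_tags: dict, n: int) -> list:
    """Sort content by satisfaction, high to low, and keep the first n."""
    scored = [(satisfaction(user_tag, tag), cid) for cid, tag in content_tags.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored[:n]]


user_tag = {"genre": "drama", "period": "evening"}
content_tags = {
    "c1": {"genre": "drama", "period": "evening"},
    "c2": {"genre": "sports", "period": "evening"},
    "c3": {"genre": "drama", "period": "morning"},
}
recommendation = top_n(user_tag, content_tags, n=2)
```

  • With these assumed weights, content matching both the preferred genre and the preferred time period ranks first, followed by content matching the genre only.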
  • the server may push the recommendation list to the terminal corresponding to the user.
  • the terminal corresponding to the user may be a terminal attached to the user online, and the terminal may be a mobile phone, a TV, a tablet computer, and the like.
  • the server can send the recommendation list generated for the user to the TV the user is watching, so that the user can watch the content recommended by the server.
  • The user tag mentioned in step 101 is constructed offline in the following manner: offline feature engineering construction is performed on the user's identity data and the user's viewing record data to obtain the user tag, where the offline feature engineering construction includes at least one of the following or any combination thereof: offline feature segmentation, offline feature mining, and offline feature combination.
  • Offline feature segmentation, offline feature mining, offline feature combination, and the like help produce precise, fine-grained, and comprehensive user tags, which improves the accuracy of online matching and therefore the accuracy of recommendation.
  • The user's identity data may include but is not limited to one of the following or any combination thereof: the user's geographic location, the user's viewing device, the user's registered identity information, and the like.
  • The user's viewing record data may include but is not limited to one of the following or any combination thereof: the user's search records, click records, subscription records, and other behavior record data.
  • performing offline feature engineering construction on the user's identity data and the user's viewing record data to obtain the user label may include: using the XGBOOST algorithm to perform offline feature engineering construction on the user's identity data and the user's viewing record data , so as to obtain precise and fine-grained user tags.
  • a collection of user tags of many users can form a user feature model, that is, in this embodiment, user tags can be abstracted from user identity data and user viewing record data, and a user feature model can be constructed by combining many user tags.
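  • As a rough illustration of feature segmentation and feature combination, the sketch below builds a user tag from identity data and viewing records. It deliberately does not use XGBOOST: hand-written bucketing and crossing stand in for it, and all field names (`region`, `device`, `hour`, `genre`) and function names are hypothetical.

```python
from collections import Counter


def subdivide_hour(hour: int) -> str:
    """Feature segmentation: map a raw hour (0-23) to a coarse time period."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def build_user_tag(identity: dict, viewing_records: list) -> dict:
    """Build one user tag offline from identity data and viewing records."""
    periods = Counter(subdivide_hour(r["hour"]) for r in viewing_records)
    genres = Counter(r["genre"] for r in viewing_records)
    # Fall back to "unknown" when the user has no viewing records yet.
    top_period = periods.most_common(1)[0][0] if periods else "unknown"
    top_genre = genres.most_common(1)[0][0] if genres else "unknown"
    return {
        "region": identity["region"],
        "device": identity["device"],
        "preferred_period": top_period,
        "preferred_genre": top_genre,
        # Feature combination: cross two single features into one.
        "genre_x_period": f"{top_genre}|{top_period}",
    }


identity = {"region": "north", "device": "tv"}
records = [
    {"hour": 20, "genre": "drama"},
    {"hour": 21, "genre": "drama"},
    {"hour": 9, "genre": "news"},
]
tag = build_user_tag(identity, records)
```

  • A user feature model in the sense above would then simply be the collection of such tags for many users.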
  • The recommendation method may further include: performing offline preprocessing on the user's identity data and the user's viewing record data. The offline preprocessing may include: 1) integrating the user's viewing record data with the user's identity data, so that the viewing record data is matched to the identity data, to obtain integrated data containing both, which can be supplied to the algorithm system for training.
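  • The integration step can be pictured as a join on a user identifier. The sketch below assumes that both data sets carry a `user_id` field; that key, the other field names, and the function `integrate` are hypothetical.

```python
from collections import defaultdict


def integrate(identity_rows: list, viewing_rows: list) -> list:
    """Attach each user's viewing records to that user's identity data."""
    records_by_user = defaultdict(list)
    for row in viewing_rows:
        records_by_user[row["user_id"]].append(row)

    integrated = []
    for ident in identity_rows:
        integrated.append(
            {**ident, "viewing_records": records_by_user.get(ident["user_id"], [])}
        )
    return integrated


identity_rows = [
    {"user_id": "u1", "region": "north"},
    {"user_id": "u2", "region": "south"},
]
viewing_rows = [
    {"user_id": "u1", "content_id": "c1"},
    {"user_id": "u1", "content_id": "c3"},
]
integrated = integrate(identity_rows, viewing_rows)
```

  • Users without any viewing records are kept with an empty list, so the downstream training data still covers every known user.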
  • The content tag mentioned in step 102 is constructed offline in the following manner: offline feature engineering construction is performed on the content data of the content to be recommended and the viewing record data of the content to be recommended to obtain the content tag, where the offline feature engineering construction includes at least one of the following or any combination thereof: offline feature subdivision, offline feature mining, and offline feature combination.
  • The content data of the content to be recommended includes but is not limited to one of the following or any combination thereof: the director of the content to be recommended, the actors in the content to be recommended, the type of the content to be recommended, and the rating of the content to be recommended.
  • the viewing record data of the content to be recommended includes, but is not limited to, one of the following or any combination thereof: the viewing group (for example, children, office workers, elderly people, etc.) of the content to be recommended, the time period when the content is viewed , the location being viewed, etc.
  • Offline feature engineering construction is performed on the content data of the content to be recommended and the viewing record data of the content to be recommended to obtain precise, fine-grained content tags.
  • A collection of content tags of the content to be recommended can form a content feature model; that is, in this embodiment, content tags can be abstracted from the content data of the content to be recommended and the viewing record data of the content to be recommended, and many content tags combined build a content feature model.
  • The recommendation method may further include: classifying and cleaning the content data of the content to be recommended and the viewing record data of the content to be recommended, so that they are in a data format usable by subsequent processing. This avoids errors, omissions, and gaps in the data, which could otherwise contaminate the learning process or the computation data of the subsequent machine learning algorithm and lead to model training errors, erroneous training results, recommendation tasks that cannot be executed, or recommendation tasks that execute incorrectly.
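  • A cleaning pass of this kind might look like the sketch below. The required fields, the rule that a rating must be numeric, and the de-duplication by `content_id` are all assumptions chosen for illustration; the source does not specify the cleaning rules.

```python
REQUIRED_FIELDS = ("content_id", "type", "rating")  # hypothetical schema


def clean(records: list) -> list:
    """Drop records with gaps, unparsable ratings, or duplicate ids."""
    seen_ids = set()
    cleaned = []
    for record in records:
        # Omissions and vacancies: every required field must be present.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Errors: the rating must be convertible to a number.
        try:
            rating = float(record["rating"])
        except (TypeError, ValueError):
            continue
        # Duplicates: keep only the first record per content id.
        if record["content_id"] in seen_ids:
            continue
        seen_ids.add(record["content_id"])
        cleaned.append({**record, "rating": rating})
    return cleaned


raw = [
    {"content_id": "c1", "type": "movie", "rating": "8.5"},
    {"content_id": "c2", "type": "", "rating": "7.0"},       # missing type
    {"content_id": "c1", "type": "movie", "rating": "8.5"},  # duplicate
    {"content_id": "c3", "type": "series", "rating": "n/a"}, # bad rating
    {"content_id": "c4", "type": "series", "rating": 9},
]
cleaned = clean(raw)
```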
  • the server may also obtain online user data of the user and content data of content watched by the user online.
  • The recommendation method further includes: updating the user tag offline according to the user data obtained online; and updating the content tag offline according to the content data of the content watched by the user online.
  • User data may not be static, so in this embodiment the user tag is updated offline according to the user data obtained online. This yields the latest user tag that best matches the user's current characteristics, while the offline update saves computing power.
  • Content data may also change; for example, the rating of a piece of content may change over time. Therefore, in this embodiment, the content tags are updated offline according to the content data obtained online. This yields the latest content tags that best match the current content characteristics, while the offline update saves computing power.
  • The above-mentioned offline construction or offline update can be understood as construction or update performed without a strong connection between the user and the server, as construction or update performed by the server without networking, or as continuous, non-immediate-response work (that is, work that does not respond immediately to user operations) performed by the server after it obtains the relevant data.
  • For user tags, the relevant data obtained by the server includes the user's identity data and the user's viewing record data; for content tags, the relevant data obtained by the server includes the content data of the content and the viewing record data of the content.
  • the server obtains the user's identity data and viewing data online, and then, the server can build or update user tags offline based on the online obtained identity data and viewing record data.
  • the non-instant response continuous work performed by the server may be: after the user turns off the TV, the server still continues to construct or update the user tag offline according to the user's identity data and viewing record data.
  • the offline construction of the user label is performed when the server has not previously constructed a user label for the user, that is, the user's user label does not exist in the offline database; the offline update of the user label, It is performed when the server has previously constructed a user label for the user, that is, the user label of the user already exists in the offline database.
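  • The distinction between offline construction and offline update amounts to checking whether the offline database already holds a tag for the user. In the sketch below, the database is a plain dictionary and an update is modeled as a set union; both choices, and the name `build_or_update`, are assumptions for illustration.

```python
def build_or_update(offline_db: dict, user_id: str, new_tags: set) -> str:
    """Construct the user tag if absent, otherwise update the stored one."""
    if user_id not in offline_db:
        # No tag exists yet for this user: offline construction.
        offline_db[user_id] = set(new_tags)
        return "constructed"
    # A tag already exists: offline update (modeled here as a union).
    offline_db[user_id] |= set(new_tags)
    return "updated"


offline_db = {}
first = build_or_update(offline_db, "u1", {"drama"})
second = build_or_update(offline_db, "u1", {"evening"})
```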
  • the time period for the user to watch TV is between 8:00 p.m. Offline construction or offline update of tags.
  • The offline construction or offline update can last until the next time the user turns on the TV. Because the interval between two times the user turns on the TV is usually long, for example 1 day, the server has a long time in which to perform offline construction or offline update.
  • Because the server has a longer time in which to perform offline construction or offline update, the work can be completed on a lower-cost server; that is, the offline construction or offline update in the embodiment of the present invention can also reduce costs.
  • each generation of new content and new users can trigger updating of user tags and content tags, thereby facilitating real-time training of user feature models and content feature models.
  • the server may start to acquire user data of the user and content data of content watched by the user online.
  • After the server pushes the recommendation list to the user, it can start to obtain online the user's user data and the content data of the content the user watches online. This helps leave enough computing capacity for pushing the recommendation list online, thereby further improving recommendation efficiency.
  • The terminal in step 103 is in a target scene whose real-time requirement for recommendation is lower than a preset requirement, and the preset requirement can be set according to actual needs; that is, the user's real-time requirement for recommendation in the target scene is relatively low.
  • The target scene may be a scene whose real-time requirement for recommendation is lower than that of an Internet scene. For comparison, short-video applications in Internet scenes usually have high real-time requirements for recommendation.
  • The target scene is the scene after the terminal enters the EPG page. The terminal can be an Internet Protocol TV (IPTV), and the indexing and navigation of the various services provided by IPTV are all completed through the EPG system. Therefore, the target scene can be understood as a scene with IPTV, such as a living room scene.
  • In the living room scene, the requirement on how quickly recommendations are updated (that is, the real-time requirement) is low, while the requirement on recommendation response speed is relatively high. Therefore, for recommendation in the living room scene, splitting the method into offline and online parts saves computing power and speeds up the recommendation response without compromising the required update speed.
  • The user tag used when recommending to the user is the user tag of the user stored in the offline database. For example, when the user turns on the IPTV in the living room on Tuesday, the user tag used for the recommendation is the one that was updated offline, using the user data obtained online on Monday, and stored in the offline database. That is to say, each time a user goes online, there is no need to go through the time-consuming process of building user tags online; the already constructed or updated user tags stored in the offline database are obtained directly, which greatly improves the efficiency of online recommendation.
  • Fig. 2 shows an architecture diagram of a system for implementing the above recommendation method according to an embodiment of the present invention.
  • the system includes: data layer, data preprocessing layer, application layer and master control layer.
  • the above-mentioned layers can be deployed in the overall network of the CDN main network, and have access rights to all user data of the CDN, and have access rights to all content data in the CDN. Except for the data layer, other layers can all run on the main server of the CDN, and the data layer can run on each group server of the CDN. The following will explain the functional architecture of these four levels:
  • the data layer is mainly responsible for obtaining various types of data, including but not limited to user data and content data.
  • This layer can exist in the CDN system as a physical network element (including but not limited to physical servers, Docker containers, etc.).
  • the data layer includes: a user's identity data acquisition module, a user's viewing record data acquisition module, and a content data acquisition module.
  • the user's identity data acquisition module is used to acquire the user's identity information including but not limited to the user's geographical location, the user's viewing device, the user's registration identity information, etc., and build a user identity label based on this.
  • the user's viewing record data acquisition module is used to obtain the user's behavior records including but not limited to the user's search record, user's click record, user's subscription record, etc., so as to construct the user's viewing preference label as the core.
  • the user tags mentioned in the above embodiments may include user identity tags and user viewing preference tags.
  • the content data acquisition module is used to acquire content-related information including but not limited to content providers, content directors, content actors, content classifications, content ratings, etc., and construct content tags based on this.
  • The data preprocessing layer is mainly responsible for classifying and cleaning all kinds of data acquired by the data layer, putting them in a data format that the application layer can process. This avoids errors, omissions, and gaps in the data, which could otherwise contaminate the learning process or the computation data of the application layer's machine learning algorithm and lead to model training errors, erroneous training results, recommendation tasks that cannot be executed, or erroneous recommendation results.
  • This layer exists in the CDN system as a physical network element (including but not limited to physical servers, Docker containers, etc.).
  • the data preprocessing layer includes: a user data preprocessing module and a content data preprocessing module.
  • The user data preprocessing module mainly has two functions. 1) Integrate the user's identity data and the user's viewing record data, so that the viewing record data is matched to the identity data, and obtain integrated data containing both, which is supplied to the algorithm system for training. 2) Classify and clean the user's viewing record data and identity data, putting them in a data format that the application layer can process. This avoids errors, omissions, and gaps in the data, which could otherwise contaminate the learning process or the computation data of the application layer's machine learning algorithm and lead to model training errors, erroneous training results, recommendation tasks that cannot be executed, or erroneous recommendation results.
  • The content data preprocessing module is mainly used to classify and clean the content data acquired by the content data acquisition module, putting it in a data format that the application layer can process. This avoids errors, omissions, and gaps in the data, which could otherwise contaminate the learning process or the computation data of the application layer's machine learning algorithm and lead to model training errors, erroneous training results, recommendation tasks that cannot be executed, or erroneous recommendation results.
  • the application layer mainly includes: a label building module, used for label construction; a recommended content generation module, used for label matching and recommendation result generation.
  • the layering of the application layer may be as shown in FIG. 3 , including: an original data layer, a mining feature combination layer, a prediction scoring layer, and an output layer.
  • The process before LR label matching is an offline process, and LR label matching and everything after it form an online process.
  • Each level in Figure 3 is described below:
  • the original data layer mainly obtains relevant data from the user data preprocessing module and content data preprocessing module and performs feature engineering construction to obtain user tags and content tags.
  • The original data layer mainly includes three main types of data: the user's identity data (denoted as USER MAP), the user's viewing record data (denoted as LINK MAP), and the content data of the content to be recommended (denoted as ITEM MAP).
  • USER MAP is mainly obtained from the identity data in the user data preprocessing module, including but not limited to user identity information such as the user's geographical location, the user's viewing device, and the user's registered identity information, and the user identity tag is preliminarily constructed with this as the core.
  • LINK MAP is mainly obtained from the viewing record data in the user data preprocessing module, including but not limited to user behavior records such as the user's search records, click records, and subscription records. The user's viewing preference tag is constructed from it, so that the viewing preference tag can be associated with the user identity tag.
  • ITEM MAP is mainly obtained from the content data in the content data preprocessing module, including but not limited to content-related information such as content providers, content directors, content actors, content classification, and content ratings, and the content tag is preliminarily constructed with this as the core.
  • Mining feature combination layer: this layer mainly uses USER MAP, LINK MAP, and ITEM MAP to carry out offline feature engineering construction through the XGBOOST algorithm, such as feature segmentation, feature mining, and automatic feature combination, to obtain user tags and content tags.
  • Through XGBOOST feature subdivision, feature mining, and automatic feature combination of the content data of the content to be recommended and the viewing record data of the content to be recommended, precise, fine-grained content tags are constructed.
  • The predictive scoring layer is used, when a target user is watching online, to obtain from the offline database the user tag constructed for the target user, and to use LR to perform regression classification (that is, label matching) on the target user's precise, fine-grained user tag and the precise, fine-grained content tags of the content to be recommended in the database. This predicts the target user's satisfaction value for each content to be recommended, and the content to be recommended is sorted by satisfaction value to obtain a recommendation list for the target user.
  • the output layer is used to push the recommendation list to the client used by the user.
  • The main control layer mainly includes the main control module, which is used to control the operation of the recommendation algorithm. Through the main control layer, certain hot content can be distributed to all users, and the blacklists and whitelists of certain users can be controlled.
  • This layer exists in the CDN system as a physical network element (including but not limited to physical servers, Docker containers, etc.).
  • The processes performed by the above-mentioned application layer are mainly divided into online and offline processes, and this layer exists in the CDN system as a physical network element (including but not limited to physical servers, Docker containers, etc.). The flow chart is shown in Fig. 4, and the process is briefly described below by way of example:
  • the online process executes: generating a recommendation list based on the user; pushing the recommendation list to the user.
  • The server obtains the user tag constructed offline for the user from the offline database, performs tag matching according to the user tag and the content tags of each content to be recommended stored in the offline database, generates a recommendation list for the user online based on the matching result, and pushes the recommendation list to the user.
  • The online process also obtains the user's identity data, the viewing record data of the user's online viewing, and the content data of the content that the user watches online.
  • The offline process executes: importing data into the offline database; updating the user tag; updating the content tag.
  • The data imported into the offline database includes the user's identity data obtained in the online process, the viewing record data of the user's online viewing, and the content data of the content watched by the user online. The user tag can be updated according to the user's identity data and viewing record data imported into the offline database, and the content tag can be updated according to the content data of the content watched by the user online.
  • the offline + online mode is used to construct the recommendation algorithm system, and the recommendation list is generated online by constructing user tags and content tags to achieve a balance between computing speed and computing resources.
  • By placing the machine learning that constructs user tags and content tags in the offline part, the pressure on online computing power is reduced, the recommendation list is generated faster, and user experience is improved.
  • By placing recommendation list generation in the online part, the recommended content is updated in real time, which keeps the recommended content refreshed promptly and improves user experience.
  • During system operation, the machine learning continues to iterate on itself and its feature extraction ability continues to strengthen, thereby improving the recommendation success rate.
  • The division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the scope of protection of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or process, without changing the core design of the algorithm and process, also falls within the scope of protection of this patent.
  • An embodiment of the present application provides a recommendation device, as shown in FIG. 5, including: an offline user tag acquisition module 501, configured to obtain, from an offline database, the user tag constructed offline for the user when it is determined that the user is online; an online matching module 502, configured to perform tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and to generate a recommendation list for the user online according to the matching result; and an online push module 503, configured to push the recommendation list to the terminal corresponding to the user.
  • It is worth mentioning that each module involved in this embodiment is a logical module.
  • In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • this embodiment is an apparatus embodiment corresponding to the above-mentioned method embodiment, and this embodiment can be implemented in cooperation with the above-mentioned method embodiment.
  • the relevant technical details and technical effects mentioned in the foregoing method embodiments are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied to the above method embodiments.
  • An embodiment of the present application provides a server, as shown in FIG. 6, including: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the above recommendation method.
  • the memory 602 and the processor 601 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 601 and various circuits of the memory 602 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 601 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 601 .
  • Processor 601 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management, and other control functions. And the memory 602 may be used to store data used by the processor 601 when performing operations.
  • The embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, the above method embodiments are implemented.
  • That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

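A recommendation method, apparatus, server, and computer-readable storage medium. The recommendation method includes: when it is determined that a user is online, obtaining from an offline database a user tag constructed offline for the user; performing tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and generating a recommendation list for the user online according to the matching result; and pushing the recommendation list to the terminal corresponding to the user.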

Description

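Recommendation method, apparatus, server, and computer-readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202111302296.9, filed on November 4, 2021, the content of which is incorporated herein by reference in its entirety.
Technical field
Embodiments of the present application relate to the field of communication technology, and in particular to a recommendation method, apparatus, server, and computer-readable storage medium.
Background
With the rapid development of the Internet, information explosion has become the norm. To increase user stickiness, major video recommendation platforms make targeted, personalized recommendations for each user, which places ever higher demands on the server side's data-processing capability. In the related art, personalized recommendation systems suffer from low computational efficiency and slow recommendation response when faced with massive data.
Summary
An embodiment of the present application provides a recommendation method, including: when it is determined that a user is online, obtaining from an offline database a user tag constructed offline for the user; performing tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and generating a recommendation list for the user online according to the matching result; and pushing the recommendation list to the terminal corresponding to the user.
An embodiment of the present application further provides a recommendation apparatus, including: an offline user tag acquisition module, configured to obtain from an offline database, when it is determined that a user is online, a user tag constructed offline for the user; an online matching module, configured to perform tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and to generate a recommendation list for the user online according to the matching result; and an online push module, configured to push the recommendation list to the terminal corresponding to the user.
An embodiment of the present application further provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above recommendation method.
To achieve at least the above object, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above recommendation method.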
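Brief description of the drawings
FIG. 1 is a flowchart of a recommendation method according to an embodiment of the present application;
FIG. 2 is an architecture diagram of a system implementing the recommendation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the composition of the application layer according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the online and offline flows of the recommendation method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a recommendation apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present application.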
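Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments and their specific implementations are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are set out in the embodiments and implementations so that the reader can better understand the application. The technical solutions claimed in this application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments and implementations. The division into embodiments and implementations below is for convenience of description and should not limit the specific implementation of the application; the embodiments and/or implementations may be combined with and refer to one another where they do not conflict.
One embodiment of the present application provides a recommendation method applied to a server, which may be a server in a content delivery network (CDN). This embodiment aims to provide recommended content to users in the CDN, for example content recommended after the user enters the electronic program guide (EPG) page, or content recommended after the user finishes watching a film. Referring to the flowchart of the recommendation method shown in FIG. 1, the method may include the following steps 101 to 103.
In step 101, when it is determined that the user is online, a user tag constructed offline for the user is obtained from an offline database.
In step 102, tag matching is performed according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and a recommendation list for the user is generated online according to the matching result.
In step 103, the recommendation list is pushed to the terminal corresponding to the user.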
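Steps 101 to 103 can be sketched as a short online path. This is a minimal illustration, not the patent's implementation: the in-memory dictionaries `USER_TAGS` and `CONTENT_TAGS` stand in for the offline database, and `match_score`, `recommend_on_login`, and the `push` callback are hypothetical names. The matcher here is a plain weighted overlap used only as a placeholder for the tag matching described later.

```python
# Hypothetical stand-ins for the offline database (tags were built offline).
USER_TAGS = {"u1": {"genre:drama": 1.0, "device:tv": 1.0}}
CONTENT_TAGS = {
    "c1": {"genre:drama": 1.0},
    "c2": {"genre:sports": 1.0},
    "c3": {"genre:drama": 0.5, "device:tv": 0.25},
}


def match_score(user_tags, content_tags):
    """Placeholder matcher: weighted overlap of the tags both sides share."""
    return sum(w * content_tags[t] for t, w in user_tags.items() if t in content_tags)


def recommend_on_login(user_id, push, top_n=2):
    # Step 101: fetch the user tag that was constructed offline.
    user_tags = USER_TAGS.get(user_id, {})
    # Step 102: match against every stored content tag and rank online.
    ranked = sorted(
        CONTENT_TAGS,
        key=lambda cid: match_score(user_tags, CONTENT_TAGS[cid]),
        reverse=True,
    )
    recommendations = ranked[:top_n]
    # Step 103: push the list to the user's terminal.
    push(user_id, recommendations)
    return recommendations
```

The only work done while the user waits is a lookup, a scoring pass, and a sort; nothing in this path builds or retrains tags.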
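The inventors of the present application found that the low computational efficiency and slow recommendation response of current personalized recommendation systems facing massive data arise because most of them use online real-time recommendation, with the entire computation completed online. The whole machine learning pipeline, which demands a great deal of computing power, therefore runs online, and the recommendation system inherits the slow execution and high compute demand of the machine learning algorithm, resulting in heavy online computing pressure and slow recommendation response. On this basis, the embodiments of the present application divide the recommendation method into an offline part and an online part: user tags and content tags are constructed in the offline part, while tag matching and pushing of the recommendation list take place in the online part. This helps reduce the server's online computation, improve computational efficiency, and speed up recommendation response. The embodiments are suited to application environments that need to save computation: splitting the offline and online parts saves computing power and speeds up recommendation response. They are also suited to application environments that need to improve responsiveness to users: in a single recommendation flow, the online part may provide only the matching between the user and the content to be recommended, so the computation is very small and the response very fast.
In step 101, when the server determines that the user has come online to watch audio or video, it may obtain from the offline database the user tag constructed offline for that user. The offline database may store the user tags of a large number of users, which may include users who have previously watched audio or video online, so that when a given user comes online the server can obtain that user's tag from the offline database. The offline database may be a database in the server or one outside it. In other words, user tags may be constructed offline and stored in a preset database in the server or in a preset database outside the server. A user tag may represent feature data of the user in different dimensions, such as identity features, viewing-record features, and viewing-preference features. Because the server can directly obtain the user tag already constructed offline for the user, rather than collecting user data online and then constructing the user tag online from it, the server can obtain the user's tag quickly.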
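In step 102, the server may perform tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and generate a recommendation list for the user online according to the matching result. The server may store the content tags of a large number of content items to be recommended; such content may include video, audio, and other objects that users can watch or listen to. A content tag may represent feature data of the content in different dimensions and may be constructed offline by the server from the feature data of the content to be recommended. Content features may include, but are not limited to, one or any combination of: the director, the cast, the genre, the rating, the viewing audience, and the viewing time period of the content.
In one implementation, tag matching may be realized as follows: the server uses logistic regression (LR) to perform regression classification on the user's user tag and the content tags of each content item to be recommended stored in the offline database, obtains the user's satisfaction value for each content item, and generates the recommendation list for the user according to those satisfaction values. For example, the content items are sorted by the user's satisfaction value from high to low, and the top N items form the recommendation list. The value of N can be set according to actual needs, for example 6, 7, or 8; the embodiments of the present application do not specifically limit it.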
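The LR-based matching and top-N selection can be sketched as follows. This is an illustrative sketch under assumptions: the source does not say how tags are encoded as features, so the code assumes each shared tag contributes the product of the user's and the content's tag values, and that the per-tag `weights` were fitted offline. The names `satisfaction` and `top_n` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def satisfaction(user_tags, content_tags, weights, bias=0.0):
    """Logistic-regression score in (0, 1) for one user/content pair.

    Assumption: the feature for a shared tag is user_value * content_value,
    and `weights` holds one coefficient per tag, learned offline.
    """
    shared = user_tags.keys() & content_tags.keys()
    z = bias + sum(weights.get(t, 0.0) * user_tags[t] * content_tags[t] for t in shared)
    return sigmoid(z)


def top_n(user_tags, catalog, weights, n):
    """Rank all candidate content by predicted satisfaction, keep the first n."""
    scored = [(satisfaction(user_tags, tags, weights), cid) for cid, tags in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored[:n]]
```

Only the scoring and sorting happen online; fitting the weights is assumed to belong to the offline part.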
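In step 103, the server may push the recommendation list to the terminal corresponding to the user. The terminal corresponding to the user may be the terminal through which the user is online, such as a mobile phone, a television, or a tablet computer. For example, after the user turns on a television and goes online to watch, the server can send the recommendation list generated for that user to the television, so that the user can view the content recommended by the server.
In one implementation, the user tag mentioned in step 101 is constructed offline as follows: offline feature engineering is performed on the user's identity data and the user's viewing record data to obtain the user tag, where the offline feature engineering includes at least one or any combination of: offline feature subdivision, offline feature mining, and offline feature combination. Offline feature subdivision, mining, and combination help produce user tags that are precise, fine-grained, and comprehensive, which improves the accuracy of online matching and, in turn, of recommendation.
The user's identity data may include, but is not limited to, one or any combination of: the user's geographic location, viewing device, and registered identity information. The user's viewing record data may include, but is not limited to, one or any combination of behaviour records such as the user's search records, click records, and subscription records.
In one implementation, performing offline feature engineering on the user's identity data and viewing record data to obtain the user tag may include: using the XGBoost algorithm to perform the offline feature engineering so as to obtain precise, fine-grained user tags. The set of user tags of many users may form a user feature model; that is, in this implementation user tags can be abstracted from users' identity data and viewing record data, and the combination of many user tags builds the user feature model.
In one implementation, before the offline feature engineering is performed on the user's identity data and viewing record data, the recommendation method may further include offline preprocessing of that data. The offline preprocessing may include: 1) integrating the user's viewing record data with the user's identity data so that the two are matched, yielding integrated data that contains both and can be supplied to the algorithm system for training; 2) classifying and cleaning (filtering) the user's viewing record data and identity data into a data format usable by subsequent processing, avoiding mistakes, gaps, and errors in the data that would contaminate the learning process of the subsequent machine learning algorithm or the computation data and lead to model training errors, wrong training results, failure to perform recommendation tasks, or wrong recommendation results.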
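The two preprocessing functions (integration and cleaning) can be illustrated with a small join-and-filter routine. This is a sketch under assumptions: the field names `user_id`, `content_id`, and `watched_at` are hypothetical, and "cleaning" is reduced here to dropping records with missing fields or with no matching identity.

```python
REQUIRED_FIELDS = ("user_id", "content_id", "watched_at")  # hypothetical schema


def integrate(identities, view_records):
    """Join viewing records to identity data by user, dropping unusable rows."""
    merged = {}
    for record in view_records:
        # Cleaning: skip records with missing or empty required fields.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        identity = identities.get(record["user_id"])
        # Integration: a viewing record is kept only if it matches an identity.
        if identity is None:
            continue
        entry = merged.setdefault(record["user_id"], {"identity": identity, "views": []})
        entry["views"].append(
            {"content_id": record["content_id"], "watched_at": record["watched_at"]}
        )
    return merged
```

The result holds one entry per user with identity data and viewing records side by side, which is the shape the later feature engineering step is described as consuming.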
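In one implementation, the content tag mentioned in step 102 is constructed offline as follows: offline feature engineering is performed on the content data of the content to be recommended and on the viewing record data of that content being watched, to obtain the content tag, where the offline feature engineering includes at least one or any combination of: offline feature subdivision, offline feature mining, and offline feature combination. Offline feature subdivision, mining, and combination help produce content tags that are precise, fine-grained, and comprehensive, which improves the accuracy of online matching and, in turn, of recommendation.
The content data of the content to be recommended includes, but is not limited to, one or any combination of: the director, the cast, the genre, and the rating of the content. The viewing record data of the content being watched includes, but is not limited to, one or any combination of: the audience groups that watched the content (for example children, office workers, or the elderly), the time periods in which it was watched, and the places where it was watched.
In one implementation, performing offline feature engineering on the content data of the content to be recommended and on the viewing record data of that content being watched may include: using the XGBoost algorithm to perform the offline feature engineering so as to obtain precise, fine-grained content tags. The set of content tags of many content items may form a content feature model; that is, in this embodiment content tags can be abstracted from the content data and viewing record data, and the combination of many content tags builds the content feature model.
In one implementation, before the offline feature engineering is performed on the content data and on the viewing record data of the content being watched, the recommendation method may further include: classifying and cleaning that data into a data format usable by subsequent processing, avoiding mistakes, gaps, and errors in the data that would contaminate the learning process of the subsequent machine learning algorithm or the computation data and lead to model training errors, wrong training results, failure to perform recommendation tasks, or wrong recommendation results.
In one implementation, when it is determined that the user is online, the server may also obtain online the user's user data and the content data of the content the user watches online. In addition, after the recommendation list is sent to the user's terminal in step 103, the recommendation method further includes: updating the user tag offline according to the user data obtained online; and updating the content tag offline according to the content data of the content the user watched online. User data may change over time, so updating the user tag offline from user data obtained online yields an up-to-date user tag that best matches the user's current characteristics while saving computing power. Similarly, content data can change too (a content item's rating may change over time, for example), so updating the content tag offline from content data obtained online yields an up-to-date content tag that best matches the content's current characteristics while saving computing power.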
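The split between cheap online capture and deferred offline updating can be sketched with a queue. This is a simplified illustration, not the patent's mechanism: the `pending` queue, the `user_tags` dictionary standing in for the offline database, and the use of genre counts as the "user tag" are all assumptions made for the example.

```python
from collections import deque

pending = deque()  # events captured while the user is online
user_tags = {}     # stand-in for user tags held in the offline database


def capture_online(user_id, genre):
    """Online part: only record what was watched; no tag computation here."""
    pending.append((user_id, genre))


def run_offline_update():
    """Offline part: drain captured events and build or update user tags."""
    processed = 0
    while pending:
        user_id, genre = pending.popleft()
        # setdefault covers both cases: first-time construction and later update.
        tags = user_tags.setdefault(user_id, {})
        tags[genre] = tags.get(genre, 0) + 1
        processed += 1
    return processed
```

Until `run_offline_update` runs, recommendations would still be served from the previously stored tags, which mirrors the behaviour described above.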
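The offline construction or offline update described above can be understood as construction or updating performed without a strong connection between the user and the server; as construction or updating performed by the server while not connected to the network; or as continuing work that the server performs after obtaining the relevant data and that is not an immediate response (that is, it need not respond immediately to user operations). When user tags are constructed or updated offline, the relevant data obtained by the server includes the user's identity data and viewing record data; when content tags are constructed or updated offline, it includes the content data of the content and the viewing record data of the content being watched.
The following takes the server's offline construction or offline update of a user tag as an example:
For example, while the user watches television online today, the server obtains the user's identity data and viewing record data online; the server can then construct or update the user tag offline from that data. The continuing, non-immediate work of the server may be: after the user turns off the television, the server continues to construct or update the user tag offline from the user's identity data and viewing record data. Offline construction of a user tag takes place when the server has not previously constructed a tag for the user, that is, when the offline database does not yet hold a tag for the user; offline update takes place when the server has previously constructed one, that is, when the offline database already holds a tag for the user.
Suppose the user watches television between 8 p.m. and 10 p.m. After obtaining the user's identity data and viewing data in that period, the server may start the offline construction or update of the user tag at any later time. The offline construction or update can continue until the user next turns on the television. Because there is usually a long interval between two viewing sessions, for example one day, the server has a long time for offline construction or update. Constructing or updating tags online usually imposes strict time requirements, such as succeeding within 1 second, and therefore needs more expensive servers; in the embodiments of the present invention the server has a long time for offline construction or update, so lower-cost servers can be used. Offline construction or update thus also reduces cost.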
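In one implementation, every time new content or a new user appears, an update of the user tags and content tags can be triggered, which facilitates real-time training of the user feature model and the content feature model.
In one implementation, the server may begin obtaining the user's user data and the content data of the content watched online as soon as it determines that the user is online. In another implementation, the server may begin obtaining that data only after pushing the recommendation list to the user, which leaves sufficient computing headroom for the online push and further improves recommendation efficiency.
In one implementation, the terminal in step 103 is in a target scenario whose requirement for recommendation timeliness is lower than a preset requirement. The preset requirement can be set according to actual needs; that is, in the target scenario the user's requirement for real-time recommendation is relatively low. For example, the target scenario may be one whose real-time requirement is lower than that of an Internet scenario; short-video applications in Internet scenarios typically have high real-time requirements. Executing the recommendation method in the target scenario improves recommendation efficiency without compromising the user's real-time requirement in that scenario.
In one implementation, the target scenario is the scenario after the terminal enters the EPG page. The terminal may be an Internet Protocol television (IPTV); the indexing and navigation of the services provided by IPTV are all handled through the EPG system. The target scenario can therefore be understood as one with IPTV, such as a living-room scenario. This implementation takes into account that a living-room scenario has a low requirement for how quickly recommendations are updated (that is, a low real-time requirement) but a high requirement for recommendation response speed. For living-room recommendation, splitting the offline and online parts therefore saves computing power and speeds up recommendation response without affecting the requirement on update speed. For example, when the user turns on IPTV in the living room on Monday, the user tag used for recommendation is the one stored for that user in the offline database; when the user turns on IPTV on Tuesday, the user tag used is the one stored in the offline database after being updated offline with the user data obtained online on Monday. In other words, each time the user is online there is no need for the time-consuming process of constructing the user tag online; the tag already constructed or updated is fetched directly from the offline database, which greatly improves the efficiency of online recommendation.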
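FIG. 2 shows the architecture of a system that implements the above recommendation method according to an embodiment of the present invention. As shown in FIG. 2, the system includes a data layer, a data preprocessing layer, an application layer, and a main control layer. These layers may be deployed across the overall network of the CDN main network, with permission to obtain all user data and all content data in the CDN. Except for the data layer, the layers may all run on the CDN main server; the data layer may run on the CDN's group servers. The functional architecture of the four layers is described below:
The data layer is mainly responsible for obtaining various kinds of data, including but not limited to user data and content data. This layer may exist in the CDN system as a physical network element (including but not limited to a physical server, a Docker container, etc.). The data layer includes a user identity data acquisition module, a user viewing record data acquisition module, and a content data acquisition module.
The user identity data acquisition module obtains the user's identity information, including but not limited to the user's geographic location, viewing device, and registered identity information, and uses it as the core for constructing the user identity tag.
The user viewing record data acquisition module obtains the user's behaviour records, including but not limited to the user's search records, click records, and subscription records, and uses them as the core for constructing the user's viewing preference tag. The user tag mentioned in the above embodiments may include the user identity tag and the user's viewing preference tag.
The content data acquisition module obtains content-related information, including but not limited to the content's provider, director, cast, category, and rating, and uses it as the core for constructing the content tag.
The data preprocessing layer is mainly responsible for classifying and cleaning the data obtained by the data layer into a data format usable by the application layer, avoiding mistakes, gaps, and errors in the data that would contaminate the learning process of the application layer's machine learning algorithm or the computation data and lead to model training errors, wrong training results, failure to perform recommendation tasks, or wrong recommendation results. This layer exists in the CDN system as a physical network element (including but not limited to a physical server, a Docker container, etc.). The data preprocessing layer includes a user data preprocessing module and a content data preprocessing module.
The user data preprocessing module has two main functions. 1) It integrates the user's identity data with the user's viewing record data so that the two are matched, yielding integrated data that contains both and is supplied to the algorithm system for training. 2) It classifies and cleans the user's viewing records and identity data into a data format usable by the application layer, avoiding mistakes, gaps, and errors in the data that would contaminate the learning process of the application layer's machine learning algorithm or the computation data and lead to model training errors, wrong training results, failure to perform recommendation tasks, or wrong recommendation results.
The content data preprocessing module mainly classifies and cleans the content data obtained by the content data acquisition module into a data format usable by the application layer, avoiding mistakes, gaps, and errors in the data that would contaminate the learning process of the application layer's machine learning algorithm or the computation data and lead to model training errors, wrong training results, failure to perform recommendation tasks, or wrong recommendation results.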
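The application layer mainly includes: a tag construction module, used for tag construction; and a recommended content generation module, used for tag matching and for generating the recommendation result.
In one implementation, the application layer may be layered as shown in FIG. 3, including: a raw data layer, a feature mining and combination layer, a predictive scoring layer, and an output layer. In FIG. 3, the flow before LR tag matching is the offline flow, and LR tag matching and the flow after it are the online flow. Each layer in FIG. 3 is described below:
The raw data layer mainly obtains the relevant data from the user data preprocessing module and the content data preprocessing module and performs feature engineering on it to obtain user tags and content tags. The raw data layer contains three main kinds of data: the user's identity data (denoted USER MAP), the user's viewing record data (denoted LINK MAP), and the content data of the content to be recommended (denoted ITEM MAP). USER MAP is derived mainly from the identity data in the user data preprocessing module, including but not limited to the user's geographic location, viewing device, and registered identity information, and serves as the core for a preliminary construction of the user identity tag. LINK MAP is derived mainly from the viewing record data in the user data preprocessing module, including but not limited to behaviour records such as the user's search records, click records, and subscription records, and serves as the core for constructing the user's viewing preference tag, so that the viewing preference tag can be associated with the user identity tag. ITEM MAP is derived mainly from the content data in the content data preprocessing module, including but not limited to the content's provider, director, cast, category, and rating, and serves as the core for a preliminary construction of the content tag.
Feature mining and combination layer: this layer mainly uses USER MAP, LINK MAP, and ITEM MAP to perform offline feature engineering with the XGBoost algorithm, such as feature subdivision, feature mining, and automatic feature combination, to build precise, fine-grained new user tags and content tags. For a single user, automatic information mining and feature combination are applied to the identity data USER MAP and the viewing record data LINK MAP to build the user's precise, fine-grained user tag. XGBoost feature subdivision, feature mining, and automatic feature combination are applied to the content data of the content to be recommended and to the viewing record data of that content being watched to build precise, fine-grained content tags.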
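The source does not say how the XGBoost output becomes tags. One common pattern, assumed here purely for illustration, is to treat the leaf that each boosted tree assigns to a sample as an automatically combined feature. The sketch below uses hand-written dictionaries as stand-in trees rather than the XGBoost library, so `leaf_index`, `combined_tags`, and the tree layout are hypothetical.

```python
def leaf_index(tree, sample):
    """Walk one decision tree and return the id of the leaf the sample lands in."""
    node = tree
    while "leaf" not in node:
        branch = "left" if sample[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def combined_tags(trees, sample):
    """One tag per tree: each leaf encodes a conjunction of raw-feature tests."""
    return [f"tree{i}:leaf{leaf_index(tree, sample)}" for i, tree in enumerate(trees)]


# Hypothetical trees; in practice they would come from a model trained offline.
EXAMPLE_TREES = [
    {
        "feature": "age", "threshold": 18,
        "left": {"leaf": 0},
        "right": {
            "feature": "hours_per_day", "threshold": 2.0,
            "left": {"leaf": 1},
            "right": {"leaf": 2},
        },
    },
    {
        "feature": "hours_per_day", "threshold": 1.0,
        "left": {"leaf": 0},
        "right": {"leaf": 1},
    },
]
```

A leaf such as `tree0:leaf2` stands for "age at least 18 and at least 2 hours per day", which is the kind of combined, fine-grained feature the layer is described as producing.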
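The predictive scoring layer is used, when a target user is watching online, to obtain from the offline database the user tag constructed for that target user, and to use LR to perform regression classification (that is, tag matching) on the target user's precise, fine-grained user tag and the precise, fine-grained content tags of each content item to be recommended in the database. This predicts the target user's satisfaction value for each content item to be recommended; sorting the content items by satisfaction value yields the recommendation list for the target user.
The output layer pushes the recommendation list to the client used by the user.
Main control layer: the main control layer mainly includes the main control module, which provides unified control over the operation of the recommendation algorithm, the full distribution of certain hot content, the blacklists and whitelists of certain users, and so on. This layer exists in the CDN system as a physical network element (including but not limited to a physical server, a Docker container, etc.).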
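The source names the main control layer's responsibilities but not its policies, so the following is only one possible reading: hot content is delivered to everyone ahead of personalised items, and blacklisted users receive hot content only. The function name `apply_main_control` and both policies are assumptions.

```python
def apply_main_control(user_id, personalised, hot_content, blacklist, top_n):
    """Combine the personalised list with centrally controlled rules.

    Assumed policy: hot content goes to every user first; users on the
    blacklist get no personalised items at all.
    """
    if user_id in blacklist:
        return list(hot_content)[:top_n]
    merged = list(hot_content)
    merged += [cid for cid in personalised if cid not in merged]
    return merged[:top_n]
```

Keeping these rules outside the scoring code means an operator can change them without touching the offline models or the online matching.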
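In one embodiment, the processes performed by the above application layer are divided mainly into an online flow and an offline flow; this layer exists in the CDN system as a physical network element (including but not limited to a physical server, a Docker container, etc.). The flow is shown in FIG. 4, and an example of it is briefly described below:
When the server detects that the user is online, the online flow executes: generating a recommendation list for the user; and pushing the recommendation list to the user. Specifically, the server obtains from the offline database the user tag constructed offline for the user; performs tag matching according to the user tag and the content tags of each content item to be recommended stored in the offline database; generates a recommendation list for the user online according to the matching result; and pushes the recommendation list to the user.
When the server detects that the user is online, the online flow also executes: obtaining the user's identity data, the viewing record data of the user's online viewing, and the content data of the content the user watches online.
The offline flow executes: importing data into the offline database; updating the user tag; and updating the content tag. The data imported into the offline database includes the user's identity data, the viewing record data of the user's online viewing, and the content data of the content the user watched online, all obtained in the online flow. The user tag can be updated according to the user's identity data and online viewing record data imported into the offline database, and the content tag can be updated according to the content data of the content the user watched online.
In this embodiment, the recommendation algorithm system is built in an offline + online mode: user tags and content tags are constructed, and the recommendation list is generated online, balancing computation speed against computing resources. Placing the machine learning that constructs user tags and content tags in the offline part reduces online computing pressure, speeds up generation of the recommendation list, and improves user experience. Placing recommendation list generation in the online part updates the recommended content in real time, keeps it refreshed promptly, and improves user experience. During system operation, the machine learning keeps iterating on itself and its feature extraction ability keeps strengthening, which can improve the recommendation success rate.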
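It should be noted that the above examples in the embodiments of the present application are given for ease of understanding and do not limit the technical solutions of the present invention.
The division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is included, they fall within the scope of protection of this patent. Adding insignificant modifications or introducing insignificant designs to the algorithm or flow, without changing the core design of the algorithm and flow, also falls within the scope of protection of this patent.
One embodiment of the present application provides a recommendation apparatus, as shown in FIG. 5, including: an offline user tag acquisition module 501, configured to obtain from an offline database, when it is determined that a user is online, a user tag constructed offline for the user; an online matching module 502, configured to perform tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and to generate a recommendation list for the user online according to the matching result; and an online push module 503, configured to push the recommendation list to the terminal corresponding to the user.
It is worth mentioning that the modules involved in this embodiment are all logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present invention, units not closely related to solving the technical problem posed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
It is easy to see that this embodiment is an apparatus embodiment corresponding to the above method embodiment, and the two can be implemented in cooperation with each other. The relevant technical details and technical effects mentioned in the method embodiment remain valid in this embodiment and are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the method embodiment.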
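One embodiment of the present application provides a server, as shown in FIG. 6, including: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601; wherein the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the above recommendation method.
The memory 602 and the processor 601 are connected by a bus. The bus may include any number of interconnected buses and bridges and connects the various circuits of the one or more processors 601 and the memory 602. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 601 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 601.
The processor 601 is responsible for managing the bus and for general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 602 may be used to store data used by the processor 601 when performing operations.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above implementations are specific embodiments for realizing the present invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.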

Claims (11)

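  1. A recommendation method, comprising:
    when it is determined that a user is online, obtaining from an offline database a user tag constructed offline for the user;
    performing tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and generating a recommendation list for the user online according to the matching result; and
    pushing the recommendation list to the terminal corresponding to the user.
  2. The recommendation method according to claim 1, wherein,
    when it is determined that the user is online, the method further comprises:
    obtaining online the user data of the user and the content data of the content the user watches online; and
    after the recommendation list is sent to the terminal corresponding to the user, the method further comprises:
    updating the user tag offline according to the user data obtained online; and
    updating the content tag offline according to the content data of the content the user watches online.
  3. The recommendation method according to claim 1, wherein the user tag is constructed offline as follows:
    performing offline feature engineering on the identity data of the user and the viewing record data of the user to obtain the user tag;
    wherein the offline feature engineering includes at least one or any combination of: offline feature subdivision, offline feature mining, and offline feature combination.
  4. The recommendation method according to claim 1, wherein the content tag is constructed offline as follows:
    performing offline feature engineering on the content data of the content to be recommended and the viewing record data of the content to be recommended being watched, to obtain the content tag; wherein the offline feature engineering includes at least one or any combination of: offline feature subdivision, offline feature mining, and offline feature combination.
  5. The recommendation method according to claim 2, wherein the user data obtained online includes the identity data of the user and the viewing record data of the user's online viewing.
  6. The recommendation method according to claim 2, wherein the content data of the content the user watches online includes any combination of:
    the director of the content, the cast of the content, the genre of the content, the rating of the content, the viewing audience of the content, and the viewing time period of the content.
  7. The recommendation method according to any one of claims 1 to 6, wherein the terminal is in a target scenario, and the target scenario's requirement for recommendation timeliness is lower than a preset requirement.
  8. The recommendation method according to claim 7, wherein the target scenario is the scenario after the terminal enters an electronic program guide (EPG) page.
  9. A recommendation apparatus, comprising:
    an offline user tag acquisition module, configured to obtain from an offline database, when it is determined that a user is online, a user tag constructed offline for the user;
    an online matching module, configured to perform tag matching according to the user tag and the content tags, stored in the offline database, that were constructed offline for each content item to be recommended, and to generate a recommendation list for the user online according to the matching result; and
    an online push module, configured to push the recommendation list to the terminal corresponding to the user.
  10. A server, comprising: at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the recommendation method according to any one of claims 1 to 8.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the recommendation method according to any one of claims 1 to 8.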
PCT/CN2022/128878 2021-11-04 2022-11-01 推荐方法、装置、服务器和计算机可读存储介质 WO2023078226A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111302296.9A CN116070014A (zh) 2021-11-04 2021-11-04 推荐方法、装置、服务器和计算机可读存储介质
CN202111302296.9 2021-11-04

Publications (1)

Publication Number Publication Date
WO2023078226A1 true WO2023078226A1 (zh) 2023-05-11

Family

ID=86175726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/128878 WO2023078226A1 (zh) 2021-11-04 2022-11-01 推荐方法、装置、服务器和计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN116070014A (zh)
WO (1) WO2023078226A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060123448A1 (en) * 2004-12-02 2006-06-08 Matsushita Electric Industrial Co., Ltd. Programming guide content collection and recommendation system for viewing on a portable device
CN108734510A (zh) * 2018-04-23 2018-11-02 微梦创科网络科技(中国)有限公司 基于属性匹配的广告推荐方法及系统
CN110533515A (zh) * 2019-09-04 2019-12-03 深圳创新奇智科技有限公司 一种高吞吐低延迟的电商个性化推荐方法及装置
CN112597395A (zh) * 2020-12-28 2021-04-02 上海众源网络有限公司 对象推荐方法、装置、设备及存储介质


Also Published As

Publication number Publication date
CN116070014A (zh) 2023-05-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889251

Country of ref document: EP

Kind code of ref document: A1