CN110290199B - Content pushing method, device and equipment - Google Patents

Content pushing method, device and equipment

Info

Publication number
CN110290199B
Authority
CN
China
Prior art keywords
content
popularity
candidate
contents
candidate content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910545055.3A
Other languages
Chinese (zh)
Other versions
CN110290199A (en)
Inventor
胡文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910545055.3A priority Critical patent/CN110290199B/en
Publication of CN110290199A publication Critical patent/CN110290199A/en
Application granted granted Critical
Publication of CN110290199B publication Critical patent/CN110290199B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/55Push-based network services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a content pushing method, a content pushing device and content pushing equipment. The content pushing method is applied to a management server of a content distribution network, and candidate content is determined from the content of the content distribution network; the candidate content is content which is not pushed to the edge IDC node; for each candidate content, extracting the characteristics of the candidate content, and inputting the characteristics of the candidate content into a preset model to obtain the popularity of the candidate content; selecting candidate contents with the popularity meeting a preset popularity condition from the candidate contents as popular contents, and pushing the popular contents to each edge IDC node in a preset pushing time period; the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets the preset bandwidth idle condition. By the scheme, popular content can be pushed in an idle period so as to effectively reduce bandwidth occupation in a peak period.

Description

Content pushing method, device and equipment
Technical Field
The present invention relates to the field of content push technologies, and in particular, to a content push method, device and apparatus.
Background
A CDN (Content Delivery Network) reduces network congestion and network delay by deploying edge IDC (Internet Data Center) nodes that serve nearby users and by managing functions such as content delivery, scheduling, and server load balancing; CDNs are therefore widely used to provide large-scale content services. In a specific application, depending on user access habits, the bandwidth usage of a content distribution network can be divided into peak periods and idle periods. In a peak period, the user access volume is relatively large, the management server feeds back a large amount of content to the IDC nodes, and the bandwidth occupation is high; in an idle period, the user access volume is relatively small, the management server feeds back little content to the IDC nodes, and the bandwidth occupation is relatively low.
So that an edge IDC node can, in a peak period, directly feed back to users the content it received in a bandwidth idle period, thereby reducing the bandwidth occupation and feedback delay caused by feeding back too much content in the peak period, the management server can push content to the edge IDC nodes in advance during the idle period. However, the content of a large-scale content service is often massive. If arbitrary content is selected for advance pushing in the idle period, a large amount of unnecessary content that users do not request in the peak period may be pushed, so the management server still needs to feed back a large amount of content to the IDC nodes in the peak period, and bandwidth occupation in the peak period is not effectively reduced. Pushing popular content, that is, content with relatively high popularity among users, in the idle period instead reduces the advance pushing of unnecessary content and effectively reduces bandwidth occupation in the peak period.
Therefore, how to push popular content in the idle period to effectively reduce the bandwidth occupation in the peak period is an urgent problem to be solved in content push.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a content push method, apparatus, and device that push popular content in an idle period so as to effectively reduce bandwidth occupation in a peak period. The specific technical solutions are as follows:
in a first aspect, an embodiment of the present invention provides a content pushing method, which is applied to a management server of a content distribution network, and the method includes:
determining candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
for each candidate content, extracting the characteristics of the candidate content, and inputting the characteristics of the candidate content into a preset model to obtain the popularity of the candidate content; the preset model is a model obtained by utilizing the characteristics of a plurality of sample contents in advance; the sample content is the content pushed to the edge IDC node and has common characteristics with the candidate content;
selecting candidate contents with the popularity meeting a preset popularity condition from the candidate contents as popular contents, and pushing the popular contents to each edge IDC node in a preset pushing time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
In a second aspect, an embodiment of the present invention provides a content pushing apparatus, which is applied to a management server of a content distribution network, and includes:
a candidate content determination module for determining candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
the popularity determining module is used for extracting the characteristics of the candidate contents aiming at each candidate content and inputting the characteristics of the candidate contents into a preset model to obtain the popularity of the candidate contents; the preset model is a model obtained by utilizing the characteristics of a plurality of sample contents in advance; the sample content is the content pushed to the edge IDC node and has common characteristics with the candidate content;
the popular content determining module is used for selecting candidate content with the popularity meeting a preset popularity condition from the candidate content to serve as popular content, and pushing the popular content to each edge IDC node in a preset pushing time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory to implement the steps of the content push method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the content push method provided in the first aspect.
In the scheme provided by the embodiment of the invention, because the sample content is content that has been pushed to the edge IDC node, the popularity of the sample content can be determined, so the preset model obtained in advance by using the features of a plurality of sample contents can be used to obtain the popularity of content. Moreover, because the sample content and the candidate content have common features, the preset model can be used to obtain the popularity of the candidate content, and popular content can be selected from the candidate content by using the obtained popularity. In addition, the candidate content is content that has not been pushed to the edge IDC node. Therefore, compared with pushing arbitrary content, pushing the popular content to each edge IDC node in the preset pushing time period ensures that the content pushed in the idle period is popular content that users are relatively more likely to request, which reduces both the amount of unnecessary content among the pushed content and the bandwidth occupation in the peak period. With this scheme, popular content can thus be pushed in the idle period so that bandwidth occupation in the peak period is effectively reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flow chart of a content push method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a content push method according to another embodiment of the present invention;
fig. 3 is a schematic structural diagram of a content pushing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a content pushing apparatus according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be described below with reference to the drawings in the embodiment of the present invention.
The content push method provided by the embodiment of the present invention may be applied to a management server of a content distribution network, where the management server may include a desktop computer, a portable computer, an internet television, an intelligent mobile terminal, a server, and the like, and is not limited herein, and any electronic device that can implement the embodiment of the present invention belongs to the protection scope of the embodiment of the present invention.
In a specific application, the content distribution network may specifically be a video playing system, an instant messaging system, and the like, and accordingly, the content provided by the content distribution network to the client may specifically be a video, a news, a picture, an article, and the like.
As shown in fig. 1, a flow of a content pushing method according to an embodiment of the present invention may include the following steps:
S101, candidate content is determined from the content of the content distribution network. The candidate content is content that has not been pushed to the edge IDC node.
In a specific application, the management server may push content to the edge IDC nodes at various times. Illustratively, the management server uniformly pushes the same content to every edge IDC node; or the management server pushes requested content to an edge IDC node after receiving a request from that node. Accordingly, the candidate content may come from various sources. Illustratively, it may be content that has not been pushed to any edge IDC node and/or content that has not been pushed to at least one edge IDC node. In the subsequent step S103, the management server selects popular content from the candidate content and pushes it to each edge IDC node, so that an edge IDC node that has not yet stored the popular content can receive and store it.
Determining content that has not been pushed to the edge IDC node as the candidate content ensures that the popular content is content that the management server still needs to push to the edge IDC node, and that would therefore occupy bandwidth between the management server and the edge IDC node, rather than content that has already been pushed and would not occupy that bandwidth. Pushing such popular content in advance in an idle period can therefore effectively reduce bandwidth occupation in the peak period.
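As an illustration only, candidate determination in S101 can be sketched as a set difference. The function name `determine_candidates` and the per-node bookkeeping dictionary `pushed_by_node` are hypothetical and are not taken from the embodiment; the sketch assumes the management server records which content identifiers each edge IDC node has already received.

```python
def determine_candidates(all_content_ids, pushed_by_node):
    """Return the IDs of content that at least one edge node has not received.

    all_content_ids: iterable of content IDs known to the management server.
    pushed_by_node:  dict mapping an edge node name to the set of content IDs
                     already pushed to that node (hypothetical bookkeeping).
    """
    all_ids = set(all_content_ids)
    candidates = set()
    for pushed in pushed_by_node.values():
        # Anything this node is missing would still cost bandwidth to deliver.
        candidates |= all_ids - set(pushed)
    return candidates


pushed = {"node-1": {"a", "b"}, "node-2": {"a"}}
print(sorted(determine_candidates({"a", "b", "c"}, pushed)))
```

Content "a" is excluded here because every node already holds it, so pushing it again would not relieve peak-period bandwidth.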
In addition, the candidate content may be pushed in various forms. Illustratively, the entire candidate content may be pushed directly, e.g., a complete video or a complete text. Or, for example, the candidate content may be divided according to a preset segmentation rule to obtain a plurality of fragment contents, and the obtained fragment contents are pushed. Correspondingly, the subsequent feature extraction, popularity prediction and pushing can be carried out on the fragment content. It can be understood that feature extraction, popularity prediction, and pushing for fragment content are similar to those for whole content, except that the object of processing is fragment content obtained by dividing whole content. The preset segmentation rule may be segmentation according to a specified data size or, for video and audio, segmentation according to a specified playing time length. For example, a 40Mb episode of a television series is divided into fragment contents of 2Mb each; alternatively, a 40-minute episode of a television series is divided into fragment contents of 5 minutes each.
S102, for each candidate content, the features of the candidate content are extracted and input into a preset model to obtain the popularity of the candidate content. The preset model is a model obtained in advance by using the features of a plurality of sample contents; the sample content is content that has been pushed to the edge IDC node and has features in common with the candidate content.
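A minimal sketch of a segmentation rule, assuming a fixed piece length expressed in whatever unit the rule uses (data size or playing time). The helper name `split_into_pieces` is hypothetical; the embodiment does not prescribe how piece boundaries are computed.

```python
def split_into_pieces(total, piece):
    """Return (offset, length) pairs that cover `total` units in order.

    The unit is whatever the segmentation rule uses: data size or playing time.
    The last piece is shorter when `total` is not a multiple of `piece`.
    """
    if piece <= 0:
        raise ValueError("piece must be positive")
    return [(offset, min(piece, total - offset)) for offset in range(0, total, piece)]


# The two examples from the text: 40 units of size in pieces of 2,
# and 40 minutes of playing time in pieces of 5 minutes.
print(len(split_into_pieces(40, 2)))
print(len(split_into_pieces(40, 5)))
```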
The feature extraction of the candidate content may be various. For example, the content containing the image may be subjected to image feature extraction, for example, the content such as video, pictures, and news may be subjected to image feature extraction. Alternatively, for example, semantic feature extraction may be performed on the candidate content, for example, semantic features expressed by text data in content such as videos, pictures, and news are extracted. For the convenience of understanding and reasonable layout, the following embodiment of fig. 2 of the present invention specifically describes a manner related to semantic feature extraction. Any feature extraction capable of extracting features of candidate contents can be used in the present invention, and this embodiment does not limit this.
The preset model is a model obtained in advance by using the features of a plurality of sample contents; the sample content is content that has been pushed to the edge IDC node and has features in common with the candidate content. Therefore, once training of the preset model with the sample content is complete, the preset model is able to predict the popularity of the candidate content from the features of the candidate content. In a specific application, the model trained to obtain the preset model may be any machine learning model, for example a multilayer perceptron or XGBoost (a gradient boosting algorithm). Moreover, the preset model may be trained in various ways, for example by supervised learning or by unsupervised learning. Of course, with supervised learning, the sample content used for training needs to be labeled with popularity labels. Any machine learning model, with a corresponding training method, that can be trained to obtain a preset model for predicting popularity can be used in the present invention, which is not limited in this embodiment.
The popularity may take various forms. Illustratively, popularity may be a numerical value that characterizes popularity, with larger values indicating higher popularity, e.g., 0.9, 0.5, and 0.3. Or, for example, popularity may be a popularity category that reflects popularity, e.g., one of four types, "very popular", "popular", "ordinary", and "unpopular"; alternatively, one of two types, "popular" and "unpopular", and so on. For ease of understanding and reasonable layout, the use of the popularity category as the popularity is specifically described later in the form of an alternative embodiment.
S103, selecting candidate contents with the popularity meeting the preset popularity condition from the candidate contents as popular contents, and pushing the popular contents to each edge IDC node in a preset pushing time period. The preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets the preset bandwidth idle condition.
The preset popularity condition may be various. For example, when the popularity is a numerical value representing the popularity, the preset popularity condition may be that the popularity is greater than or equal to a preset popularity threshold. Or, for example, when the popularity is the popularity category, the preset popularity condition may be that the popularity category is the one reflecting the highest popularity. Any preset popularity condition that can be used to select popular content can be used in the present invention, and the present embodiment does not limit this.
In order to push, in advance in an idle period, content that would otherwise be pushed in a peak period, the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition. For example, the preset bandwidth idle condition may be that the bandwidth occupation amount of the management server to the edge IDC node is less than a preset bandwidth occupation threshold. For example, the preset push time period may be from 00:00 to 04:00 or from 01:00 to 06:00 in the early morning. The preset push time period may be set according to the bandwidth occupation amount of the content distribution network, and any time period with relatively low bandwidth occupation may be used as the preset push time period in the present invention, which is not limited in this embodiment. In addition, after receiving the push notification of the pushed popular content, the edge IDC node can determine whether it has already stored the popular content, so that popular content that is not yet stored is received and stored, and popular content that is already stored is not accepted.
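The selection and timing logic of S103 can be sketched as follows, assuming a numerical popularity, a threshold-style popularity condition, and the 00:00 to 04:00 window used as an example in the text. All function names and default values are illustrative assumptions, not part of the embodiment.

```python
from datetime import time


def select_popular(popularity_by_id, threshold):
    """Keep candidates whose numeric popularity meets the preset threshold."""
    return [cid for cid, score in popularity_by_id.items() if score >= threshold]


def is_idle(bandwidth_in_use, idle_threshold):
    """Bandwidth idle condition: occupation is below a preset threshold."""
    return bandwidth_in_use < idle_threshold


def in_push_window(now, start=time(0, 0), end=time(4, 0)):
    """Check a fixed preset push period (here 00:00 to 04:00, an example value)."""
    return start <= now < end


def accept_push(stored_ids, pushed_ids):
    """Edge-node side: keep only the pushed content that is not stored yet."""
    return [cid for cid in pushed_ids if cid not in stored_ids]


popular = select_popular({"a": 0.9, "b": 0.5, "c": 0.3}, threshold=0.5)
if in_push_window(time(2, 30)):
    print(accept_push({"a"}, popular))
```

The window check and the bandwidth check are shown as alternatives: a deployment could fix the window from historical bandwidth statistics or test the idle condition at run time.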
In the scheme provided by the embodiment of the invention, because the sample content is content that has been pushed to the edge IDC node, the popularity of the sample content can be determined, so the preset model obtained in advance by using the features of a plurality of sample contents can be used to obtain the popularity of content. Moreover, because the sample content and the candidate content have common features, the preset model can be used to obtain the popularity of the candidate content, and popular content can be selected from the candidate content by using the obtained popularity. In addition, the candidate content is content that has not been pushed to the edge IDC node. Therefore, compared with pushing arbitrary content, pushing the popular content to each edge IDC node in the preset pushing time period ensures that the content pushed in the idle period is popular content that users are relatively more likely to request, which reduces both the amount of unnecessary content among the pushed content and the bandwidth occupation in the peak period. With this scheme, popular content can thus be pushed in the idle period so that bandwidth occupation in the peak period is effectively reduced.
Optionally, the popularity includes: a popularity category divided by popularity of the content;
accordingly, the plurality of sample contents can be obtained by the following steps:
aiming at each content pushed to the edge IDC node, acquiring the feedback times of the content to the user by the content distribution network;
determining the popularity category of each content pushed to the edge IDC node by using the feedback times according to the corresponding relation between the preset popularity category and the feedback times;
and dividing each content pushed to the edge IDC nodes according to the determined popularity category to obtain a plurality of sample contents.
In a specific application, the popularity categories divided by the popularity of the content may be used as the popularity, so that when popular content is subsequently selected according to the popularity, the candidate content in the popularity category reflecting the highest popularity may be directly selected as the popular content. Compared with using a numerical value representing the popularity as the popularity, this saves the step of comparing the popularity with the preset popularity threshold and improves prediction efficiency. The popularity category may be various. Illustratively, there may be four types, "very popular", "popular", "ordinary" and "unpopular"; alternatively, there may be two types, "popular" and "unpopular", and so on.
When the popularity category is taken as the popularity, acquiring the popularity of the candidate content is equivalent to classifying the candidate content according to the popularity category. Therefore, in order to ensure that the preset model can determine the popularity categories of the candidate contents, each content pushed to the edge IDC node needs to be divided according to the popularity categories, so that the preset model is trained by using sample contents of each popularity category.
Specifically, the popularity categories of the contents pushed to the edge IDC nodes may be determined by using the feedback times according to a preset correspondence between the popularity categories and the feedback times. The feedback times can be obtained from a feedback record of feeding back the content to the user. For example, when a user accesses an edge IDC node of a content distribution network to request content, the edge IDC node records a user access log. The user access log may contain at least a unique identifier of the requested content and an access time. Therefore, the user access log recorded by each edge IDC node can be obtained, and the feedback times of the content corresponding to each identifier can be counted. For example, when the current time is May 30, 2019, user access logs from January 1, 2019 to April 30, 2019 can be acquired, and the number of times each content pushed to the edge IDC node was fed back is counted.
In addition, the preset correspondence between the popularity category and the feedback times can be various. For example, in descending order of the number of feedback times, the popularity category of the content ranked in the top 5% may be determined as "very popular", that of the content ranked from the top 5% to the top 20% as "popular", that of the content ranked from the top 20% to the top 50% as "ordinary", and that of the remaining content as "unpopular". Alternatively, in descending order of the number of feedback times, the popularity category of the content ranked in the top 20% may be determined as "popular" and that of the remaining content as "unpopular". Accordingly, the popularity category of the candidate content subsequently obtained by using the preset model is consistent with the popularity category of the sample content: when the popularity categories are divided according to the first example, the candidate content with the popularity category of "very popular" may be selected as the popular content, and when the popularity categories are divided according to the second example, the candidate content with the popularity category of "popular" may be selected as the popular content.
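Counting feedback times from access logs can be sketched as below. The log format, a list of `(content_id, access_date)` pairs, and the function name `count_feedback` are assumptions made for illustration; the text only states that a log entry contains at least a unique content identifier and an access time.

```python
from collections import Counter
from datetime import date


def count_feedback(access_log, start, end):
    """Count how often each content ID was fed back within [start, end]."""
    return Counter(cid for cid, day in access_log if start <= day <= end)


# Hypothetical log entries merged from all edge IDC nodes.
log = [
    ("v1", date(2019, 1, 5)),
    ("v1", date(2019, 3, 2)),
    ("v2", date(2019, 4, 30)),
    ("v1", date(2019, 5, 10)),  # outside the statistics window, ignored
]
counts = count_feedback(log, date(2019, 1, 1), date(2019, 4, 30))
print(counts.most_common())
```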
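The first correspondence above (top 5%, 5% to 20%, 20% to 50%, remainder) can be sketched as a rank-based labeling. The function name `label_by_rank` is hypothetical, and the treatment of ties (kept in input order by a stable sort) is an assumption the text does not address.

```python
def label_by_rank(feedback_counts):
    """Map each content ID to a popularity category by its rank percentile."""
    ranked = sorted(feedback_counts, key=feedback_counts.get, reverse=True)
    n = len(ranked)
    labels = {}
    for position, cid in enumerate(ranked, start=1):
        # Integer arithmetic avoids floating-point surprises at the boundaries.
        if position * 100 <= 5 * n:
            labels[cid] = "very popular"
        elif position * 100 <= 20 * n:
            labels[cid] = "popular"
        elif position * 100 <= 50 * n:
            labels[cid] = "ordinary"
        else:
            labels[cid] = "unpopular"
    return labels


# Twenty hypothetical contents with distinct feedback counts 20, 19, ..., 1.
counts = {f"c{i}": 20 - i for i in range(20)}
labels = label_by_rank(counts)
print(labels["c0"], labels["c3"], labels["c9"], labels["c19"])
```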
Optionally, before obtaining a plurality of sample contents after dividing each content pushed to the edge IDC node according to the determined popularity category, the content pushing method provided in the embodiment of the present invention may further include the following steps:
counting the number of contents of each popularity category;
comparing the counted number and taking the maximum number as a standard number;
for each popularity category, determining whether the number of contents of the popularity category is equal to a standard number;
if the number of the contents is equal to the standard number, the contents of the popularity category are used as sample contents, otherwise, the contents of the popularity category are supplemented to the standard number and used as the sample contents.
In specific application, the number of contents of different popularity categories is often uneven, and if the divided contents are directly used as a plurality of sample contents for training to obtain a preset model after each content pushed to the edge IDC node is divided, overfitting caused by uneven number of samples is likely to be caused. For this reason, after the contents pushed to the edge IDC node are divided, the number of the contents of the popularity category is adjusted for each popularity category, and the contents of the popularity category are supplemented to be equal to the standard number so as to ensure that each popularity category has the same number of contents, and then the contents of the popularity category are used as sample contents, so that overfitting is reduced by the uniform number of sample contents.
For each popularity category, when the number of contents of the popularity category is not equal to the standard number, the contents of the popularity category may be supplemented to the standard number in various ways. Illustratively, the content of the popularity category may be copied as supplemental content until the number of contents of the popularity category equals the standard number. And/or, for example, a specified number of contents of the popularity category can be selected from historically pushed contents other than the counted contents, wherein the specified number is the difference between the counted number of contents of the popularity category and the standard number.
Illustratively, the number of contents whose popularity category is "very popular" is 50, and the number of contents whose popularity category is "popular" is 100, so the standard number is 100. The existing "very popular" contents can therefore be copied to supplement the "very popular" category, and the resulting 100 "very popular" contents are used as the "very popular" sample content. Alternatively, when the current time is June 30, 2019 and the 50 counted "very popular" contents are contents pushed from January 1, 2019 to April 30, 2019, another 50 contents whose popularity category is "very popular" may be selected from the contents pushed from May 1, 2019 to May 31, 2019 and/or the contents pushed before January 1, 2019, and added to the 50 counted "very popular" contents; the resulting 100 "very popular" contents are used as the "very popular" sample content.
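The copy-based supplementing described above amounts to oversampling by duplication. A minimal sketch follows, assuming the contents are held in a dictionary keyed by popularity category; `balance_by_duplication` is a hypothetical name, and repeating items in their original order is one of several possible copying strategies.

```python
from itertools import cycle, islice


def balance_by_duplication(contents_by_category):
    """Copy existing items until every category reaches the standard number."""
    # The standard number is the largest category size.
    standard = max(len(items) for items in contents_by_category.values())
    return {
        category: list(islice(cycle(items), standard))
        for category, items in contents_by_category.items()
    }


samples = balance_by_duplication({
    "very popular": [f"vp{i}" for i in range(50)],
    "popular": [f"p{i}" for i in range(100)],
})
print({category: len(items) for category, items in samples.items()})
```

With the 50 and 100 items of the example, each "very popular" item appears exactly twice in the balanced sample set.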
As shown in fig. 2, a flow of a content pushing method according to an embodiment of the present invention may include the following steps:
S201, candidate content is determined from the content of the content distribution network.
S201 is the same as S101 in the embodiment of fig. 1; for details, see the description of the embodiment of fig. 1, which is not repeated here.
S202, text data of each candidate content is obtained. The text data is data capable of indicating semantic layer information of candidate contents.
In a specific application, the text data of the candidate content may take various forms. Illustratively, it may be at least one of the title, profile, type, series, and author of the candidate content. For example, for an episode of a television series: at least one of the title, introduction, type, the series to which the episode belongs, and the names of the creators; for a piece of music: at least one of the title, profile, music type, the album to which it belongs, and the names of the creators; for a picture: at least one of the title, introduction, style, the album to which it belongs, and the name of the creator. Of course, for each candidate content, if multiple items of text data are acquired, all of them may be spliced together as the text data used in the subsequent step S203. For example, the spliced string "title, profile, type, series to which the episode belongs, names of the creators" is taken as the text data used in step S203.
Furthermore, the text data of each candidate content may be acquired in various ways. For example, when the text data of each content is stored in a database about the content maintained by the content distribution network, the text data of each candidate content can be directly looked up in the database. Or, for example, the text data may be read from each candidate content.
Any text data capable of indicating the semantic layer information of the candidate content and the manner of acquiring the text data can be used in the present invention, which is not limited in this embodiment.
In addition, when the candidate content to be pushed is segment content obtained by dividing a complete content, the available text data usually describes the complete content, such as the synopsis, the description of the main creators, and the type of a certain movie. Therefore, the text data of the complete content corresponding to the segment content needs to be used as the text data of the segment content. Specifically, for each segment content, the complete content identifier corresponding to the segment content may be looked up in a preset correspondence between segment content identifiers and complete content identifiers; then, for each segment content, the text data carrying that complete content identifier is looked up in the pre-stored text data and used as the text data of the segment content. For example, the identifier URI (Uniform Resource Identifier) of a segment content corresponds to the identifier Tvid of a complete content; for the segment content URI1, the complete content identifier found is Tv1, and the text data of Tv1 may be used as the text data of the segment content URI1.
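The two-step lookup described above can be illustrated with plain dictionaries. This is a sketch under stated assumptions: the table names, the identifiers other than URI1 and Tv1, and the stored strings are hypothetical placeholders, and a real system would presumably query a database instead.

```python
# Hypothetical preset correspondence: segment content identifier (URI)
# to complete content identifier (Tvid).
SEGMENT_TO_COMPLETE = {"URI1": "Tv1", "URI2": "Tv1", "URI3": "Tv2"}

# Hypothetical pre-stored text data, keyed by complete content identifier.
TEXT_DATA_BY_COMPLETE_ID = {
    "Tv1": "title of Tv1; synopsis of Tv1; type of Tv1",
    "Tv2": "title of Tv2; synopsis of Tv2; type of Tv2",
}


def text_data_for_segment(segment_id):
    """Return the text data of the complete content that owns the segment.

    Returns None when either lookup fails, so callers can skip the segment.
    """
    complete_id = SEGMENT_TO_COMPLETE.get(segment_id)
    if complete_id is None:
        return None
    return TEXT_DATA_BY_COMPLETE_ID.get(complete_id)


uri1_text = text_data_for_segment("URI1")
```

Segments cut from the same complete content (URI1 and URI2 here) share one text record, so the text data only has to be stored once per complete content.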
S203, for each candidate content, extracting semantic features of the candidate content based on the text data of the candidate content. Semantic features are features that can reflect the meaning expressed by the text data.
The semantic features of each candidate content may be extracted from its text data in various ways. For example, when the semantic features are probability distribution features of the words included in the text data, the text data of the candidate content may be processed by a probability model to obtain the distribution probability of each segmented word in the text data as the semantic features. The probability model may specifically be a binary independence probability model, a probabilistic network information model, or the like.
Alternatively, for example, when the semantic features are images of words included in the text data, the text data may be processed into an image form, resulting in semantic features reflected in the image form. Specifically, the semantic features of words in the text data can be reflected in an image form by using a suffix tree model, a frequent word set hypergraph model, a graph control model and the like.
Or, for example, when the semantic features are feature vectors of words included in the text data, the text data may be segmented and the obtained words may be vectorized. For ease of understanding and reasonable layout, the case of semantic features as feature vectors is described in detail below in an alternative embodiment.
Any method capable of extracting semantic features of candidate contents based on the text data of the candidate contents can be used in the present invention, which is not limited in this embodiment.
S204, inputting the features of the candidate content into a preset model to obtain the popularity of the candidate content. The preset model is a model obtained in advance by using the features of a plurality of sample contents; the sample content is content that has been pushed to the edge IDC node and has common features with the candidate content.
S204 is similar to the step in S102 of the embodiment of fig. 1 in which the popularity is obtained from the features of the candidate content, except that the features of the candidate content in S204 are semantic features. The parts that are the same are not described again here; for details, see the description of the embodiment of fig. 1.
The semantic features of the sample content are extracted in the same way as those of the candidate content: when the preset model is obtained through training, the text data of the sample content is acquired, and the semantic features of the sample content are then extracted based on that text data. This is not described again here; for details, see steps S202 to S203.
S205, selecting candidate contents with the popularity meeting the preset popularity condition from the candidate contents as popular contents, and pushing the popular contents to each edge IDC node in the preset pushing time period.
S205 is the same as S103 in the embodiment of fig. 1 and is not described again here; for details, see the description of the embodiment of fig. 1.
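Steps S204 and S205 can be sketched as a filter followed by an idle-period check. Everything concrete here is an assumption made for illustration: the stand-in `predicted` mapping replaces the preset model, the set of categories treated as satisfying the preset popularity condition is invented, and the 30% idle ratio is an arbitrary placeholder for the preset bandwidth idle condition.

```python
def select_popular(candidates, predict_popularity,
                   popular_categories=("very popular", "popular")):
    """Keep the candidates whose predicted popularity meets the condition.

    predict_popularity stands in for the preset model; popular_categories
    is a hypothetical form of the preset popularity condition.
    """
    return [c for c in candidates if predict_popularity(c) in popular_categories]


def in_push_period(bandwidth_in_use, capacity, idle_ratio=0.3):
    """Hypothetical idle condition: usage at or below a fraction of capacity."""
    return bandwidth_in_use <= idle_ratio * capacity


# Stand-in for model output; a real model would score extracted features.
predicted = {"c1": "very popular", "c2": "unpopular", "c3": "popular"}
popular = select_popular(["c1", "c2", "c3"], predicted.get)

# Push only while the link to the edge IDC nodes is considered idle.
to_push_now = popular if in_push_period(200, 1000) else []
to_push_at_peak = popular if in_push_period(800, 1000) else []
```

Separating the selection from the timing check keeps the popularity condition and the bandwidth condition independently configurable.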
In a specific application, content in any form has text data indicating at least one of its subject, synopsis, author, and similar information, and the information indicated by the text data can determine whether the content is popular; the image features of the content, by contrast, mainly serve an aesthetic purpose and do not necessarily determine whether the content is popular. Therefore, compared with obtaining popularity from extracted image features, the embodiment of fig. 2 of the present invention extracts semantic features of the candidate content from its text data to obtain the popularity, which applies to content in as many forms as possible and expands the application range of the content push method. Moreover, compared with the image data that must be processed to extract image features, especially video data with a large number of image frames, the text data that must be processed to extract semantic features occupies fewer storage and processing resources, so processing efficiency can be improved and processing cost reduced.
Optionally, the extracting, for each candidate content, semantic features of the candidate content based on the text data of the candidate content specifically includes the following steps:
for each candidate content, performing word segmentation on the text data of the candidate content to obtain a plurality of words contained in the text data of the candidate content;
for each candidate content, looking up, in a preset corpus, a plurality of semantic feature vectors corresponding to the plurality of words of the candidate content;
for each candidate content, adding the plurality of semantic feature vectors of the candidate content and performing normalization calculation to obtain the semantic features of the candidate content.
The text data of the candidate content may be segmented in various ways. For example, word segmentation may be performed based on character string matching: when a word matching characters in the text data is found in a preset dictionary, that word is recognized. The preset dictionary may be an open-source dictionary or a proprietary dictionary maintained for the content of the content distribution network; a proprietary dictionary can be updated and expanded promptly as that content changes, which improves the accuracy of the word segmentation and of the subsequently determined semantic feature vectors. Alternatively, for example, word segmentation may be performed based on statistics: the frequency with which combinations of adjacent characters appear in the text data is counted, and when the frequency is higher than a preset frequency threshold, the combination of characters is determined to be a word. Of course, in a specific application, word segmentation can be implemented with an existing Chinese word segmentation tool, for example, jieba, Pangu, and the like.
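Word segmentation based on character string matching can be illustrated with forward maximum matching, one common variant of dictionary-based segmentation. The patent text does not name a specific matching strategy, so the choice of forward maximum matching, the function name, the toy dictionary, and the maximum word length are all assumptions.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text by repeatedly taking the longest dictionary word.

    At each position the longest candidate is tried first; a single
    character is emitted as its own word when nothing in the dictionary
    matches, so the loop always advances.
    """
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in dictionary:
                words.append(piece)
                i += length
                break
    return words


# Toy dictionary: "content", "distribution", "network", "content distribution".
dictionary = {"内容", "分发", "网络", "内容分发"}
tokens = forward_max_match("内容分发网络", dictionary)
```

In practice an existing segmenter would normally be preferred over a hand-written matcher; the sketch only shows how a preset dictionary drives the recognition of words.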
Any word segmentation method capable of obtaining a plurality of words included in the text data of the candidate content can be used in the present invention, and this embodiment does not limit this.
Moreover, for each candidate content, multiple semantic feature vectors corresponding to multiple words of the candidate content may be searched from a preset corpus. For example, one word may correspond to one semantic feature vector, or one word may correspond to a plurality of semantic feature vectors. When a word corresponds to a plurality of semantic feature vectors, the plurality of semantic feature vectors may include semantic feature vectors that reflect potential semantics related to the word, so as to improve comprehensiveness and accuracy of the semantics reflected by the semantic feature vectors. For example, the term "director D1" corresponds to a semantic feature vector V1 that reflects the underlying semantics associated with the term, the underlying semantics reflected is "director D1 is the director of famous literature works," and so on.
The semantic feature vectors in the preset corpus may be vectors obtained in advance by using Word2vec to vectorize the words in the preset dictionary and the words reflecting latent semantics. Word2vec (word to vector) is a model that can express a word in vector form by using a given corpus and a trained neural network.
The optional embodiment extracts the semantic feature vector of the text data as the vector of the candidate content, and the semantic feature vector may include a plurality of vectors in a preset corpus, so that the latent semantics related to the text data of the candidate content can be mined, and the comprehensiveness and accuracy of the semantics reflected by the semantic feature vector are improved, thereby improving the accuracy of acquiring popularity by subsequently utilizing the semantic features, improving the accuracy of content push, and making the reduction of the peak bandwidth occupation more effective.
In order to synthesize a plurality of semantic features reflected by a plurality of semantic feature vectors of each candidate content, the plurality of semantic feature vectors of each candidate content may be added, and the sum obtained by the addition may be normalized, so as to obtain a semantic feature capable of reflecting the entire semantic of each candidate content.
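The add-then-normalize step can be sketched in plain Python. The patent text does not state which normalization is used, so L2 normalization is assumed here; the toy corpus, its three-dimensional vectors, and the function name are likewise hypothetical.

```python
import math

# Hypothetical preset corpus: each word maps to one or more semantic
# feature vectors (a second vector may encode latent semantics).
CORPUS = {
    "director": [[1.0, 0.0, 0.0]],
    "D1": [[0.0, 3.0, 0.0], [2.0, 1.0, 0.0]],
}


def semantic_feature(words, corpus, dim=3):
    """Sum all vectors found for the words, then L2-normalize the sum.

    Words missing from the corpus contribute nothing; an all-zero sum is
    returned unchanged to avoid dividing by zero.
    """
    total = [0.0] * dim
    for word in words:
        for vector in corpus.get(word, []):
            total = [t + v for t, v in zip(total, vector)]
    norm = math.sqrt(sum(t * t for t in total))
    if norm == 0.0:
        return total
    return [t / norm for t in total]


# The summed vector is [3, 4, 0], whose L2 norm is 5.
feature = semantic_feature(["director", "D1", "unknown"], CORPUS)
```

Normalizing the sum keeps contents with long text data from producing larger feature vectors than contents with short text data, so the model sees features on a comparable scale.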
Corresponding to the above method embodiment, an embodiment of the present invention further provides a content push apparatus.
As shown in fig. 3, a content pushing apparatus provided in an embodiment of the present invention is applied to a management server of a content distribution network, and the apparatus may include:
a candidate content determining module 301, configured to determine candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
a popularity determining module 302, configured to perform feature extraction on each candidate content, and input features of the candidate content into a preset model to obtain popularity of the candidate content; the preset model is a model obtained by utilizing the characteristics of a plurality of sample contents in advance; the sample content is the content pushed to the edge IDC node and has common characteristics with the candidate content;
a popular content determining module 303, configured to select, from the candidate contents, a candidate content whose popularity satisfies a preset popular condition as popular content, and push the popular content to each edge IDC node in a preset push time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
In the scheme provided by the embodiment of the present invention, the sample content is content that has already been pushed to the edge IDC node, so the popularity of the sample content can be reliably determined; a preset model obtained in advance from the features of a plurality of sample contents can therefore be used to obtain the popularity of content. Moreover, the sample content and the candidate content have common features, so the preset model can be used to obtain the popularity of the candidate content, and popular content can be selected from the candidate content by using the obtained popularity; and the candidate content is content that has not been pushed to the edge IDC node. Therefore, pushing the popular content to each edge IDC node in the preset push time period ensures that, compared with pushing arbitrary content, the content pushed in the idle period is popular content that is relatively more likely to be requested by users, which reduces both the amount of unnecessary content among the pushed content and the bandwidth occupation in the peak period. With this scheme, popular content is pushed in the idle period, so that bandwidth occupation in the peak period is effectively reduced.
Optionally, the popularity includes: a popularity category divided according to the degree of popularity of the content;
the plurality of sample contents are obtained by the following steps:
for each content pushed to the edge IDC node, acquiring the number of times the content distribution network has fed the content back to users;
determining the popularity category of each content pushed to the edge IDC node by using the number of feedbacks, according to a preset correspondence between popularity categories and numbers of feedbacks;
dividing the contents pushed to the edge IDC nodes according to the determined popularity categories to obtain the plurality of sample contents.
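The labeling steps above can be sketched as a threshold lookup followed by grouping. The threshold values, the "unpopular" category name, and the content identifiers are assumptions chosen for illustration; the patent text only requires some preset correspondence between popularity categories and feedback counts.

```python
# Hypothetical preset correspondence, ordered from the highest threshold down:
# (minimum number of feedbacks, popularity category).
CATEGORY_THRESHOLDS = [
    (10000, "very popular"),
    (1000, "popular"),
    (0, "unpopular"),
]


def popularity_category(feedback_count):
    """Map a feedback count to the first category whose minimum it reaches."""
    for minimum, category in CATEGORY_THRESHOLDS:
        if feedback_count >= minimum:
            return category
    raise ValueError("feedback count must be non-negative")


def group_by_category(feedback_counts):
    """Divide pushed contents into sample groups by popularity category."""
    groups = {}
    for content_id, count in feedback_counts.items():
        groups.setdefault(popularity_category(count), []).append(content_id)
    return groups


# Made-up feedback counts for three contents already pushed to the edge.
groups = group_by_category({"a": 25000, "b": 1500, "c": 3})
```

Because the labels come from observed feedback counts of contents that were already pushed, each sample content has a known category for training.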
Optionally, the apparatus further includes a sample content determining module, configured to:
after the contents pushed to the edge IDC nodes are divided according to the determined popularity categories and before the plurality of sample contents are obtained, count, for each popularity category, the number of contents of that popularity category;
compare the counted numbers and take the largest number as a standard number;
for each popularity category, judge whether the number of contents of the popularity category is equal to the standard number;
if so, take the contents of the popularity category as the sample content; otherwise, supplement the contents of the popularity category until their number equals the standard number, and take the result as the sample content.
Optionally, the popularity determination module 302 is specifically configured to:
acquire text data of each candidate content; the text data is data capable of indicating semantic layer information of the candidate content;
for each candidate content, extract semantic features of the candidate content based on the text data of the candidate content; the semantic features are features capable of reflecting the meaning expressed by the text data.
As shown in fig. 4, a content pushing apparatus provided by another embodiment of the present invention is applied to a management server of a content distribution network, and the apparatus may include:
a candidate content determining module 401, configured to determine candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
a popularity determining module 402, comprising: a word segmentation sub-module 4021, configured to, for each candidate content, perform word segmentation on the text data of the candidate content to obtain a plurality of words contained in the text data of the candidate content; and a semantic feature obtaining module 4022, configured to, for each candidate content, look up, in a preset corpus, a plurality of semantic feature vectors corresponding to the plurality of words of the candidate content; add the plurality of semantic feature vectors of the candidate content and perform normalization calculation to obtain the semantic features of the candidate content; and input the features of the candidate content into a preset model to obtain the popularity of the candidate content;
a popular content determining module 403, configured to select, from the candidate contents, a candidate content whose popularity satisfies a preset popular condition as popular content, and push the popular content to each edge IDC node in a preset push time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
Corresponding to the above embodiment, an embodiment of the present invention further provides an electronic device, as shown in fig. 5, where the electronic device may include:
a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with one another through the communication bus 504;
the memory 503 is configured to store a computer program;
the processor 501 is configured to implement the steps of any content push method in the above embodiments when executing the computer program stored in the memory 503.
In the scheme provided by the embodiment of the present invention, the sample content is content that has already been pushed to the edge IDC node, so the popularity of the sample content can be reliably determined; a preset model obtained in advance from the features of a plurality of sample contents can therefore be used to obtain the popularity of content. Moreover, the sample content and the candidate content have common features, so the preset model can be used to obtain the popularity of the candidate content, and popular content can be selected from the candidate content by using the obtained popularity; and the candidate content is content that has not been pushed to the edge IDC node. Therefore, pushing the popular content to each edge IDC node in the preset push time period ensures that, compared with pushing arbitrary content, the content pushed in the idle period is popular content that is relatively more likely to be requested by users, which reduces both the amount of unnecessary content among the pushed content and the bandwidth occupation in the peak period. With this scheme, popular content is pushed in the idle period, so that bandwidth occupation in the peak period is effectively reduced.
The memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The computer-readable storage medium provided by an embodiment of the present invention is included in an electronic device, and a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of any content push method in the foregoing embodiments are implemented.
In the scheme provided by the embodiment of the present invention, the sample content is content that has already been pushed to the edge IDC node, so the popularity of the sample content can be reliably determined; a preset model obtained in advance from the features of a plurality of sample contents can therefore be used to obtain the popularity of content. Moreover, the sample content and the candidate content have common features, so the preset model can be used to obtain the popularity of the candidate content, and popular content can be selected from the candidate content by using the obtained popularity; and the candidate content is content that has not been pushed to the edge IDC node. Therefore, pushing the popular content to each edge IDC node in the preset push time period ensures that, compared with pushing arbitrary content, the content pushed in the idle period is popular content that is relatively more likely to be requested by users, which reduces both the amount of unnecessary content among the pushed content and the bandwidth occupation in the peak period. With this scheme, popular content is pushed in the idle period, so that bandwidth occupation in the peak period is effectively reduced.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer causes the computer to execute the content push method described in any of the above embodiments.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, DSL (Digital Subscriber Line)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD (Digital Versatile Disc)), a semiconductor medium (e.g., SSD (Solid State Disk)), or the like.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A content push method applied to a management server of a content distribution network, the method comprising:
determining candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
for each candidate content, extracting the characteristics of the candidate content, and inputting the characteristics of the candidate content into a preset model to obtain the popularity of the candidate content; the preset model is a model obtained by utilizing the characteristics of a plurality of sample contents in advance; the sample content is the content pushed to the edge IDC node and has common characteristics with the candidate content;
selecting candidate contents with the popularity meeting a preset popularity condition from the candidate contents as popular contents, and pushing the popular contents to each edge IDC node in a preset pushing time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
2. The method of claim 1, wherein the extracting, for each candidate content, the features of the candidate content comprises:
acquiring text data of each candidate content; the text data is data capable of indicating semantic layer information of the candidate content;
for each candidate content, extracting semantic features of the candidate content based on the text data of the candidate content; the semantic features are features capable of reflecting the meaning expressed by the text data.
3. The method of claim 2, wherein the extracting, for each candidate content, semantic features of the candidate content based on the text data of the candidate content comprises:
for each candidate content, performing word segmentation on the text data of the candidate content to obtain a plurality of words contained in the text data of the candidate content;
for each candidate content, looking up, in a preset corpus, a plurality of semantic feature vectors corresponding to the plurality of words of the candidate content;
for each candidate content, adding the plurality of semantic feature vectors of the candidate content and performing normalization calculation to obtain the semantic features of the candidate content.
4. The method of any of claims 1 to 3, wherein the popularity comprises: a popularity category divided according to the degree of popularity of the content;
the plurality of sample contents are obtained by the following steps:
for each content pushed to the edge IDC node, acquiring the number of times the content distribution network has fed the content back to users;
determining the popularity category of each content pushed to the edge IDC node by using the number of feedbacks, according to a preset correspondence between popularity categories and numbers of feedbacks;
dividing the contents pushed to the edge IDC nodes according to the determined popularity categories to obtain the plurality of sample contents.
5. The method according to claim 4, wherein after the dividing the contents pushed to the edge IDC nodes according to the determined popularity categories and before the obtaining the plurality of sample contents, the method further comprises:
counting, for each popularity category, the number of contents of the popularity category;
comparing the counted numbers and taking the largest number as a standard number;
for each popularity category, judging whether the number of contents of the popularity category is equal to the standard number;
if so, taking the contents of the popularity category as the sample content; otherwise, supplementing the contents of the popularity category until their number equals the standard number, and taking the result as the sample content.
6. A content pushing apparatus applied to a management server of a content distribution network, the apparatus comprising:
a candidate content determination module for determining candidate content from the content of the content distribution network; the candidate content is content which is not pushed to an edge IDC node;
the popularity determining module is used for extracting the characteristics of the candidate contents aiming at each candidate content and inputting the characteristics of the candidate contents into a preset model to obtain the popularity of the candidate contents; the preset model is a model obtained by utilizing the characteristics of a plurality of sample contents in advance; the sample content is the content pushed to the edge IDC node and has common characteristics with the candidate content;
the popular content determining module is used for selecting candidate content with the popularity meeting a preset popular condition from the candidate content to serve as popular content, and pushing the popular content to each edge IDC node in a preset pushing time period; and the preset pushing time period is a time period when the bandwidth occupation amount of the management server to the edge IDC nodes meets a preset bandwidth idle condition.
7. The apparatus of claim 6, wherein the popularity determination module is specifically configured to:
acquire text data of each candidate content; the text data is data capable of indicating semantic layer information of the candidate content;
for each candidate content, extract semantic features of the candidate content based on the text data of the candidate content; the semantic features are features capable of reflecting the meaning expressed by the text data.
8. The apparatus of claim 7, wherein the popularity determination module comprises:
a word segmentation sub-module, configured to, for each candidate content, perform word segmentation on the text data of the candidate content to obtain a plurality of words contained in the text data of the candidate content;
a semantic feature acquisition module, configured to, for each candidate content, look up, in a preset corpus, a plurality of semantic feature vectors corresponding to the plurality of words of the candidate content; and, for each candidate content, add the plurality of semantic feature vectors of the candidate content and perform normalization calculation to obtain the semantic features of the candidate content.
9. The apparatus of any of claims 6 to 8, wherein the popularity comprises: a popularity category divided according to the degree of popularity of the content;
the plurality of sample contents are obtained by the following steps:
for each content pushed to the edge IDC node, acquiring the number of times the content distribution network has fed the content back to users;
determining the popularity category of each content pushed to the edge IDC node by using the number of feedbacks, according to a preset correspondence between popularity categories and numbers of feedbacks;
dividing the contents pushed to the edge IDC nodes according to the determined popularity categories to obtain the plurality of sample contents.
10. The apparatus of claim 9, further comprising a sample content determination module configured to:
after the contents pushed to the edge IDC nodes are divided according to the determined popularity categories and before the plurality of sample contents are obtained, count, for each popularity category, the number of contents of the popularity category;
compare the counted numbers and take the largest number as a standard number;
for each popularity category, judge whether the number of contents of the popularity category is equal to the standard number;
if so, take the contents of the popularity category as the sample content; otherwise, supplement the contents of the popularity category until their number equals the standard number, and take the result as the sample content.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to execute the program stored in the memory to perform the method steps of any one of claims 1 to 5.
CN201910545055.3A 2019-06-21 2019-06-21 Content pushing method, device and equipment Active CN110290199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910545055.3A CN110290199B (en) 2019-06-21 2019-06-21 Content pushing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910545055.3A CN110290199B (en) 2019-06-21 2019-06-21 Content pushing method, device and equipment

Publications (2)

Publication Number Publication Date
CN110290199A CN110290199A (en) 2019-09-27
CN110290199B true CN110290199B (en) 2021-07-20

Family

ID=68004305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910545055.3A Active CN110290199B (en) 2019-06-21 2019-06-21 Content pushing method, device and equipment

Country Status (1)

Country Link
CN (1) CN110290199B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110691143B * | 2019-10-21 | 2022-03-04 | 北京奇艺世纪科技有限公司 | File pushing method and device, electronic equipment and medium
CN110764916B * | 2019-10-30 | 2022-06-03 | 北京声智科技有限公司 | Information processing method, device, storage medium and equipment
CN112099949B * | 2020-09-11 | 2023-09-05 | 北京奇艺世纪科技有限公司 | Task distribution control method and device, electronic equipment and storage medium
CN115250368A * | 2021-04-26 | 2022-10-28 | 北京字跳网络技术有限公司 | Video preheating method, device, equipment and storage medium
CN113497831B * | 2021-06-30 | 2022-10-25 | 西安交通大学 | Content placement method and system based on feedback popularity under mobile edge network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107483630A * | 2017-09-19 | 2017-12-15 | 北京工业大学 | A kind of construction method for combining content distribution mechanism with CP based on the ISP of edge cache
CN109885774A * | 2019-02-28 | 2019-06-14 | 北京达佳互联信息技术有限公司 | Recommended method, device and the equipment of individualized content

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
EP2043329A1 (en) * | 2007-09-28 | 2009-04-01 | Alcatel Lucent | A media-on-demand network, and a method of storing a media asset in a streaming node of the network
US10469609B2 (en) * | 2015-12-31 | 2019-11-05 | Time Warner Cable Enterprises Llc | Methods and apparatus for serving content to customer devices based on dynamic content popularity

Also Published As

Publication number Publication date
CN110290199A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110290199B (en) Content pushing method, device and equipment
CN106897428B (en) Text classification feature extraction method and text classification method and device
US10795939B2 (en) Query method and apparatus
US10515133B1 (en) Systems and methods for automatically suggesting metadata for media content
US10318543B1 (en) Obtaining and enhancing metadata for content items
US10152479B1 (en) Selecting representative media items based on match information
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
WO2021098648A1 (en) Text recommendation method, apparatus and device, and medium
CN110019794B (en) Text resource classification method and device, storage medium and electronic device
CN111274442B (en) Method for determining video tag, server and storage medium
US11423096B2 (en) Method and apparatus for outputting information
CN108540508B (en) Method, device and equipment for pushing information
CN111159546A (en) Event pushing method and device, computer readable storage medium and computer equipment
CN111314732A (en) Method for determining video label, server and storage medium
JP7395377B2 (en) Content search methods, devices, equipment, and storage media
CN114416998A (en) Text label identification method and device, electronic equipment and storage medium
CN110909266B (en) Deep paging method and device and server
US9323721B1 (en) Quotation identification
CN110569447B (en) Network resource recommendation method and device and storage medium
CN112836126A (en) Recommendation method and device based on knowledge graph, electronic equipment and storage medium
CN111753201A (en) Information pushing method, device, terminal and medium
CN114282119B (en) Scientific and technological information resource retrieval method and system based on heterogeneous information network
CN110750708A (en) Keyword recommendation method and device and electronic equipment
CN116610853A (en) Search recommendation method, search recommendation system, computer device, and storage medium
CN111666522A (en) Information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant