WO2022048289A1

WO2022048289A1 - Content screening method and device

Info

Publication number: WO2022048289A1
Application number: PCT/CN2021/103571
Authority: WO
Inventors: 吴俊豪; 何其真
Original assignee: 上海哔哩哔哩科技有限公司
Priority date: 2020-09-04
Filing date: 2021-06-30
Publication date: 2022-03-10
Also published as: CN112417202A; US20230418890A1; CN112417202B

Abstract

A content screening method and device. The method comprises: obtaining a content set to be subjected to screening, the content set comprising a plurality of contents to be screened, each of said contents having identifier information, a label of at least one category, and a score, wherein the plurality of said contents are sorted in the content set in advance by means of the scores (S20); calculating, according to the label of each category in each of said contents and a weight value corresponding to the label of each category, a distribution proportion value of the label of each category contained in the content set (S21); calculating a target distribution proportion value of the label of each category according to each distribution proportion value and a preset label distribution proportion adjustment function (S22); and according to the target distribution proportion value of the label of each category and the weight value corresponding to the label of each category in each of said contents, sequentially screening out a target content meeting a first preset condition from the content set (S23). The method can save computing resources.

Description

Content screening method and device

This application claims the priority of the Chinese patent application with the application number 202010920038.6 and the invention titled "Content Screening Method and Device" filed with the China Patent Office on September 4, 2020, the entire contents of which are incorporated into this application by reference.

Technical field

The present application relates to the field of computer technology, and in particular, to a content screening method and device.

Background art

In recommendation systems across different scenarios, content typically passes through user portrait query, retrieval and recall of candidate content, and multiple rounds of sorting and screening before it is finally recommended to the user. The intermediate sorting and screening is generally carried out using preset screening rules. However, the inventors found that when the prior art uses preset screening rules to screen recommended content, it generally has to perform nested traversal processing for each content to be screened, so the screening process consumes a large amount of computing resources and takes a long time to filter out the target recommended content.

Summary of the invention

In view of this, the present application provides a content screening method, device, computer equipment and computer-readable storage medium, so as to solve the problem in the prior art that screening recommended content consumes a large amount of computing resources and takes a lot of time.

The present application provides a content screening method, including:

Acquire a content set to be screened, the content set including a plurality of contents to be screened, each content to be screened having identification information, a label of at least one category, and a score, wherein the plurality of contents to be screened are pre-sorted by score in the content set;

Calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

Calculate the target distribution weight value of each category of labels according to each distribution weight value and a preset label distribution weight adjustment function;

Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.

Optionally, the calculation of the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category includes:

Obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is one of the label categories contained in the content set;

The sum of all the obtained weight values is used as the distribution weight value of the label of the current category.

Optionally, sequentially screening out the target content that meets the first preset condition from the content set, according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each of the contents to be screened, includes:

According to the ordering of each content to be screened in the content set, the screening processing operation is performed on each to-be-screened content in sequence, wherein the screening processing operation includes:

Obtain the first weight value corresponding to the label of each category in the current content to be screened;

Judge whether the first target distribution weight value corresponding to each category's label in the current content to be screened is greater than or equal to the corresponding first weight value;

If so, the current content to be screened is used as the target content, and the first target distribution weight value is updated with the difference between the first target distribution weight value and the first weight value.

Optionally, the content screening method further includes:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a second preset condition from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current content to be screened is not zero.

Optionally, the content screening method further includes:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a third preset condition from the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset tag.

Optionally, the content screening method further includes:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a fourth preset condition from the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is higher than the scores of the other contents to be screened.

Optionally, before the step of calculating the distribution weight value of the labels of each category contained in the content set according to the labels of each category in the respective contents to be screened and the weight values corresponding to the labels of each category, the method further includes:

Calculate the weight value corresponding to the label of each category in each content to be filtered.

The present application also provides a content screening device, comprising:

an acquisition module, configured to acquire a content set to be screened, the content set including a plurality of contents to be screened, each content to be screened having identification information, a label of at least one category, and a score, wherein the plurality of contents to be screened are pre-sorted by score in the content set;

a first calculation module, configured to calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

a second calculation module, configured to calculate the target distribution proportion value of each category of labels according to each distribution proportion value and a preset label distribution proportion adjustment function;

a screening module, configured to sequentially screen out the target content that meets the first preset condition from the content set according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each content to be screened.

The present application also provides a computer device comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:

The present application also provides a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the following steps are implemented:

In the embodiments of the present application, a content set to be screened is acquired, the content set including a plurality of contents to be screened, each having identification information, a label of at least one category, and a score, wherein the plurality of contents to be screened are sorted in advance by score in the content set; the distribution proportion value of the labels of each category contained in the content set is calculated according to the labels of each category in the contents to be screened and the weight values corresponding to those labels; the target distribution proportion value of the labels of each category is calculated according to each distribution proportion value and a preset label distribution proportion adjustment function; and target content that meets the first preset condition is sequentially screened out from the content set according to the target distribution proportion value of the labels of each category and the weight values corresponding to the labels of each category in the contents to be screened. When the contents in the content set are screened in this way, each content to be screened only needs to be traversed once to determine whether it is target content, without nested traversal. Therefore, the present application can save the computing resources consumed when screening the contents to be screened and can reduce the time the screening takes.

Description of drawings

FIG. 1 is a schematic diagram of screening content to be screened in an embodiment of the present application;

FIG. 2 is a flowchart of an embodiment of the content screening method described in this application;

FIG. 3 is a detailed flowchart of the step of calculating the distribution weight value of the labels of each category contained in the content set according to the labels of each category in the contents to be screened and the weight values corresponding to the labels of each category;

FIG. 4 shows how the Quota value of the labels of each category changes after processing by the label distribution proportion adjustment function in the present application;

FIG. 5 is a program module diagram of an embodiment of the content screening apparatus described in this application;

FIG. 6 is a schematic diagram of a hardware structure of a computer device for executing a content screening method provided by an embodiment of the present application.

Detailed description

The advantages of the present application are further described below with reference to the accompanying drawings and specific embodiments.

Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the illustrative examples below are not intended to represent all implementations consistent with this disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.

The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish the same type of information from each other. For example, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure. Depending on the context, the word "if" as used herein can be interpreted as "at the time of" or "when" or "in response to determining."

In the description of the present application, it should be understood that the numerical labels before the steps do not identify the order of execution of the steps, but are only used to facilitate the description of the present application and to distinguish each step, and therefore should not be construed as a limitation on the present application.

FIG. 1 is a schematic diagram of screening content to be screened according to an embodiment of the present application. In an exemplary embodiment, a set of 5,000 manuscripts is recalled from the content library to be recommended (the manuscript library) after operations such as querying, matching, and sorting are performed according to the user portrait. The set of 5,000 manuscripts is reduced to 2,000 manuscripts by a first round of screening and sorting under a preset first screening rule, and then to 1,000 manuscripts by a second round under a preset second screening rule. Finally, after several further rounds of screening and sorting, the final recommended content is obtained and recommended to the user. Each round of screening acts like a funnel that selects and filters the manuscript set, and the screening rule is equivalent to setting the size of the funnel filter.

Refer to FIG. 2, which is a schematic flowchart of a content screening method according to an embodiment of the present application. The content screening method of the present application can be applied to the content screening process of each funnel in FIG. 1 above. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The following is an exemplary description with a computer device as the execution subject. As can be seen from the figure, the content screening method provided in this embodiment includes:

Step S20: Obtain a content set to be screened, the content set including a plurality of contents to be screened, each content to be screened having identification information, a label of at least one category, and a score, wherein the plurality of contents to be screened are pre-sorted by score in the content set.

Specifically, the content set may be content recalled from the content library according to the user portrait and the characteristics of the content. Recall refers to the process, in the online service of a recommendation system, of retrieving from the content library a large amount of content with a certain degree of relevance; this process uses fewer user and content features and responds faster. The content set may also be the content to be screened that remains after the recalled content has been screened one or more times.

In different recommendation scenarios, the contents to be screened contained in the content set are different. For example, in an audio and video recommendation scenario, the content set includes a plurality of audio and video files to be screened; in a news recommendation scenario, the content set includes a plurality of news articles to be screened; in a commodity recommendation scenario, the content set includes a plurality of commodities to be screened.

It should be noted that, to simplify the description, this embodiment and the following embodiments take a video manuscript to be screened as the example of content to be screened, where a video manuscript refers to a video file uploaded to the platform by a user.

In this embodiment, each acquired video manuscript to be screened has identification information, a label of at least one category, and a score.

The identification information is ID (identity identification number) information used to uniquely distinguish different video manuscripts, and different video manuscripts have different IDs.

Each video manuscript to be screened has tags of one or more categories. Different video manuscripts to be screened may have the same or different tag categories, and may have the same or different numbers of tags. For example, video manuscript 1 has tags tag_0 and tag_1, video manuscript 2 has tags tag_2 and tag_3, video manuscript 3 has tags tag_0 and tag_2, and so on.

The score is obtained through a scoring model and indicates the correlation between the video manuscript to be screened and the user to whom content is to be recommended. Generally speaking, the higher the score, the higher the correlation between the video manuscript and that user; the lower the score, the lower the correlation.

In this embodiment, to facilitate the subsequent screening of the plurality of video manuscripts to be screened in the content set, the video manuscripts may be sorted in advance according to score, for example in descending order of score. In this way, when the content set is acquired, the plurality of video manuscripts to be screened can be obtained in descending order of score.

Step S21: Calculate the distribution weight value of the tags of each category included in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category.

Specifically, each video manuscript to be screened has tags of one or more categories, and the tags of all categories in a video manuscript share a total weight value of 1; that is, the weights of all tags in each video manuscript to be screened add up to 1. The value 1 is only an example: the weights of the tags of all categories in a video manuscript to be screened may add up to another value, provided that their sum equals the total weight value of that video manuscript.

The distribution proportion value (hereinafter referred to as the "Quota value", and also called the distribution weight value in this description) refers to the proportion of the label distribution of each category after the plurality of video manuscripts to be screened in the content set are decomposed according to label category. The proportion of label distribution can be the sum of all weight values assigned to the labels of the current category.

It should be noted that the method of calculating the distribution weight value of each category of tags in this embodiment can be regarded as a process of performing component decomposition on large tags in a plurality of video manuscripts to be screened to obtain a component decomposition result.

Exemplarily, referring to FIG. 3 , the calculation of the distribution weight value of the tags of each category included in the content set according to the tags of each category and the weight values corresponding to the tags of each category in the contents to be screened includes:

Step S30: Obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is one of the label categories contained in the content set.

Step S31: Use the sum of all the obtained weight values as the distribution weight value of the label of the current category.

Specifically, to calculate the distribution weight value of the labels of a given category, the weight value of the label of that category in each content to be screened is obtained first, and the sum of all the obtained weight values is then used as the distribution weight value of the label of that category.

For example, suppose the label of the current category is label a, the video manuscripts with label a in the content set are video manuscript A, video manuscript B, and video manuscript C, and the weight values of label a in video manuscript A, video manuscript B, and video manuscript C are 0.4, 0.6, and 0.8 respectively. Then the distribution weight value of label a is 0.4 + 0.6 + 0.8 = 1.8. The distribution weight values of the labels of other categories can be calculated in the same way.

In this embodiment, by taking the sum of all the obtained weight values as the distribution weight value of the label of the current category, the distribution weight value of the labels of each category can be obtained conveniently and quickly.
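The summation in steps S30 and S31 can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout (`id`, `score`, `tags`) and the function name `distribution_values` are assumptions made for the example, and only the weights of label a (0.4, 0.6, 0.8) come from the description; the other tags and weights are hypothetical.

```python
from collections import defaultdict


def distribution_values(contents):
    """Sum, per label category, the weights that label carries across all contents."""
    quota = defaultdict(float)
    for item in contents:
        for tag, weight in item["tags"].items():
            quota[tag] += weight
    return dict(quota)


# Hypothetical content set, already sorted by score in descending order.
contents = [
    {"id": "A", "score": 0.9, "tags": {"a": 0.4, "b": 0.6}},
    {"id": "B", "score": 0.8, "tags": {"a": 0.6, "c": 0.4}},
    {"id": "C", "score": 0.7, "tags": {"a": 0.8, "b": 0.2}},
]

quota = distribution_values(contents)
# quota["a"] is approximately 1.8, matching 0.4 + 0.6 + 0.8 in the example.
```

The whole content set is visited once, and every label category is accumulated in the same pass.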

It can be understood that when the tags of at least one category in the content to be screened do not carry a corresponding weight value, the weight value corresponding to the tags of each category in each content to be screened must be calculated first, so that the distribution weight value of the tags of each category included in the content set can then be calculated from those tags and their weight values.

In one embodiment, the weight value corresponding to the label of each category in the video manuscript to be screened can be calculated according to a preset weight distribution rule. For example, if the preset weight distribution rule is that the labels of all categories in a video manuscript share equally the total weight 1 of that video manuscript, then for video manuscript A with label a and label b, the weight value of label a in video manuscript A is 1/2 = 0.5 and the weight value of label b in video manuscript A is 1/2 = 0.5. Similarly, for video manuscript B with label a and label c, the weight value of label a in video manuscript B is 1/2 = 0.5 and the weight value of label c in video manuscript B is 1/2 = 0.5.

In another embodiment, the weight value corresponding to the label of each category in the video manuscript to be screened can also be calculated according to the content of the video manuscript. For example, video manuscript A has two tags, "funny" and "music". If analysis of video manuscript A finds that funny elements account for 80% of it while music elements account for only 20%, then the weight value corresponding to the "funny" tag is 0.8 and the weight value corresponding to the "music" tag is 0.2.
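As a minimal sketch of such a preset weight distribution rule, the following Python function splits a manuscript's total weight evenly across its labels. The function name `equal_weights` and the `total` parameter are assumptions for illustration; the even split is inferred from the 1/2 = 0.5 example.

```python
def equal_weights(tags, total=1.0):
    """Split a manuscript's total weight evenly across its label categories."""
    if not tags:
        return {}
    share = total / len(tags)
    return {tag: share for tag in tags}


weights_a = equal_weights(["a", "b"])  # video manuscript A
weights_b = equal_weights(["a", "c"])  # video manuscript B
```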

Step S22: Calculate the target distribution weight values of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function.

Specifically, the label distribution weight adjustment function can be set with different functions according to different business scenarios. When the function is specifically set, at least one of the following objectives shall be satisfied:

Objective 1. The sum of the target Quota values of all categories of labels obtained after processing by the label distribution proportion adjustment function is N, where N is the number of target contents screened out from the content set.

Objective 2. The label categories that appear after processing by the label distribution weight adjustment function should be as many as possible.

Objective 3. The quota ratios of different categories of tags in the target quota value obtained after processing by the tag distribution proportion adjustment function are as close as possible to the original content set.

Objective 4. Apply scenario-specific adjustments that reflect different tendencies. For example, the Quota values of all tags can be smoothed so that they move toward the average value, or tags with excessively high Quota values can be peak-clipped; the Quota value removed in such ways can enter a free Quota pool, and so on.

In a specific scenario, the label distribution proportion adjustment function halves the Quota values of all labels, and the resulting changes in the Quota values of the labels of each category are shown in FIG. 4.
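Two possible adjustment functions are sketched below in Python. The first reads "reduced by 2 times" as halving every Quota value, which is an interpretation of the description. The second, `scale_to_total`, is a hypothetical function that is not taken from the description: it rescales the Quota values proportionally so that they sum to N (Objective 1) while keeping the ratios between categories (Objective 3). The input values are illustrative.

```python
def halve(quota):
    """Adjustment function that halves every Quota value (assumed reading)."""
    return {tag: value / 2 for tag, value in quota.items()}


def scale_to_total(quota, n):
    """Hypothetical adjustment: rescale proportionally so the values sum to n."""
    total = sum(quota.values())
    if total == 0:
        return {tag: 0.0 for tag in quota}
    return {tag: value * n / total for tag, value in quota.items()}


quota = {"a": 1.8, "b": 0.8, "c": 0.4}   # illustrative distribution values
halved = halve(quota)                    # each category keeps half its Quota
scaled = scale_to_total(quota, 2)        # target Quota values summing to N = 2
```

Which function is appropriate depends on the business scenario; both leave the relative proportions of the label categories unchanged.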

Step S23: According to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each content to be screened, sequentially screen out the target content that meets the first preset condition from the content set.

Specifically, the first preset condition is that the tags of all categories of the video manuscript to be screened have sufficient Quota values. In this embodiment, when target content that meets the first preset condition is screened out from the content set according to the target distribution weight value (target Quota value) of each category of tags and the weight value corresponding to each category of tags in each video manuscript to be screened, each video manuscript can be selected and judged in turn according to its position in the sorted content set. If the tags of all categories of a video manuscript to be screened have enough Quota value, that video manuscript can be selected from the content set as target content. After one video manuscript has been judged, the next video manuscript is traversed and judged in the same way. The screening process ends when all video manuscripts have been traversed and judged, or stops earlier once a preset number of target contents have been screened out, where the preset number is the number of target contents, set in advance, that need to be screened out from the content library.

It should be noted that, the method of filtering out the target content in this embodiment can be regarded as a process of decomposing the tags of the above categories and then performing tag recombination.

In an exemplary embodiment, sequentially screening out the target content that meets the first preset condition from the content set, according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each content to be screened, includes:

According to the order of each content to be screened in the content set, the screening processing operation is performed on each content to be screened in sequence.

Specifically, the screening processing operation is performed on the video manuscripts to be screened in the order in which they appear in the content set. For example, if the content set contains, in order, video manuscript A, video manuscript B, video manuscript C, video manuscript D, and video manuscript E, video manuscript A is screened first; after the screening of video manuscript A is complete, video manuscript B is screened, and then video manuscript C, video manuscript D, and video manuscript E are screened in sequence.

In this embodiment, the screening processing operation includes: obtaining a first weight value corresponding to the label of each category in the current content to be screened; judging whether the first target distribution weight value corresponding to each category's label in the current content to be screened is greater than or equal to the corresponding first weight value; and if so, taking the current content to be screened as target content and updating the first target distribution weight value with the difference between the first target distribution weight value and the first weight value.

Specifically, when the current screening operation screens video manuscript A, the first weight values corresponding to label a and label b contained in video manuscript A are obtained first; assume they are 0.5 and 0.5 respectively. It is then determined whether the first target Quota value corresponding to label a is greater than or equal to 0.5 and whether the first target Quota value corresponding to label b is greater than or equal to 0.5. If the first target Quota values corresponding to label a and label b are 4.0 and 3.5 respectively, both checks pass, so video manuscript A can be screened out from the content set as target content. At the same time, each first target distribution weight value is updated with the difference between it and the corresponding first weight value: the first target Quota value of label a becomes 4.0 - 0.5 = 3.5, and the first target Quota value of label b becomes 3.5 - 0.5 = 3.0.

After completing the screening processing operation of the video manuscript A, continue to perform the screening processing on the video manuscript B, the video manuscript C, the video manuscript D, and the video manuscript E in sequence according to the above method.
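The screening processing operation described above can be sketched as a single pass over the sorted content set. The Python below is a non-authoritative sketch: the function name `screen`, the `limit` parameter (standing in for the preset number), and the dictionary layout are assumptions. The values for video manuscript A (weights 0.5 and 0.5, target Quota values 4.0 and 3.5) follow the example; video manuscript B and label c are hypothetical additions that show a rejection.

```python
def screen(contents, target_quota, limit):
    """Single pass: keep an item only if every one of its labels has enough Quota."""
    quota = dict(target_quota)  # work on a copy so the caller's values are untouched
    selected = []
    for item in contents:       # contents are assumed pre-sorted by score, descending
        if len(selected) >= limit:
            break
        tags = item["tags"]
        if all(quota.get(tag, 0.0) >= weight for tag, weight in tags.items()):
            for tag, weight in tags.items():
                quota[tag] -= weight  # deduct the weight from the remaining Quota
            selected.append(item["id"])
    return selected, quota


contents = [
    {"id": "A", "tags": {"a": 0.5, "b": 0.5}},
    {"id": "B", "tags": {"a": 0.5, "c": 0.5}},  # hypothetical second manuscript
]
target = {"a": 4.0, "b": 3.5, "c": 0.0}

selected, remaining = screen(contents, target, limit=2)
# selected == ["A"]; remaining == {"a": 3.5, "b": 3.0, "c": 0.0}
```

Each manuscript is examined exactly once, and the only work per manuscript is a lookup for each of its own labels, so no nested traversal over the content set is needed.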

In the embodiments of the present application, a content set to be screened is acquired, the content set including a plurality of contents to be screened, each having identification information, a label of at least one category, and a score, wherein the plurality of contents to be screened are sorted in advance by score in the content set; the distribution proportion value of the labels of each category contained in the content set is calculated according to the labels of each category in the contents to be screened and the weight values corresponding to those labels; the target distribution proportion value of the labels of each category is calculated according to each distribution proportion value and the preset label distribution proportion adjustment function; and target content that meets the first preset condition is sequentially screened out from the content set according to the target distribution proportion value of the labels of each category and the weight values corresponding to the labels of each category in the contents to be screened. When the contents in the content set are screened in this way, each content to be screened only needs to be traversed once to determine whether it is target content, without nested traversal. Therefore, the present application can save the computing resources consumed when screening the contents to be screened and can reduce the time the screening takes.

In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content that meets a second preset condition may further be screened from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution proportion value corresponding to the tag of at least one category in the current content to be screened is not zero.

Specifically, the preset number is the number of target contents that need to be screened out from the content set. For example, if there are 10 video manuscripts in the content set, only 4 target contents have been screened out, and this is fewer than the preset number, then video manuscripts in which the target Quota value corresponding to the tag of at least one category is not zero are selected as target content from the remaining 6 video manuscripts in the content set.

Exemplarily, it is assumed that the remaining 6 video manuscripts, sorted in descending order of score, are video manuscript 1, video manuscript 2, video manuscript 3, video manuscript 4, video manuscript 5, and video manuscript 6. The target Quota value corresponding to label a in video manuscript 1 is 0.2, the target Quota value corresponding to label b in video manuscript 2 is 0.3, and the target Quota value corresponding to label b in video manuscript 3 is 0.4. If the target Quota values corresponding to the labels of all categories in video manuscript 4, video manuscript 5, and video manuscript 6 are 0, then when the screening operation is performed, video manuscript 1, video manuscript 2, and video manuscript 3 can all be used as target content. Of course, if only one more video manuscript needs to be screened out as target content, then only video manuscript 1, which has the highest score, is used as target content; if only two more video manuscripts need to be screened out, then video manuscript 1 and video manuscript 2, which have the highest scores, are used as target content.
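The second preset condition can be illustrated with the following minimal sketch. The data below are hypothetical and only mirror the situation described above (three manuscripts with some target Quota left, three with none); the function name `fill_by_nonzero_quota` is an assumption, and the sketch assumes that this supplementary step does not deduct any quota, which the text does not specify.

```python
def fill_by_nonzero_quota(remaining, target_quota, needed):
    """Pick up to `needed` items that still have a tag with nonzero target Quota.

    `remaining` is assumed to be sorted by score, highest first.
    """
    picked = []
    for item_id, tags in remaining:
        if len(picked) == needed:
            break
        # Second preset condition: at least one tag has a nonzero target Quota.
        if any(target_quota.get(tag, 0.0) != 0 for tag in tags):
            picked.append(item_id)
    return picked


# Hypothetical data: tags "a" and "b" have quota left, "c" and "d" have none.
quota = {"a": 0.2, "b": 0.3, "c": 0.0, "d": 0.0}
remaining = [("m1", ["a"]), ("m2", ["b"]), ("m3", ["b"]),
             ("m4", ["c"]), ("m5", ["c"]), ("m6", ["d"])]
print(fill_by_nonzero_quota(remaining, quota, needed=2))
```

Because the remaining manuscripts are visited in score order, asking for one or two more items returns the highest-scored qualifying manuscripts first.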

In this embodiment, when the preset number of target contents is not obtained after screening, target content that meets the second preset condition is selected from the remaining contents to be screened in the content set, thereby improving the label coverage of content screening (the ratio of the number of tags appearing in the filtered result set to the total number of tags in the original content set).

In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content that meets a third preset condition may also be screened from the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset mark.

Specifically, the preset mark is a mark used to mark a video manuscript to be screened as a low-quality video manuscript. When the correlation between the tags of a video manuscript and the tags of other high-quality video manuscripts is poor, this type of video manuscript is marked as a low-quality video manuscript.

In this embodiment, by selecting low-quality video manuscripts as the target content, the diversity of the content to be screened can be improved.
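As a brief illustration of the third preset condition, the sketch below selects remaining items that carry the preset mark. The `Item` class and its `low_quality` field are hypothetical stand-ins for the preset mark; how the mark is assigned is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    score: float
    low_quality: bool = False  # hypothetical field standing in for the preset mark


def fill_by_preset_mark(remaining, needed):
    """Pick up to `needed` remaining items that carry the preset mark."""
    marked = [item.item_id for item in remaining if item.low_quality]
    return marked[:needed]


remaining = [Item("m1", 0.9), Item("m2", 0.8, low_quality=True),
             Item("m3", 0.7, low_quality=True)]
print(fill_by_preset_mark(remaining, needed=1))
```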

In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content that meets a fourth preset condition may also be screened from the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other contents to be screened.

Specifically, when the number of target contents obtained by screening is less than the preset number, target contents may be screened out from the remaining video manuscripts to be screened in descending order of score. Taking the above video manuscripts 1 to 6 as the remaining video manuscripts to be screened: when 1 video manuscript needs to be screened out as target content, video manuscript 1 can be screened out; when 2 video manuscripts need to be screened out as target content, video manuscript 1 and video manuscript 2 can be screened out.
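The fourth preset condition amounts to taking the highest-scored remaining items. A minimal sketch using the standard library's `heapq.nlargest` follows; the scores assigned to video manuscripts 1 to 6 are hypothetical, and `fill_by_score` is an illustrative name.

```python
import heapq


def fill_by_score(remaining, needed):
    """Return the ids of the `needed` highest-scored remaining items."""
    top = heapq.nlargest(needed, remaining, key=lambda pair: pair[1])
    return [item_id for item_id, _ in top]


# Hypothetical scores for video manuscripts 1 to 6.
remaining = [("m1", 0.9), ("m2", 0.8), ("m3", 0.7),
             ("m4", 0.6), ("m5", 0.5), ("m6", 0.4)]
print(fill_by_score(remaining, needed=2))
```

If the remaining items are already sorted by score, a plain slice such as `remaining[:needed]` would give the same result.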

In this embodiment, by selecting video manuscripts with higher scores as the target content, the scoring priority ratio (the proportion of top-ranked, higher-scored manuscripts that enter the screening results) can be improved.

In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content that meets a fifth preset condition may also be screened from the remaining contents to be screened in the content set. The fifth preset condition is that the target distribution proportion values corresponding to the labels of all categories of the current content A to be screened are zero, but the number of times the label of each category in the current content A (assumed to include labels a and b) appears in the screened target contents does not exceed a preset threshold. For example, with a preset threshold of 5, if the screened target contents include label a 4 times and label b 3 times, the current content A can be used as target content; if they include label a 5 times and label b 6 times, the current content A cannot be used as target content.
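One possible reading of the fifth preset condition is sketched below. It assumes that every label of the candidate must stay within the threshold and that "does not exceed" means less than or equal to; both are interpretations, and the function name `meets_fifth_condition` is hypothetical.

```python
from collections import Counter


def meets_fifth_condition(tags, target_quota, selected_tag_counts, threshold):
    """Check the fifth preset condition for one item under the stated assumptions."""
    # All target Quota values of the item's tags are already zero ...
    quota_exhausted = all(target_quota.get(tag, 0.0) == 0 for tag in tags)
    # ... but none of its tags appears more than `threshold` times so far.
    within_threshold = all(selected_tag_counts[tag] <= threshold for tag in tags)
    return quota_exhausted and within_threshold


quota = {"a": 0.0, "b": 0.0}
print(meets_fifth_condition(["a", "b"], quota, Counter(a=4, b=3), threshold=5))
print(meets_fifth_condition(["a", "b"], quota, Counter(a=5, b=6), threshold=5))
```

The two calls correspond to the two cases in the example: the first candidate qualifies, the second does not because label b already appears 6 times.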

Exemplarily, in order to facilitate understanding of the technical solutions of the present application, the technical solutions of the present application are described below with reference to a specific application scenario.

Suppose that 5 video manuscripts need to be selected from 10 video manuscripts as the target content, and the details of the 10 video manuscripts after ranking from large to small according to the score (Score) are shown in the following table:

Id (identification information)	Tags	Score
id_0	tag_0, tag_1	0.95
id_1	tag_2, tag_3	0.9
id_2	tag_4, tag_5	0.85
id_3	tag_0, tag_2	0.8
id_4	tag_1, tag_4	0.75
id_5	tag_3, tag_5	0.7
id_6	tag_0, tag_6	0.65
id_7	tag_0, tag_7	0.6
id_8	tag_0, tag_6	0.55
id_9	tag_6, tag_7	0.5

If the weight values of the tags in each of the 10 video manuscripts sum to 1, that is, each tag in each video manuscript has a weight value of 0.5, then the Quota value of the tag of each category can be calculated from the tags of each category in the contents to be screened and their corresponding weight values, as shown in the following table:

Tag	Quota
tag_0	2.5
tag_1	1
tag_2	1
tag_3	1
tag_4	1
tag_5	1
tag_6	1.5
tag_7	1
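The Quota values in the table can be reproduced with a short sketch. It assumes, as in the example, that the weight of 1 for each manuscript is split equally among its tags; the names `ITEMS` and `compute_quota` are illustrative.

```python
from collections import defaultdict

# (id, tags, score) rows of the 10 video manuscripts, sorted by score.
ITEMS = [
    ("id_0", ["tag_0", "tag_1"], 0.95), ("id_1", ["tag_2", "tag_3"], 0.9),
    ("id_2", ["tag_4", "tag_5"], 0.85), ("id_3", ["tag_0", "tag_2"], 0.8),
    ("id_4", ["tag_1", "tag_4"], 0.75), ("id_5", ["tag_3", "tag_5"], 0.7),
    ("id_6", ["tag_0", "tag_6"], 0.65), ("id_7", ["tag_0", "tag_7"], 0.6),
    ("id_8", ["tag_0", "tag_6"], 0.55), ("id_9", ["tag_6", "tag_7"], 0.5),
]


def compute_quota(items):
    """Sum, per tag, the weight that each item assigns to that tag."""
    quota = defaultdict(float)
    for _, tags, _ in items:
        weight = 1.0 / len(tags)  # assumption: equal split of a total weight of 1
        for tag in tags:
            quota[tag] += weight
    return dict(quota)


print(compute_quota(ITEMS))
```

For example, tag_0 appears in five manuscripts with weight 0.5 each, which gives its Quota value of 2.5.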

After the Quota values of the labels of each category are obtained, assuming that the label distribution weight adjustment function scales every Quota value down proportionally by a factor of 2, the target Quota values shown in the following table are obtained:

Tag	Target Quota
tag_0	1.25
tag_1	0.5
tag_2	0.5
tag_3	0.5
tag_4	0.5
tag_5	0.5
tag_6	0.75
tag_7	0.5
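The adjustment step can be sketched as applying a function to every Quota value. Here the adjustment function is passed in as a parameter and defaults to halving, which matches this example only; the application leaves the concrete preset function open, and `adjust_quota` is an illustrative name.

```python
def adjust_quota(quota, adjust=lambda q: q / 2):
    """Apply the adjustment function to each Quota value to get target Quota."""
    return {tag: adjust(q) for tag, q in quota.items()}


quota = {"tag_0": 2.5, "tag_1": 1.0, "tag_2": 1.0, "tag_3": 1.0,
         "tag_4": 1.0, "tag_5": 1.0, "tag_6": 1.5, "tag_7": 1.0}
target_quota = adjust_quota(quota)
print(target_quota)
```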

After each target Quota value is obtained, the screening operation can be performed on the 10 video manuscripts in sequence according to their order. First, for the video manuscript of id_0, the video manuscript includes tag_0 with a weight value of 0.5 and tag_1 with a weight value of 0.5, and the current target Quota values corresponding to tag_0 and tag_1 (1.25 and 0.5) are both greater than or equal to 0.5. Therefore, the video manuscript whose identification information is id_0 can be screened out as target content. The difference 1.25-0.5=0.75 between the target Quota value 1.25 corresponding to tag_0 and the weight value 0.5 corresponding to tag_0 in id_0 is used as the updated target Quota value of tag_0, and the difference 0.5-0.5=0 between the target Quota value 0.5 corresponding to tag_1 and the weight value 0.5 corresponding to tag_1 is used as the updated target Quota value of tag_1. After the update, the target Quota values shown in the following table are obtained:

Tag	Target Quota
tag_0	0.75
tag_1	0
tag_2	0.5
tag_3	0.5
tag_4	0.5
tag_5	0.5
tag_6	0.75
tag_7	0.5

In the same way, the video manuscripts of id_1 and id_2 can be screened out as the target content. After screening the video manuscripts of id_1 and id_2, the target Quota values shown in the following table can be obtained:

Tag	Target Quota
tag_0	0.75
tag_1	0
tag_2	0
tag_3	0
tag_4	0
tag_5	0
tag_6	0.75
tag_7	0.5

Next, the video manuscript of id_3 is screened. The video manuscript includes tag_0 with a weight value of 0.5 and tag_2 with a weight value of 0.5, and the current target Quota value corresponding to tag_2 is 0, which is less than 0.5, so the video manuscript whose identification information is id_3 cannot be screened out as target content. Similarly, the video manuscripts of id_4 and id_5 cannot be screened out as target content because their target Quota values are insufficient.

After that, the video manuscript of id_6 is screened. The video manuscript includes tag_0 with a weight value of 0.5 and tag_6 with a weight value of 0.5, and the current target Quota values corresponding to tag_0 and tag_6 are both greater than 0.5, so the video manuscript whose identification information is id_6 is screened out as target content. The difference 0.75-0.5=0.25 between the target Quota value 0.75 corresponding to tag_0 and the weight value 0.5 corresponding to tag_0 in id_6 is used as the updated target Quota value of tag_0. In the same way, the difference 0.75-0.5=0.25 between the target Quota value 0.75 corresponding to tag_6 and the weight value 0.5 corresponding to tag_6 in id_6 is used as the updated target Quota value of tag_6. After the update, the target Quota values shown in the following table are obtained:

Tag	Target Quota
tag_0	0.25
tag_1	0
tag_2	0
tag_3	0
tag_4	0
tag_5	0
tag_6	0.25
tag_7	0.5

Finally, the video manuscripts of id_7, id_8 and id_9 are screened in turn. Since the video manuscripts of id_7, id_8 and id_9 do not have enough target quota values, they cannot be screened out as target content.
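The complete first pass over the 10 video manuscripts can be sketched as a single loop. The sketch uses the tags and target Quota values from this scenario and a fixed weight value of 0.5 per tag; the function name `screen` and its `limit` parameter are illustrative assumptions.

```python
# (id, tags) of the 10 video manuscripts, already sorted by score.
ITEMS = [
    ("id_0", ["tag_0", "tag_1"]), ("id_1", ["tag_2", "tag_3"]),
    ("id_2", ["tag_4", "tag_5"]), ("id_3", ["tag_0", "tag_2"]),
    ("id_4", ["tag_1", "tag_4"]), ("id_5", ["tag_3", "tag_5"]),
    ("id_6", ["tag_0", "tag_6"]), ("id_7", ["tag_0", "tag_7"]),
    ("id_8", ["tag_0", "tag_6"]), ("id_9", ["tag_6", "tag_7"]),
]
TARGET_QUOTA = {"tag_0": 1.25, "tag_1": 0.5, "tag_2": 0.5, "tag_3": 0.5,
                "tag_4": 0.5, "tag_5": 0.5, "tag_6": 0.75, "tag_7": 0.5}


def screen(items, target_quota, limit, weight=0.5):
    """Single traversal: select an item when all of its tags have enough quota."""
    quota = dict(target_quota)  # work on a copy so the input is not modified
    selected = []
    for item_id, tags in items:
        if len(selected) == limit:
            break
        if all(quota[tag] >= weight for tag in tags):
            for tag in tags:
                quota[tag] -= weight  # update with the difference
            selected.append(item_id)
    return selected, quota


selected, remaining_quota = screen(ITEMS, TARGET_QUOTA, limit=5)
print(selected)
print(remaining_quota)
```

Each manuscript is visited once, so the cost of this pass grows linearly with the number of manuscripts and tags rather than requiring nested traversal.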

After the screening of all video manuscripts is completed, only the video manuscripts {id_0, id_1, id_2, id_6} have been selected as target content, whereas the screening target is 5 video manuscripts. Therefore, in one embodiment, video manuscripts in which the target Quota value corresponding to the tag of at least one category is not 0 can be further screened out as target content from the remaining video manuscripts id_3, id_4, id_5, id_7, id_8 and id_9. In this embodiment, the video manuscripts of id_3 and id_7 each have at least one category of tag whose target Quota value is not 0. However, the target Quota values corresponding to both categories of tags in the video manuscript of id_7 are not 0, whereas the target Quota value corresponding to only one category of tag in the video manuscript of id_3 is not 0. Therefore, in order to obtain a better label distribution rate, the video manuscript of id_7 can be screened out as target content.

In another embodiment, the video manuscript whose score is greater than the scores of the other contents to be screened can be further screened out as target content from the remaining video manuscripts id_3, id_4, id_5, id_7, id_8 and id_9. In this embodiment, since the video manuscript of id_3 has the highest score, the video manuscript of id_3 can be screened out as target content.
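The supplementary choice between the remaining manuscripts can be sketched as follows. This is one possible reading: candidates are ranked by how many of their tags still have a nonzero target Quota value, and ties are resolved in favor of the higher-scored manuscript because the list is sorted by score. The names `nonzero_tag_count` and `pick_best_covered` are hypothetical.

```python
# Remaining video manuscripts (sorted by score) and the target Quota values
# left after the first pass in this scenario.
REMAINING = [
    ("id_3", ["tag_0", "tag_2"]), ("id_4", ["tag_1", "tag_4"]),
    ("id_5", ["tag_3", "tag_5"]), ("id_7", ["tag_0", "tag_7"]),
    ("id_8", ["tag_0", "tag_6"]), ("id_9", ["tag_6", "tag_7"]),
]
QUOTA = {"tag_0": 0.25, "tag_1": 0.0, "tag_2": 0.0, "tag_3": 0.0,
         "tag_4": 0.0, "tag_5": 0.0, "tag_6": 0.25, "tag_7": 0.5}


def nonzero_tag_count(tags, quota):
    """Count the tags whose target Quota value is not zero."""
    return sum(1 for tag in tags if quota.get(tag, 0.0) != 0)


def pick_best_covered(remaining, quota):
    """Return the id with the most nonzero-quota tags, or None if none qualify."""
    candidates = [(item_id, tags) for item_id, tags in remaining
                  if nonzero_tag_count(tags, quota) > 0]
    if not candidates:
        return None
    # max() returns the first of equal candidates, so ties go to the
    # earlier (higher-scored) manuscript.
    best_id, _ = max(candidates, key=lambda pair: nonzero_tag_count(pair[1], quota))
    return best_id


print(pick_best_covered(REMAINING, QUOTA))
```

With these values, id_3 has one tag with nonzero quota while id_7 has two, so id_7 is returned.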

Referring to FIG. 5 , it is a program module diagram of an embodiment of the content screening apparatus 50 of the present application.

In this embodiment, the content screening apparatus 50 includes a series of computer-readable instructions stored in the memory, and when the computer-readable instructions are executed by the processor, the content screening function of each embodiment of the present application can be implemented. In some embodiments, the content screening apparatus 50 may be divided into one or more modules based on the specific operations implemented by the various portions of the computer-readable instructions. For example, in FIG. 5, the content screening apparatus 50 may be divided into an acquisition module 51, a first calculation module 52, a second calculation module 53, and a screening module 54. Specifically:

The acquisition module 51 is configured to acquire a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set.

The first calculation module 52 is configured to calculate the distribution weight value of the tags of each category included in the content set according to the tags of each category in the respective contents to be screened and the weight values corresponding to the tags of each category.

In an exemplary embodiment, the first calculation module 52 is further configured to obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is the label of one category among the labels of all categories included in the content set, and to use the sum of all obtained weight values as the distribution weight value of the label of the current category.

The second calculation module 53 is configured to calculate the target distribution weight value of each category of labels according to each distribution weight value and a preset label distribution weight adjustment function.

The screening module 54 is configured to screen out the target content that meets the first preset condition from the content set in turn according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each to-be-screened content.

In an exemplary embodiment, the screening module 54 is further configured to perform a screening processing operation on each content to be screened in sequence according to the order of the contents to be screened in the content set, where the screening processing operation includes: obtaining the first weight value corresponding to the label of each category in the current content to be screened; determining whether the first target distribution proportion value corresponding to the label of each category in the current content to be screened is greater than or equal to the first weight value; and if so, taking the current content to be screened as target content and updating the first target distribution weight value with the difference between the first target distribution weight value and the first weight value.

In an exemplary embodiment, the content screening apparatus 50 further includes a third calculation module.

The third calculation module is configured to calculate the weight value corresponding to the label of each category in each content to be screened.

In an exemplary embodiment, the screening module 54 is further configured to, when the number of target contents obtained by screening is less than the preset number, screen out target content that meets the second preset condition from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution proportion value corresponding to the label of at least one category in the current content to be screened is not zero.

In an exemplary embodiment, the screening module 54 is further configured to, when the number of target contents obtained by screening is less than the preset number, screen out target content that meets the third preset condition from the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset mark.

In an exemplary embodiment, the screening module 54 is further configured to, when the number of target contents obtained by screening is less than the preset number, screen out target content that meets the fourth preset condition from the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other contents to be screened.
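To make the division into modules more concrete, the sketch below groups the four modules into one Python class. The class name `ContentScreener`, its method names, and the dictionary-based item layout are all hypothetical; the sketch covers only the first preset condition and is not the applicant's implementation.

```python
from collections import defaultdict


class ContentScreener:
    """Hypothetical grouping of modules 51 to 54 into one class."""

    def __init__(self, adjust):
        self.adjust = adjust  # preset adjustment function used by module 53

    def acquire(self, items):
        # Acquisition module 51: order the items by score, highest first.
        return sorted(items, key=lambda item: item["score"], reverse=True)

    def distribution(self, items):
        # First calculation module 52: sum the tag weights per category.
        quota = defaultdict(float)
        for item in items:
            for tag, weight in item["tags"].items():
                quota[tag] += weight
        return dict(quota)

    def target_distribution(self, quota):
        # Second calculation module 53: apply the adjustment function.
        return {tag: self.adjust(q) for tag, q in quota.items()}

    def screen(self, items, target):
        # Screening module 54: one pass, deducting quota for selected items.
        target = dict(target)
        selected = []
        for item in items:
            if all(target[tag] >= w for tag, w in item["tags"].items()):
                for tag, w in item["tags"].items():
                    target[tag] -= w
                selected.append(item["id"])
        return selected

    def run(self, items):
        ordered = self.acquire(items)
        target = self.target_distribution(self.distribution(ordered))
        return self.screen(ordered, target)


items = [
    {"id": "x", "tags": {"a": 0.5, "b": 0.5}, "score": 0.9},
    {"id": "y", "tags": {"a": 0.5, "b": 0.5}, "score": 0.8},
    {"id": "z", "tags": {"a": 0.5, "c": 0.5}, "score": 0.7},
]
result = ContentScreener(adjust=lambda q: q / 2).run(items)
print(result)
```

In this small example the halved quotas are 0.75 for a, 0.5 for b, and 0.25 for c, so only the first item passes the first preset condition.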

FIG. 6 schematically shows the hardware architecture of a computer device 6 suitable for implementing a content screening method according to an embodiment of the present application. In this embodiment, the computer device 6 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. For example, it can be a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in FIG. 6, the computer device 6 includes at least, but is not limited to, a memory 120, a processor 121, and a network interface 122 that can communicate with each other through a system bus. Specifically:

The memory 120 includes at least one type of computer-readable storage medium, where the computer-readable storage medium may be volatile or non-volatile. The computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 120 may be an internal storage module of the computer device 6, such as a hard disk or memory of the computer device 6. In other embodiments, the memory 120 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC for short), a secure digital (Secure Digital, SD for short) card, or a flash memory card (Flash Card). Of course, the memory 120 may also include both an internal storage module of the computer device 6 and an external storage device thereof. In this embodiment, the memory 120 is generally used to store the operating system installed in the computer device 6 and various application software, such as the program code of the content screening method. In addition, the memory 120 may also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 121 may be a central processing unit (Central Processing Unit, CPU for short), a controller, a microcontroller, a microprocessor, or other data processing chips. The processor 121 is generally used to control the overall operation of the computer device 6 , such as performing control and processing related to data interaction or communication with the computer device 6 . In this embodiment, the processor 121 is configured to execute program codes or process data stored in the memory 120 .

The network interface 122, which may include a wireless network interface or a wired network interface, is typically used to establish a communication link between the computer device 6 and other computer devices. For example, the network interface 122 is used to connect the computer device 6 with an external terminal through a network, and to establish a data transmission channel and a communication link between the computer device 6 and the external terminal. The network can be an intranet, the Internet, Global System for Mobile Communications (GSM for short), Wideband Code Division Multiple Access (WCDMA for short), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.

It should be noted that FIG. 6 only shows a computer device having components 120-122, but it should be understood that it is not required to implement all of the shown components, and more or fewer components may be implemented instead.

In this embodiment, the content screening method stored in the memory 120 can be divided into one or more program modules and executed by one or more processors (the processor 121 in this embodiment) to complete the present application.

Embodiments of the present application provide a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the steps of the content screening method in the embodiments are implemented.

In this embodiment, the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of a computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC for short), a secure digital (Secure Digital, SD for short) card, or a flash memory card (Flash Card). Of course, the computer-readable storage medium may also include both an internal storage unit of a computer device and an external storage device thereof. In this embodiment, the computer-readable storage medium is generally used to store the operating system and various application software installed in the computer device, for example, the program code of the content screening method in the embodiment. In addition, the computer-readable storage medium can also be used to temporarily store various types of data that have been output or will be output.

The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over at least two network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.

From the description of the above embodiments, those of ordinary skill in the art can clearly understand that each embodiment can be implemented by means of software plus a general hardware platform, and certainly can also be implemented by hardware. Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the program can be stored in a computer-readable storage medium. When the program is executed, it may include the flow of the embodiments of the above-mentioned methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of the technical features thereof can be equivalently replaced, and that these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the scope of the technical solutions of the embodiments of the present application.

Claims

A content screening method, comprising:

acquiring a content set to be screened, wherein the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;

calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

calculating the target distribution weight value of the tags of each category according to each distribution weight value and a preset label distribution weight adjustment function; and

sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
The content screening method according to claim 1, wherein calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category comprises:

obtaining the weight value of the label of the current category in each content to be screened, wherein the label of the current category is the label of one category among the labels of all categories included in the content set; and

using the sum of all the obtained weight values as the distribution weight value of the label of the current category.
The content screening method according to claim 1 or 2, wherein sequentially screening out, from the content set, the target content that meets the first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in each content to be screened comprises:

performing a screening processing operation on each content to be screened in sequence according to the ordering of the contents to be screened in the content set, wherein the screening processing operation includes:

obtaining the first weight value corresponding to the label of each category in the current content to be screened;

determining whether the first target distribution weight value corresponding to the label of each category in the current content to be screened is greater than or equal to the first weight value; and

if so, using the current content to be screened as target content, and updating the first target distribution weight value with the difference between the first target distribution weight value and the first weight value.
The content screening method according to claim 3, further comprising:

when the number of target contents obtained by screening is less than the preset number, screening out target content that meets a second preset condition from the remaining contents to be screened in the content set, wherein the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current content to be screened is not zero.
The content screening method according to claim 3, further comprising:

when the number of target contents obtained by screening is less than the preset number, screening out target content that meets a third preset condition from the remaining contents to be screened in the content set, wherein the third preset condition is that the current content to be screened has a preset mark.
The content screening method according to claim 3, further comprising:

when the number of target contents obtained by screening is less than the preset number, screening out target content that meets a fourth preset condition from the remaining contents to be screened in the content set, wherein the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other contents to be screened.
The content screening method according to claim 1, 2, 4, 5 or 6, wherein before the step of calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category, the method further comprises:

calculating the weight value corresponding to the label of each category in each content to be screened.
A content screening device, comprising:

an acquisition module, configured to acquire a content set to be screened, wherein the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;

a first calculation module, configured to calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

a second calculation module, configured to calculate the target distribution weight value of the tags of each category according to each distribution weight value and a preset label distribution weight adjustment function; and

a screening module, configured to sequentially screen out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in each content to be screened.
A computer device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:

acquiring a content set to be screened, wherein the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;

calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

calculating the target distribution weight value of the tags of each category according to each distribution weight value and a preset label distribution weight adjustment function; and

sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
The computer device according to claim 9, wherein calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category comprises:

Obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is the label of one category among all the label categories included in the content set;

The sum of all the obtained weight values is used as the distribution weight value of the label of the current category.
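The summation described above can be sketched as follows. The dictionary layout of each content item (`id`, `score`, `tags`) and the function name `distribution_weights` are assumptions made for illustration and do not appear in the claims.

```python
from collections import defaultdict

def distribution_weights(contents):
    """Sum, per tag category, the weight values found in every content item."""
    totals = defaultdict(float)
    for item in contents:
        for category, weight in item["tags"].items():
            totals[category] += weight
    return dict(totals)

# Hypothetical content set, already sorted by score in descending order.
contents = [
    {"id": "a", "score": 0.9, "tags": {"game": 0.5, "music": 0.5}},
    {"id": "b", "score": 0.8, "tags": {"game": 1.0}},
    {"id": "c", "score": 0.7, "tags": {"music": 1.0}},
]

weights = distribution_weights(contents)
```

Each category's distribution weight value is simply the total weight that category carries across the whole content set.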
The computer device according to claim 9 or 10, wherein sequentially screening out, from the content set, the target content that meets the first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in each content to be screened comprises:

Perform a screening processing operation on each content to be screened in sequence, following the order of the contents to be screened in the content set, wherein the screening processing operation includes:

Obtain the first weight value corresponding to the label of each category in the current content to be screened;

Judge whether the first target distribution weight value corresponding to the category label in the current content to be screened is greater than or equal to the first weight value;

If so, the current content to be screened is used as the target content, and the first target distribution weight value is updated with the difference between the first target distribution weight value and the first weight value.
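The screening processing operation can be sketched as a single pass over the pre-sorted content set. This is a minimal illustration under stated assumptions: the item layout and the name `screen` are hypothetical, and the sketch assumes that the comparison must hold for every category label carried by the current content before that content is selected.

```python
def screen(contents, targets):
    """Walk the pre-sorted contents and keep each item whose tag weights
    still fit within the remaining target distribution weight values."""
    remaining = dict(targets)  # copy so the caller's targets are not mutated
    selected = []
    for item in contents:
        tags = item["tags"]
        # Assumption: every category of the item must satisfy target >= weight.
        if all(remaining.get(category, 0.0) >= weight for category, weight in tags.items()):
            selected.append(item["id"])
            for category, weight in tags.items():
                remaining[category] -= weight  # update target with the difference
    return selected, remaining

# Hypothetical content set, sorted by score in descending order.
contents = [
    {"id": "a", "score": 0.9, "tags": {"game": 0.5, "music": 0.5}},
    {"id": "b", "score": 0.8, "tags": {"game": 1.0}},
    {"id": "c", "score": 0.7, "tags": {"music": 1.0}},
]

selected, remaining = screen(contents, {"game": 1.0, "music": 0.5})
```

Because the contents are visited in score order and each is examined once, higher-scored content claims the remaining target weight first and no re-sorting is needed.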
The computer device of claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a second preset condition from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current content to be screened is not zero.
The computer device of claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a third preset condition from the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset tag.
The computer device of claim 11, wherein the processor further implements the following steps when executing the computer-readable instructions:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a fourth preset condition from the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is higher than the scores of the other contents to be screened.
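The three fallback conditions used when too few target contents were selected can be sketched as interchangeable predicates. All function names, the item layout, and the sample data are assumptions for illustration only; the claims do not prescribe how the fallbacks are combined or ordered.

```python
def has_nonzero_target(item, remaining_targets):
    """Second preset condition: at least one of the item's categories
    still has a non-zero target distribution weight value."""
    return any(remaining_targets.get(category, 0.0) != 0 for category in item["tags"])

def has_preset_tag(item, preset_tags):
    """Third preset condition: the item carries one of the preset tags."""
    return any(category in preset_tags for category in item["tags"])

def fill_up(selected_ids, leftovers, preset_number, condition):
    """Append leftover items that satisfy `condition` until preset_number is reached."""
    result = list(selected_ids)
    for item in leftovers:
        if len(result) >= preset_number:
            break
        if condition(item):
            result.append(item["id"])
    return result

def fill_by_score(selected_ids, leftovers, preset_number):
    """Fourth preset condition: take the highest-scored leftover items."""
    missing = max(preset_number - len(selected_ids), 0)
    best = sorted(leftovers, key=lambda it: it["score"], reverse=True)[:missing]
    return list(selected_ids) + [it["id"] for it in best]

# Hypothetical state after a first screening pass that selected only "a".
leftovers = [
    {"id": "b", "score": 0.8, "tags": {"game": 1.0}},
    {"id": "c", "score": 0.7, "tags": {"music": 1.0}},
]
remaining_targets = {"game": 0.5, "music": 0.0}

by_second = fill_up(["a"], leftovers, 2, lambda it: has_nonzero_target(it, remaining_targets))
by_third = fill_up(["a"], leftovers, 2, lambda it: has_preset_tag(it, {"music"}))
by_fourth = fill_by_score(["a"], leftovers, 2)
```

Each fallback only runs over the contents left unselected by the first pass, so the result of that pass is never discarded.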
A computer-readable storage medium on which computer-readable instructions are stored, characterized in that: when the computer-readable instructions are executed by a processor, the following steps are implemented:

Acquire a content set to be screened, wherein the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;

Calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;

Calculate the target distribution weight value of each category of labels according to each distribution weight value and a preset label distribution weight adjustment function;

Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
The computer-readable storage medium according to claim 15, wherein calculating the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category comprises:

Obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is the label of one category among all the label categories included in the content set;

The sum of all the obtained weight values is used as the distribution weight value of the label of the current category.
The computer-readable storage medium according to claim 15 or 16, wherein sequentially screening out, from the content set, the target content that meets the first preset condition according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in each content to be screened comprises:

Perform a screening processing operation on each content to be screened in sequence, following the order of the contents to be screened in the content set, wherein the screening processing operation includes:

Obtain the first weight value corresponding to the label of each category in the current content to be screened;

Judge whether the first target distribution weight value corresponding to the category label in the current content to be screened is greater than or equal to the first weight value;

If so, the current content to be screened is used as the target content, and the first target distribution weight value is updated with the difference between the first target distribution weight value and the first weight value.
The computer-readable storage medium of claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a second preset condition from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current content to be screened is not zero.
The computer-readable storage medium of claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a third preset condition from the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset tag.
The computer-readable storage medium of claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:

When the number of target contents obtained by screening is less than the preset number, select target contents that meet a fourth preset condition from the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is higher than the scores of the other contents to be screened.