WO2022048289A1 - Content screening method and device - Google Patents

Content screening method and device

Info

Publication number
WO2022048289A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
screened
category
weight value
target
Prior art date
Application number
PCT/CN2021/103571
Other languages
French (fr)
Chinese (zh)
Inventor
吴俊豪
何其真
Original Assignee
上海哔哩哔哩科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海哔哩哔哩科技有限公司
Priority to US18/024,485 (published as US20230418890A1)
Publication of WO2022048289A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/64 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/75 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Definitions

  • the present application relates to the field of computer technology, and in particular, to a content screening method and device.
  • the present application provides a content screening method, device, computer equipment and computer-readable storage medium, so as to solve the problem in the prior art that screening recommended content consumes a large amount of computing resources and takes a long time.
  • the present application provides a content screening method, including:
  • the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set;
  • Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
  • the calculation of the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category includes:
  • the sum of all the obtained weight values is used as the distribution weight value of the label of the current category.
  • the target content that meets the first preset condition is sequentially screened from the content set, including:
  • the screening processing operation is performed on each to-be-screened content in sequence, wherein the screening processing operation includes:
  • the current content to be screened is used as the target content, and the first target distribution weight value is updated with the difference between the first target distribution weight value and the first weight value.
  • the content screening method further includes:
  • when the number of target contents obtained by screening is less than the preset number, select target contents that meet a second preset condition from the remaining contents to be screened in the content set, where the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current to-be-screened content is not zero.
  • the method further includes:
  • the present application also provides a content screening device, comprising:
  • an acquisition module configured to acquire a content set to be screened, the content set includes a plurality of content to be screened, each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set;
  • a first calculation module configured to calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category;
  • a second calculation module configured to calculate the target distribution proportion value of each category of labels according to each distribution proportion value and a preset label distribution proportion adjustment function;
  • the screening module is configured to sequentially screen out the target content that meets the first preset condition from the content set according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each content to be screened.
  • the present application also provides a computer device comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
  • the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set;
  • Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
  • the present application also provides a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set;
  • Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
  • the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is sorted in advance by score in the content set; according to the label of each category in the content to be screened and the weight value corresponding to the label of each category, the value of the label of each category contained in the content set is calculated.
  • in the present application, when the content in the content set to be screened is screened, each to-be-screened content only needs to be traversed once to determine whether it is the target content, without nested traversal. Therefore, the present application can save the computing resources consumed when screening the content to be screened, and can reduce the time consumed when screening the content to be screened.
  • FIG. 1 is a schematic diagram of screening content to be screened in an embodiment of the present application
  • FIG. 2 is a flowchart of an embodiment of the content screening method described in this application.
  • FIG. 3 is a detailed flowchart of the step of calculating the distribution weight value of the labels of each category contained in the content set according to the label of each category and the weight value corresponding to the label of each category in the content to be screened;
  • FIG. 4 shows the change in the Quota value of the label of each category in the present application after processing by the label distribution proportion adjustment function;
  • FIG. 5 is a program module diagram of an embodiment of the content screening apparatus described in this application.
  • FIG. 6 is a schematic diagram of a hardware structure of a computer device for executing a content screening method provided by an embodiment of the present application.
  • although the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
  • the word "if" as used herein can be interpreted as "at the time of" or "when" or "in response to determining."
  • FIG. 1 schematically shows a schematic diagram of screening content to be screened according to an embodiment of the present application.
  • a manuscript set of 5,000 manuscripts is recalled from the content library to be recommended (the manuscript library) after performing operations such as querying, matching, and sorting according to the user portrait.
  • a manuscript set of 2,000 manuscripts is obtained after the first screening and sorting by the preset first screening rules.
  • a manuscript set of 1,000 manuscripts is obtained after the second screening and sorting by the preset second screening rules.
  • the final recommended content can be obtained and recommended to users.
  • each round of screening acts like a funnel that selects and filters the manuscript set, and the screening rules are equivalent to setting the size of the funnel filter.
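  • the funnel described above can be pictured with a minimal Python sketch. The stage rule `keep_top`, the `(id, score)` tuple layout, and the synthetic scores are assumptions made for illustration; the source does not specify the first and second screening rules.

```python
from typing import Callable, List, Tuple

# A manuscript is modeled as (manuscript_id, score); this layout is illustrative.
Manuscript = Tuple[str, float]
Stage = Callable[[List[Manuscript]], List[Manuscript]]


def keep_top(n: int) -> Stage:
    """Hypothetical screening rule: keep the n highest-scored manuscripts."""
    def stage(items: List[Manuscript]) -> List[Manuscript]:
        return sorted(items, key=lambda m: m[1], reverse=True)[:n]
    return stage


def run_funnel(recalled: List[Manuscript], stages: List[Stage]) -> List[Manuscript]:
    """Apply each screening stage in order; every stage narrows the set."""
    current = recalled
    for stage in stages:
        current = stage(current)
    return current


# 5,000 recalled manuscripts with synthetic scores, narrowed to 2,000 and then 1,000.
recalled = [(f"id_{i}", float(i)) for i in range(5000)]
final = run_funnel(recalled, [keep_top(2000), keep_top(1000)])
print(len(final))  # 1000
```

Each stage is just a function from a list to a shorter list, so a quota-based screening step could be dropped in as one more stage.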
  • FIG. 2 is a schematic flowchart of a content screening method according to an embodiment of the present application.
  • the content screening method of the present application can be applied to the content screening process of each funnel in the above-mentioned FIG. 1 . It can be understood that the flowchart in this embodiment of the method is not used to limit the sequence of execution steps.
  • the following is an exemplary description with a computer device as the execution subject.
  • the content screening method provided in this embodiment includes:
  • Step S20: Obtain a content set to be screened, the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set.
  • the content set may be the content recalled from the content library according to the user portrait and the characteristics of the content, wherein recall refers to the process of retrieving a large amount of content with a certain degree of relevance from the content library in the online service of the recommendation system. This process uses fewer user and content features and responds faster.
  • the content set may also be the content to be screened obtained after screening the recalled content one or more times.
  • the multiple contents to be screened contained in the content set are different.
  • the content set includes a plurality of audio and video files to be screened; in a news recommendation scenario, the content set includes a plurality of news articles to be screened; in a commodity recommendation scenario, the content set includes multiple items to be screened.
  • the content to be screened is described by taking the video manuscript to be screened as an example, wherein the video manuscript refers to a video file uploaded to the platform by a user.
  • each acquired video manuscript to be screened has identification information, a label of at least one category, and a score.
  • the identification information is ID (identity identification number) information used to uniquely distinguish different video manuscripts, and different video manuscripts have different IDs.
  • Each video manuscript to be screened has one or more categories of tags. Different video manuscripts to be screened may have the same or different tag categories. In addition, different video manuscripts may have the same or different number of tags. For example, video manuscript 1 has tags tag_0, tag_1, video manuscript 2 has tags tag_2, tag_3, video manuscript 3 has tags tag_0, tag_2, and so on.
  • the score is obtained through a scoring model, and is used to indicate the correlation between the video manuscript to be screened and the user to be recommended.
  • the higher the score, the higher the correlation between the video manuscript to be screened and the user to be recommended; the lower the score, the lower the correlation.
  • the plurality of video manuscripts to be screened in the content set may be sorted in advance according to the score, for example in descending order of score. In this way, when acquiring the content set, the video manuscripts to be screened can be acquired in descending order of score.
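  • as a rough illustration of the data described in step S20, the sketch below models one content item and the pre-sorting by score. The class name `Content`, its field names, and the scores and weights are hypothetical; only the tag assignments follow the tag_0 to tag_3 example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Content:
    content_id: str                 # identification information (unique ID)
    tag_weights: Dict[str, float]   # tag category -> weight value of that tag
    score: float                    # relevance score from a scoring model


def presort(contents: List[Content]) -> List[Content]:
    """Return the contents ordered from the highest score to the lowest."""
    return sorted(contents, key=lambda c: c.score, reverse=True)


# Scores and weights below are made-up values for illustration.
contents = [
    Content("video_1", {"tag_0": 0.5, "tag_1": 0.5}, 0.71),
    Content("video_2", {"tag_2": 0.5, "tag_3": 0.5}, 0.93),
    Content("video_3", {"tag_0": 0.5, "tag_2": 0.5}, 0.80),
]
ordered = presort(contents)
print([c.content_id for c in ordered])  # ['video_2', 'video_3', 'video_1']
```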
  • Step S21 Calculate the distribution weight value of the tags of each category included in the content set according to the tags of each category in the contents to be screened and the weight values corresponding to the tags of each category.
  • each video manuscript to be screened has one or more categories of tags, and the tags of all categories in each video manuscript to be screened share a total weight value of 1, that is, the weight values of all the tags in each video manuscript to be screened add up to 1.
  • the weight value of 1 is only an example; the weight values of the tags of all categories of each video manuscript to be screened can add up to another value, which can be equal to the total weight value of the video manuscript.
  • the distribution proportion value (hereinafter referred to as "Quota value”) refers to the proportion of the label distribution of each category after a plurality of video manuscripts to be screened in the content set are decomposed according to the label category.
  • the proportion of label distribution can be the sum of all weight values assigned to the current category of labels.
  • the method of calculating the distribution weight value of each category of tags in this embodiment can be regarded as a process of performing component decomposition on large tags in a plurality of video manuscripts to be screened to obtain a component decomposition result.
  • the calculation of the distribution weight value of the tags of each category included in the content set according to the tags of each category and the weight values corresponding to the tags of each category in the contents to be screened includes:
  • Step S30 Obtain the weight value of the label of the current category in each content to be screened, where the label of the current category is a category of tags among all category tags included in the content set.
  • step S31 the sum of all the obtained weight values is used as the distribution weight value of the label of the current category.
  • the weight value of the label of the current category in each content to be screened can also be obtained first, and then the sum of all the obtained weight values is used as the distribution weight value of the label of the current category.
  • the label of the current category is label a
  • there are a total of video manuscript A, video manuscript B, and video manuscript C with label a in the content set and the weight value of this label a in video manuscript A, video manuscript B, and video manuscript C
  • the above-mentioned similar method can also be used to calculate the distribution weight value of other types of labels.
  • the distribution weight value of the labels of each category can be obtained conveniently and quickly.
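  • steps S30 and S31 amount to summing, for each tag category, the weight values that the tag carries across all contents. A minimal sketch follows; the function name `distribution_weights` and the weights given to manuscripts A, B and C are assumptions, since the source does not state them.

```python
from collections import defaultdict
from typing import Dict, List


def distribution_weights(tag_weights_per_content: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum each tag's weight over every content item (the tag's Quota value)."""
    quota: Dict[str, float] = defaultdict(float)
    for tag_weights in tag_weights_per_content:
        for tag, weight in tag_weights.items():
            quota[tag] += weight
    return dict(quota)


# Hypothetical weights: label "a" appears in manuscripts A, B and C.
manuscripts = [
    {"a": 0.5, "b": 0.5},    # video manuscript A
    {"a": 0.25, "c": 0.75},  # video manuscript B
    {"a": 1.0},              # video manuscript C
]
quota = distribution_weights(manuscripts)
print(quota)  # {'a': 1.75, 'b': 0.5, 'c': 0.75}
```

The whole content set is visited once, so the cost grows with the total number of tags rather than with the number of content pairs.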
  • the calculation can be performed according to a preset weight distribution rule.
  • the weight value corresponding to the label of each category can also be calculated according to the content of the video manuscript. For example, the video manuscript A has two tags, "funny" and "music". After analyzing the video manuscript A, it is found that the funny elements of the video manuscript A account for 80%, while the music elements only account for 20%. It can then be calculated that the weight value corresponding to the "funny" tag is 0.8, and the weight value corresponding to the "music" tag is 0.2.
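  • one simple way to turn analyzed element shares into tag weight values is to normalize them so that they add up to the manuscript's total weight. The sketch below assumes this normalization; the function name `weights_from_shares` is hypothetical and the source does not prescribe how the content analysis is done.

```python
from typing import Dict


def weights_from_shares(shares: Dict[str, float], total_weight: float = 1.0) -> Dict[str, float]:
    """Scale per-tag shares so that the resulting weights sum to total_weight."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive value")
    return {tag: total_weight * share / total for tag, share in shares.items()}


# Shares from the example: 80% funny elements, 20% music elements.
weights = weights_from_shares({"funny": 80, "music": 20})
print(weights)
```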
  • Step S22 Calculate the target distribution weight values of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function.
  • the label distribution weight adjustment function can be set with different functions according to different business scenarios.
  • when the function is specifically set, at least one of the following objectives shall be satisfied:
  • Goal 4 screen out the detailed processing of different tendencies.
  • the Quota values of all tags can be reconciled to make them close to the average value, or the tags with too high Quota values can be screened for peak clipping.
  • the Quota value reduced by other methods can enter the free Quota pool and so on.
  • suppose the label distribution proportion adjustment function halves the Quota values of all labels (reduces them by a factor of 2); the changes in the Quota values of each category of labels processed by this function are shown in FIG. 4.
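  • two possible adjustment functions are sketched below: uniform halving, and peak clipping that moves the clipped excess into a free Quota pool. The function names, the cap value, and the input Quota values are assumptions chosen for illustration and are not taken from the source.

```python
from typing import Dict, Tuple


def halve(quota: Dict[str, float]) -> Dict[str, float]:
    """Reduce every tag's Quota value by a factor of 2."""
    return {tag: value / 2 for tag, value in quota.items()}


def clip_peaks(quota: Dict[str, float], cap: float) -> Tuple[Dict[str, float], float]:
    """Cap each Quota value at `cap`; return the clipped values and the freed amount."""
    clipped = {tag: min(value, cap) for tag, value in quota.items()}
    free_pool = sum(value - clipped[tag] for tag, value in quota.items())
    return clipped, free_pool


# Hypothetical Quota values before adjustment.
quota = {"tag_0": 2.5, "tag_1": 1.0, "tag_6": 1.5}
print(halve(quota))            # {'tag_0': 1.25, 'tag_1': 0.5, 'tag_6': 0.75}
print(clip_peaks(quota, 2.0))  # ({'tag_0': 2.0, 'tag_1': 1.0, 'tag_6': 1.5}, 0.5)
```

How the free pool is redistributed is left open here, because the source only mentions that reduced Quota can enter such a pool.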
  • Step S23 according to the target distribution weight value of each category of tags and the weight value corresponding to each category of tags in each to-be-screened content, sequentially screen out the target content that meets the first preset condition from the content set.
  • the method of filtering out the target content in this embodiment can be regarded as a process of decomposing the tags of the above categories and then performing tag recombination.
  • the target distribution weight value of each category of tags includes:
  • the screening processing operation includes: obtaining a first weight value corresponding to the label of each category in the currently to-be-screened content; judging whether the first target distribution weight value corresponding to the category label in the current to-be-screened content is greater than or equal to the first weight value; if so, taking the current content to be screened as the target content, and updating the first target distribution weight value with the difference between the first target distribution weight value and the first weight value.
  • a target distribution weight value is
  • the first weight values corresponding to the label a and the label b contained in the video manuscript A can be obtained first. Assuming that they are 0.5 and 0.5 respectively, then after obtaining the first weight values corresponding to label a and label b, it can be determined whether the first target Quota value corresponding to label a is greater than or equal to 0.5 and, at the same time, whether the first target Quota value corresponding to label b is greater than or equal to 0.5.
  • the first target Quota value corresponding to the label a and the first target Quota value corresponding to the label b are 4.0 and 3.5 respectively
  • the video manuscript A can be screened out from the content set as the target content, and at the same time, the video manuscript A can be screened out.
  • the present application can save the computing resources consumed when screening the content to be screened, and can reduce the time consumed when screening the content to be screened.
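  • the single-traversal screening of step S23 can be sketched as follows. This is an illustrative reading, assuming that a content item is accepted only when every one of its tags still has enough target Quota, and that a preset number `limit` bounds the result; the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, Dict[str, float]]  # (content_id, {tag: weight value})


def screen_first_pass(
    contents: List[Item], target_quota: Dict[str, float], limit: int
) -> Tuple[List[str], List[Item], Dict[str, float]]:
    """Traverse the pre-sorted contents once, spending target Quota per tag."""
    remaining = dict(target_quota)   # copy so the caller's mapping is not modified
    selected: List[str] = []
    leftover: List[Item] = []
    for content_id, tag_weights in contents:
        enough = all(remaining.get(tag, 0.0) >= w for tag, w in tag_weights.items())
        if len(selected) < limit and enough:
            for tag, w in tag_weights.items():
                remaining[tag] -= w  # update with the difference (quota - weight)
            selected.append(content_id)
        else:
            leftover.append((content_id, tag_weights))
    return selected, leftover, remaining


# Manuscript A follows the example above; manuscript X and tag c are made up.
contents = [("A", {"a": 0.5, "b": 0.5}), ("X", {"c": 1.0})]
selected, leftover, remaining = screen_first_pass(
    contents, {"a": 4.0, "b": 3.5, "c": 0.5}, limit=10
)
print(selected, remaining)  # ['A'] {'a': 3.5, 'b': 3.0, 'c': 0.5}
```

Each content item is inspected exactly once, which is the property the text credits for the saving in computing resources.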
  • the target content that meets the second preset condition may then be screened from the remaining contents to be screened in the content set, wherein the second preset condition is that the target distribution proportion value corresponding to the tags of at least one category in the currently to-be-screened content is not zero.
  • the remaining 6 video manuscripts are sorted in descending order of scores as video manuscript 1, video manuscript 2, video manuscript 3, video manuscript 4, video manuscript 5, and video manuscript 6.
  • the target Quota value corresponding to label a in video manuscript 1 is 0.2.
  • the target Quota value corresponding to label b in video manuscript 2 is 0.3.
  • the target Quota value corresponding to label b in video manuscript 3 is 0.4. If the labels of all categories in video manuscript 4, video manuscript 5 and video manuscript 6 have a corresponding Quota value of 0, then when the screening operation is performed, video manuscript 1, video manuscript 2, and video manuscript 3 can all be used as target content.
  • the target content that meets the second preset condition is selected from the remaining content to be screened in the content set when the preset number of target content is not obtained after screening, thereby improving the label coverage of content screening (the ratio of the number of tags appearing in the screened result set to the total number of tags in the original content set).
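  • the fallback under the second preset condition can be sketched as a second pass over the leftover contents. The sketch assumes the remaining target Quota is only inspected and not spent in this pass, which the source does not state; the tag names and Quota values are hypothetical, with each manuscript given its own tag for simplicity.

```python
from typing import Dict, List, Tuple


def screen_second_pass(
    leftover: List[Tuple[str, List[str]]], remaining_quota: Dict[str, float], needed: int
) -> List[str]:
    """Pick leftover contents that have at least one tag with non-zero target Quota."""
    picked: List[str] = []
    for content_id, tags in leftover:
        if len(picked) >= needed:
            break
        # Quota never drops below zero in the first pass, so "> 0" means "not zero".
        if any(remaining_quota.get(tag, 0.0) > 0.0 for tag in tags):
            picked.append(content_id)
    return picked


# Six leftover manuscripts in descending score order.
leftover = [
    ("video_1", ["a"]), ("video_2", ["b"]), ("video_3", ["c"]),
    ("video_4", ["d"]), ("video_5", ["d"]), ("video_6", ["d"]),
]
remaining_quota = {"a": 0.2, "b": 0.3, "c": 0.4, "d": 0.0}
print(screen_second_pass(leftover, remaining_quota, needed=6))
# ['video_1', 'video_2', 'video_3']
```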
  • the target content that meets the third preset condition may also continue to be screened from the remaining contents to be screened in the content set, wherein the third preset condition is that the current content to be screened has a preset mark.
  • the target content that meets the fourth preset condition may also continue to be screened from the remaining contents to be screened in the content set, wherein the fourth preset condition is that the score of the current content to be screened is greater than the scores of other content to be screened.
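  • the third and fourth preset conditions can be sketched as two small fill-up helpers. The tuple layout `(content_id, score, marked)` and the function names are assumptions; the source does not say what the preset mark represents.

```python
from typing import List, Tuple

Leftover = Tuple[str, float, bool]  # (content_id, score, has_preset_mark)


def fill_by_mark(leftover: List[Leftover], needed: int) -> List[str]:
    """Third preset condition: take leftover contents that carry the preset mark."""
    marked = [content_id for content_id, _, has_mark in leftover if has_mark]
    return marked[:needed]


def fill_by_score(leftover: List[Leftover], needed: int) -> List[str]:
    """Fourth preset condition: take the leftover contents with the highest scores."""
    ranked = sorted(leftover, key=lambda item: item[1], reverse=True)
    return [content_id for content_id, _, _ in ranked[:needed]]


# Hypothetical leftover manuscripts.
leftover = [("id_3", 0.9, False), ("id_7", 0.8, True), ("id_8", 0.7, False)]
print(fill_by_mark(leftover, 1))   # ['id_7']
print(fill_by_score(leftover, 1))  # ['id_3']
```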
  • in this way, the scoring priority ratio (the proportion of manuscripts with high scores or top rankings from the previous round of processing that enter the screening results) can be improved.
  • the preset threshold is 5, the number of tags a included in all the screened target content is 4, and the number of tags b included is 3, then the current content A to be screened is used as the target content; if the number of tags a included in all the screened target content is 5, and the number of tags b included is 6, the current content A to be screened cannot be used as the target content.
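  • the threshold check in this example can be sketched as below. It assumes that the candidate is accepted only if every one of its tags has so far been selected fewer times than the threshold; the source example is consistent with this strict comparison but does not settle it, and the function name is hypothetical.

```python
from typing import Dict, List


def within_tag_threshold(
    candidate_tags: List[str], selected_tag_counts: Dict[str, int], threshold: int
) -> bool:
    """True if every tag of the candidate appears fewer than `threshold` times so far."""
    return all(selected_tag_counts.get(tag, 0) < threshold for tag in candidate_tags)


# Content A carries tags a and b; the preset threshold is 5.
print(within_tag_threshold(["a", "b"], {"a": 4, "b": 3}, 5))  # True
print(within_tag_threshold(["a", "b"], {"a": 5, "b": 6}, 5))  # False
```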
  • the tags of all categories in each of the 10 video manuscripts share a total weight value of 1, that is, the weight value of the tag of each category in the 10 video manuscripts is 0.5; then, according to the label of each category in each content to be screened and the weight value corresponding to the label of each category, the Quota value of the label of each category can be calculated, as shown in the following table:
  • Tag     Target Quota
    tag_0   1.25
    tag_1   0.5
    tag_2   0.5
    tag_3   0.5
    tag_4   0.5
    tag_5   0.5
    tag_6   0.75
    tag_7   0.5
  • the video manuscripts of id_7, id_8 and id_9 are screened in turn. Since the video manuscripts of id_7, id_8 and id_9 do not have enough target Quota values, they cannot be screened out as target content.
  • both the video manuscripts of id_3 and id_7 have at least one category of tags corresponding to a target Quota value other than 0.
  • the target Quota values corresponding to the two categories of tags in the video manuscript of id_7 are both not 0, whereas the target Quota value corresponding to only one category of tags in the video manuscript of id_3 is not 0. Therefore, in order to obtain a better label distribution rate, the video manuscript of id_7 can be screened out as the target content.
  • since the video manuscript of id_3 has the highest score, the video manuscript of id_3 can be screened out as the target content.
  • the third calculation module is used to calculate the weight value corresponding to the label of each category in the content to be screened.
  • the screening module 54 is further configured to, when the number of target contents obtained by screening is less than a preset number, screen out, from the remaining contents to be screened in the content set, the target content that meets the fourth preset condition, wherein the fourth preset condition is that the score of the current content to be screened is greater than the scores of other content to be screened.
  • in the present application, when the content in the content set to be screened is screened, each to-be-screened content only needs to be traversed once to determine whether it is the target content, without nested traversal. Therefore, the present application can save the computing resources consumed when screening the content to be screened, and can reduce the time consumed when screening the content to be screened.
  • FIG. 6 schematically shows a schematic diagram of a hardware architecture of a computer device 6 suitable for implementing a content screening method according to an embodiment of the present application.
  • the computer device 6 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
  • it can be a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers) and the like.
  • the computer device 6 at least includes but is not limited to: a memory 120, a processor 121, and a network interface 122 that can communicate with each other through a system bus. Specifically:
  • the memory 120 includes at least one type of computer-readable storage medium, wherein the computer-readable storage medium may be volatile or non-volatile.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 120 may be an internal storage module of the computer device 6, such as a hard disk or memory of the computer device 6.
  • the processor 121 may be a central processing unit (Central Processing Unit, CPU for short), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 121 is generally used to control the overall operation of the computer device 6 , such as performing control and processing related to data interaction or communication with the computer device 6 .
  • the processor 121 is configured to execute program codes or process data stored in the memory 120 .
  • FIG. 6 only shows a computer device having components 120-122, but it should be understood that it is not required to implement all of the shown components, and more or fewer components may be implemented instead.
  • the content screening method stored in the memory 120 can be divided into one or more program modules and executed by one or more processors (the processor 121 in this embodiment) to complete the present application.
  • Embodiments of the present application provide a computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the content set includes a plurality of content to be screened, and each content to be screened has identification information, a label of at least one category, and a score, wherein the plurality of content to be screened is pre-sorted by score in the content set;
  • Target content that meets the first preset condition is sequentially screened from the content set according to the target distribution weight value of the tags of each category and the weight value corresponding to the tags of each category in the contents to be screened.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device.
  • the device embodiments described above are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or distributed over at least two network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
  • each embodiment can be implemented by means of software plus a general hardware platform, and certainly can also be implemented by hardware.
  • Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the program can be stored in a computer-readable storage medium. When the program is executed, it may include the flow of the embodiments of the above-mentioned methods.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.

Abstract

A content screening method and device. The method comprises: obtaining a content set to be subjected to screening, the content set comprising a plurality of contents to be screened, each of said contents having identifier information, a label of at least one category, and a score, wherein the plurality of said contents are sorted in the content set in advance by means of the scores (S20); calculating, according to the label of each category in each of said contents and a weight value corresponding to the label of each category, a distribution proportion value of the label of each category contained in the content set (S21); calculating a target distribution proportion value of the label of each category according to each distribution proportion value and a preset label distribution proportion adjustment function (S22); and according to the target distribution proportion value of the label of each category and the weight value corresponding to the label of each category in each of said contents, sequentially screening out a target content meeting a first preset condition from the content set (S23). The method can save computing resources.

Description

Content screening method and device
This application claims priority to Chinese patent application No. 202010920038.6, titled "Content Screening Method and Device" and filed with the China Patent Office on September 4, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of computer technology, and in particular to a content screening method and device.
Background
Recommendation systems in various scenarios usually go through processes such as user portrait query, retrieval and recall of recommended content, and multiple rounds of sorting and screening. After a large amount of recommended content is recalled from the recommended content library, and until a number of recommended items are screened out and finally recommended to the user, the intermediate sorting and screening is generally carried out using preset screening rules. However, the inventors found that when the prior art uses preset screening rules to screen recommended content, nested traversal is generally required for each content to be screened, so the screening process consumes a large amount of computing resources and takes considerable time to screen out the target recommended content.
Summary of the invention
In view of this, the present application provides a content screening method, device, computer device and computer-readable storage medium, to solve the problem in the prior art that screening recommended content consumes a large amount of computing resources and takes considerable time.
The present application provides a content screening method, including:
obtaining a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;
calculating a distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category;
calculating a target distribution weight value of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function;
sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened.
Optionally, the calculating of the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category includes:
obtaining the weight values of the label of a current category in each content to be screened, where the label of the current category is one of all the category labels contained in the content set;
taking the sum of all the obtained weight values as the distribution weight value of the label of the current category.
Optionally, the sequentially screening out, from the content set, of target content that meets the first preset condition according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened includes:
performing a screening operation on each content to be screened in turn, in the order of the contents to be screened in the content set, where the screening operation includes:
obtaining a first weight value corresponding to the label of each category in the current content to be screened;
determining whether a first target distribution weight value corresponding to the category label in the current content to be screened is greater than or equal to the first weight value;
if so, taking the current content to be screened as target content, and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
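The screening operation above can be illustrated with a minimal Python sketch. The function name, the tuple layout of the content set, and the sample values are hypothetical. The sketch also assumes that a content item is accepted only when every one of its category labels passes the comparison, which the text does not state explicitly for items with several labels.

```python
def screen_first_pass(content_set, target_quota):
    """Single pass over score-ordered content (illustrative sketch).

    content_set: list of (content_id, {label: weight}) already sorted by score.
    target_quota: {label: target distribution weight value}.
    """
    remaining = dict(target_quota)  # copy so the caller's quota is untouched
    selected, leftover = [], []
    for content_id, label_weights in content_set:
        # Assumption: every label of the item must satisfy quota >= weight.
        fits = all(remaining.get(label, 0.0) >= weight
                   for label, weight in label_weights.items())
        if fits:
            selected.append(content_id)
            # Update each target value to (target value - weight).
            for label, weight in label_weights.items():
                remaining[label] = remaining.get(label, 0.0) - weight
        else:
            leftover.append(content_id)
    return selected, leftover, remaining


content_set = [
    ("v1", {"a": 0.5, "b": 0.5}),
    ("v2", {"a": 0.5, "c": 0.5}),
    ("v3", {"a": 0.5, "b": 0.5}),
]
selected, leftover, remaining = screen_first_pass(
    content_set, {"a": 1.0, "b": 0.5, "c": 0.5})
```

Each content item is visited once and compared only against the running quota of its own labels, so no nested traversal over the other items is needed.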
Optionally, the content screening method further includes:
when the number of target contents obtained by screening is less than a preset number, screening out, from the remaining contents to be screened in the content set, target content that meets a second preset condition, where the second preset condition is that the target distribution weight value corresponding to the label of at least one category in the current content to be screened is not zero.
Optionally, the content screening method further includes:
when the number of target contents obtained by screening is less than the preset number, screening out, from the remaining contents to be screened in the content set, target content that meets a third preset condition, where the third preset condition is that the current content to be screened has a preset mark.
Optionally, the content screening method further includes:
when the number of target contents obtained by screening is less than the preset number, screening out, from the remaining contents to be screened in the content set, target content that meets a fourth preset condition, where the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other contents to be screened.
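As a rough illustration of how a shortfall might be filled, the following Python sketch applies the second preset condition and then the fourth. Chaining the two conditions in this order is an assumption made for the example; the text presents each one as a separate option. All names and sample values are hypothetical, and the sketch does not update the quota during the fill because the text does not say whether it should.

```python
def fill_shortfall(selected, leftover, remaining_quota, labels_by_id, preset_count):
    """Top up the selection when fewer than preset_count items were chosen.

    leftover is assumed to keep the score-descending order of the content set.
    """
    result = list(selected)
    chosen = set(selected)

    def take(condition):
        for content_id in leftover:
            if len(result) >= preset_count:
                return
            if content_id not in chosen and condition(content_id):
                result.append(content_id)
                chosen.add(content_id)

    # Second preset condition: at least one label still has a nonzero quota.
    take(lambda cid: any(remaining_quota.get(label, 0.0) > 0.0
                         for label in labels_by_id[cid]))
    # Fourth preset condition: highest-scored remaining items first, which is
    # simply the existing order because leftover is already sorted by score.
    take(lambda cid: True)
    return result


final_selection = fill_shortfall(
    selected=["v1", "v2"],
    leftover=["v3", "v4", "v5"],
    remaining_quota={"a": 0.0, "b": 0.0, "c": 0.5},
    labels_by_id={"v3": ["a", "b"], "v4": ["c"], "v5": ["a"]},
    preset_count=4,
)
```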
Optionally, before the step of calculating the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category, the method further includes:
calculating the weight value corresponding to the label of each category in each content to be screened.
The present application also provides a content screening device, including:
an acquisition module, configured to obtain a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;
a first calculation module, configured to calculate a distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category;
a second calculation module, configured to calculate a target distribution weight value of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function;
a screening module, configured to sequentially screen out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened.
The present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;
calculating a distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category;
calculating a target distribution weight value of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function;
sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened.
The present application also provides a computer-readable storage medium storing computer-readable instructions, where the following steps are implemented when the computer-readable instructions are executed by a processor:
obtaining a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set;
calculating a distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category;
calculating a target distribution weight value of the labels of each category according to each distribution weight value and a preset label distribution weight adjustment function;
sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened.
In the embodiments of the present application, a content set to be screened is obtained, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set; a distribution weight value of the labels of each category contained in the content set is calculated according to the label of each category in each content to be screened and the weight value corresponding to the label of each category; a target distribution weight value of the labels of each category is calculated according to each distribution weight value and a preset label distribution weight adjustment function; and target content that meets a first preset condition is sequentially screened out from the content set according to the target distribution weight value of the labels of each category and the weight value corresponding to the label of each category in each content to be screened. In the embodiments of the present application, when the contents in the content set are screened, each content to be screened needs to be traversed only once to determine whether it is target content, and no nested traversal is required. The present application can therefore save the computing resources and reduce the time consumed in screening the contents to be screened.
Description of the drawings
FIG. 1 is a schematic diagram of screening contents to be screened in an embodiment of the present application;
FIG. 2 is a flowchart of an embodiment of the content screening method described in the present application;
FIG. 3 is a detailed flowchart of the step of calculating the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category;
FIG. 4 shows how the Quota value of the labels of each category changes after the distribution weight values of the labels of each category are processed by the label distribution weight adjustment function;
FIG. 5 is a program module diagram of an embodiment of the content screening device described in the present application;
FIG. 6 is a schematic diagram of the hardware structure of a computer device that executes the content screening method provided by an embodiment of the present application.
Detailed description
The advantages of the present application are further described below with reference to the accompanying drawings and specific embodiments.
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
In the description of the present application, it should be understood that the numerical labels before the steps do not indicate the order in which the steps are executed. They are used only to facilitate the description of the present application and to distinguish the steps, and therefore should not be construed as limiting the present application.
FIG. 1 schematically shows the screening of contents to be screened according to an embodiment of the present application. In an exemplary embodiment, after operations such as querying, matching, and sorting are performed according to the user portrait, a set of 5000 manuscripts is recalled from the library of content to be recommended (the manuscript library). The set of 5000 manuscripts is then screened and sorted a first time under a preset first screening rule to obtain a set of 2000 manuscripts, which is in turn screened and sorted a second time under a preset second screening rule to obtain a set of 1000 manuscripts. Finally, after several more rounds of screening and sorting, the final recommended content is obtained and recommended to the user. Each round of screening acts like a funnel that selects and filters the manuscript set, and the screening rule corresponds to how large the funnel's outlet is set.
Refer to FIG. 2, which is a schematic flowchart of a content screening method according to an embodiment of the present application. The content screening method of the present application can be applied to the content screening process of each funnel in FIG. 1 above. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. The following description takes a computer device as the execution subject by way of example. As can be seen from the figure, the content screening method provided in this embodiment includes:
Step S20: obtain a content set to be screened, where the content set includes a plurality of contents to be screened, each content to be screened has identification information, a label of at least one category, and a score, and the plurality of contents to be screened are pre-sorted by score in the content set.
Specifically, the content set may be content recalled from a content library according to the user portrait and the features of the content. Recall refers to the process in which the online service of a recommendation system retrieves from the content library a large amount of content having a certain degree of relevance; this process uses relatively few user and content features and responds quickly. The content set may also be the contents to be screened that remain after the recalled content has been screened one or more times.
The contents to be screened in the content set differ between recommendation scenarios. For example, in an audio and video recommendation scenario, the content set includes a plurality of audio and video files to be screened; in a news recommendation scenario, the content set includes a plurality of news articles to be screened; in a commodity recommendation scenario, the content set includes a plurality of commodities to be screened.
It should be noted that, for ease of description, in this embodiment and the following embodiments the content to be screened is described by taking a video manuscript to be screened as an example, where a video manuscript refers to a video file uploaded to the platform by a user.
In this embodiment, each obtained video manuscript to be screened has identification information, a label of at least one category, and a score.
The identification information is ID (identity number) information used to uniquely distinguish video manuscripts; different video manuscripts have different IDs.
Each video manuscript to be screened has labels of one or more categories. Different video manuscripts to be screened may have the same or different label categories, and may have the same or a different number of labels. For example, video manuscript 1 has labels tag_0 and tag_1, video manuscript 2 has labels tag_2 and tag_3, and video manuscript 3 has labels tag_0 and tag_2.
The score is obtained through a scoring model and represents the relevance between the video manuscript to be screened and the user who is to receive the recommendation. Generally, the higher the score, the higher the relevance between the video manuscript and that user; the lower the score, the lower the relevance.
In this embodiment, to facilitate the subsequent screening of the video manuscripts in the content set, the video manuscripts to be screened may be sorted in advance by score, for example in descending order of score. In this way, when the content set is obtained, the video manuscripts to be screened are already sorted by score from high to low.
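The shape of a content set described in step S20 can be sketched in Python as follows. The class name, field names, and score values are illustrative assumptions; only the three attributes (identification information, category labels, score) and the descending sort by score come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One content item to be screened (field names are hypothetical)."""
    content_id: str                  # identification information
    label_weights: Dict[str, float]  # category label -> weight in this item
    score: float                     # relevance score from a scoring model


def presort_by_score(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)


content_set = presort_by_score([
    Candidate("video_1", {"tag_0": 0.5, "tag_1": 0.5}, 0.62),
    Candidate("video_2", {"tag_2": 0.5, "tag_3": 0.5}, 0.91),
    Candidate("video_3", {"tag_0": 0.5, "tag_2": 0.5}, 0.47),
])
ordered_ids = [c.content_id for c in content_set]
```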
Step S21: calculate a distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category.
Specifically, each video manuscript to be screened has labels of one or more categories, and the labels of all categories in a video manuscript share that manuscript's total weight value (1); that is, the weight values of the labels of all categories in each video manuscript to be screened add up to 1. The total of 1 is only an example. The weight values of the labels of all categories in a video manuscript may add up to another value, as long as their sum equals the total weight value of that video manuscript.
The distribution weight value (hereinafter the "Quota value") describes how the labels of each category are distributed once the video manuscripts to be screened in the content set are decomposed by label category. In a specific scenario, the Quota value of a label category may be the sum of all weight values assigned to the label of that category.
It should be noted that calculating the distribution weight value of the labels of each category in this embodiment can be regarded as decomposing the large labels in the plurality of video manuscripts to be screened into components to obtain a component decomposition result.
As an example, referring to FIG. 3, the calculating of the distribution weight value of the labels of each category contained in the content set according to the label of each category in each content to be screened and the weight value corresponding to the label of each category includes:
Step S30: obtain the weight values of the label of a current category in each content to be screened, where the label of the current category is one of all the category labels contained in the content set.
Step S31: take the sum of all the obtained weight values as the distribution weight value of the label of the current category.
Specifically, when the distribution weight values of the labels of each category are calculated, the weight values of the label of the current category in each content to be screened can be obtained first, and the sum of all the obtained weight values is then taken as the distribution weight value of the label of the current category.
For example, suppose the label of the current category is label a, that video manuscripts A, B, and C in the content set have label a, and that the weight values of label a in video manuscripts A, B, and C are 0.4, 0.6, and 0.8 respectively. The distribution weight value of label a is then 0.4 + 0.6 + 0.8 = 1.8. The distribution weight values of the labels of other categories can be calculated in the same way.
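Steps S30 and S31 can be sketched in Python as follows. The sample data reproduces the label a example above; the other labels (b, c, d) and their weights are hypothetical fillers chosen so that each manuscript's weights add up to 1.

```python
from collections import defaultdict


def distribution_weights(content_set):
    """Sum, per category label, the weights it holds across all content items.

    content_set: list of {label: weight} mappings, one per content item.
    """
    quota = defaultdict(float)
    for label_weights in content_set:
        for label, weight in label_weights.items():
            quota[label] += weight
    return dict(quota)


content_set = [
    {"a": 0.4, "b": 0.6},  # video manuscript A
    {"a": 0.6, "c": 0.4},  # video manuscript B
    {"a": 0.8, "d": 0.2},  # video manuscript C
]
quota = distribution_weights(content_set)
# quota["a"] is 0.4 + 0.6 + 0.8, i.e. about 1.8 up to floating-point rounding
```

The sketch visits every label of every content item once, so its cost grows with the total number of labels rather than with the square of the number of items.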
In this embodiment, taking the sum of all the obtained weight values as the distribution weight value of the label of the current category makes it convenient and fast to obtain the distribution weight values of the labels of each category.
It can be understood that, when the label of at least one category in the content to be screened carries a corresponding weight value, the weight value corresponding to the label of each category in each content to be screened needs to be calculated first, so that the distribution weight values of the labels of each category contained in the content set can be calculated from the labels of each category and their corresponding weight values.
In one implementation, the weight value corresponding to the label of each category in a video manuscript to be screened can be calculated according to a preset weight distribution rule. For example, suppose the preset weight distribution rule is that the labels of all categories in a video manuscript equally share that manuscript's total weight of 1. For video manuscript A, which has label a and label b, the weight value of label a in video manuscript A is 1/2 = 0.5 and the weight value of label b in video manuscript A is 1/2 = 0.5. Similarly, for video manuscript B, which has label a and label c, the weight value of label a in video manuscript B is 1/2 = 0.5 and the weight value of label c in video manuscript B is 1/2 = 0.5.
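The equal-share rule in this implementation can be sketched in Python as follows. The function name and the `total_weight` parameter are illustrative; the default of 1 follows the example in the text.

```python
def equal_split_weights(labels, total_weight=1.0):
    """Give every category label of one content item an equal share."""
    if not labels:
        return {}
    share = total_weight / len(labels)
    return {label: share for label in labels}


weights_a = equal_split_weights(["a", "b"])  # video manuscript A
weights_b = equal_split_weights(["a", "c"])  # video manuscript B
```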
In another embodiment, the weight value corresponding to the tag of each category may instead be calculated by analyzing the content of the video manuscript. For example, video manuscript A has two tags, "funny" and "music". If analysis of video manuscript A finds that funny elements account for 80% of it and music elements for only 20%, then the weight value corresponding to the "funny" tag is 0.8 and the weight value corresponding to the "music" tag is 0.2.
Step S22: calculate the target distribution weight value of the tags of each category according to each distribution weight value and a preset tag distribution weight adjustment function.
Specifically, the tag distribution weight adjustment function can be set differently for different business scenarios. When setting the function, at least one of the following objectives should be met:
Objective 1: the sum of the target Quota values of the tags of all categories obtained after processing by the tag distribution weight adjustment function is N, where N is the number of target contents to be screened out from the content set.
Objective 2: as many tag categories as possible should appear after processing by the tag distribution weight adjustment function.
Objective 3: in the target Quota values obtained after processing by the tag distribution weight adjustment function, the Quota proportions of the different tag categories should be as close as possible to those of the original content set.
Objective 4: fine-grained processing with different tendencies according to the specific application scenario. For example, the Quota values of all tags can be harmonized so that they all approach the mean, or tags whose Quota values are too high can be peak-clipped; the Quota removed by peak clipping and similar means can enter a free Quota pool, and so on.
In a specific scenario, the tag distribution weight adjustment function scales the Quota values of all tags down by a factor of 2 (that is, halves them). The changes in the Quota values of the tags of each category after processing by this function are shown in FIG. 4.
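One simple adjustment function that meets Objective 1 and Objective 3 at the same time is a proportional rescaling, sketched below. This is only one possible choice and is an assumption of this example, not a function prescribed by the text; the name `proportional_target_quotas` is hypothetical. When the Quota values sum to twice N, it reduces to the halving described above.

```python
def proportional_target_quotas(quota, n):
    """Rescale all Quota values by one common factor so that they sum to n.

    A common factor keeps the ratios between tags unchanged (Objective 3)
    while forcing the total to the number of items wanted (Objective 1).
    """
    total = sum(quota.values())
    if total == 0:
        return {tag: 0.0 for tag in quota}
    factor = n / total
    return {tag: value * factor for tag, value in quota.items()}


quota = {"a": 1.0, "b": 0.5, "c": 0.5}   # sums to 2.0
target = proportional_target_quotas(quota, n=1)
print(target)
```

Objective 2 and Objective 4 would need additional logic (for example flooring small values or clipping peaks), which this sketch deliberately leaves out.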
Step S23: according to the target distribution weight values of the tags of each category and the weight value corresponding to the tag of each category in each content to be screened, sequentially screen out from the content set the target content that meets a first preset condition.
Specifically, the first preset condition is that the tags of all categories of the video manuscript to be screened have sufficient Quota values. In this embodiment, when target content meeting the first preset condition is screened out of the content set according to the target distribution weight value (target Quota value) of the tags of each category and the weight value corresponding to the tag of each category in each video manuscript to be screened, each video manuscript is examined in turn, following its order in the content set. If the tags of all categories of the video manuscript under examination have sufficient Quota values, that video manuscript is selected from the content set as target content. Once the current video manuscript has been examined, the next one is taken and examined in the same way. The screening process ends when all video manuscripts have been traversed and examined, or stops as soon as a preset number of target contents has been screened out, where the preset number is the number of target contents determined in advance to be screened out from the content library.
It should be noted that the way target content is screened out in this embodiment can be regarded as a process of decomposing the tags of the above categories into components and then recombining the tags.
In an exemplary embodiment, sequentially screening out from the content set the target content that meets the first preset condition, according to the target distribution weight values of the tags of each category and the weight value corresponding to the tag of each category in each content to be screened, includes:
Performing a screening processing operation on each content to be screened in turn, following the order of the contents to be screened in the content set.
Specifically, the screening processing operation must be performed on the video manuscripts in the order in which they are sorted in the content set. For example, if the content set contains five video manuscripts sorted by score from high to low, namely video manuscript A, video manuscript B, video manuscript C, video manuscript D and video manuscript E, then video manuscript A is screened first; once video manuscript A has been processed, video manuscript B is screened, followed in turn by video manuscript C, video manuscript D and video manuscript E.
In this embodiment, the screening processing operation includes: obtaining a first weight value corresponding to the tag of each category in the current content to be screened; determining whether the first target distribution weight value corresponding to each category tag in the current content to be screened is greater than or equal to the corresponding first weight value; and if so, taking the current content to be screened as target content and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
Specifically, when the current screening processing operation is applied to video manuscript A, the first weight values corresponding to tag a and tag b contained in video manuscript A are obtained first; assume they are 0.5 and 0.5. It is then determined whether the first target Quota value corresponding to tag a is greater than or equal to 0.5, and whether the first target Quota value corresponding to tag b is greater than or equal to 0.5. Assuming the first target Quota values corresponding to tag a and tag b are 4.0 and 3.5 respectively, video manuscript A can be screened out of the content set as target content. At the same time, each first target distribution weight value is updated to its difference from the first weight value: 4.0 - 0.5 = 3.5 becomes the first target Quota value corresponding to tag a, and 3.5 - 0.5 = 3.0 becomes the first target Quota value corresponding to tag b.
After the screening processing operation on video manuscript A is complete, video manuscript B, video manuscript C, video manuscript D and video manuscript E are screened in turn in the same manner.
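The screening processing operation for a single item can be sketched as a check followed by a deduction. The function name `try_select` and the use of plain dictionaries for the weights and the remaining target Quota values are assumptions made for illustration.

```python
def try_select(weights, remaining):
    """Apply one screening processing operation to a single item.

    The item is selected only if every one of its tags still has a
    remaining target Quota value >= the tag's weight in this item.
    On success the weights are deducted from `remaining` in place.
    """
    if any(remaining.get(tag, 0.0) < w for tag, w in weights.items()):
        return False
    for tag, w in weights.items():
        remaining[tag] = remaining.get(tag, 0.0) - w
    return True


# Video manuscript A from the example: tags a and b, weight 0.5 each.
remaining = {"a": 4.0, "b": 3.5}
chosen = try_select({"a": 0.5, "b": 0.5}, remaining)
print(chosen, remaining)
```

Because each item is checked against the shared `remaining` table exactly once, running this over a sorted content set is a single pass with no nested traversal.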
In the embodiments of the present application, a content set to be screened is obtained, the content set including a plurality of contents to be screened, each of which has identification information, tags of at least one category, and a score, where the contents to be screened are sorted in advance by score in the content set; the distribution weight value of the tags of each category contained in the content set is calculated according to the tags of each category in each content to be screened and the weight values corresponding to those tags; the target distribution weight value of the tags of each category is calculated according to each distribution weight value and a preset tag distribution weight adjustment function; and target content meeting the first preset condition is sequentially screened out of the content set according to the target distribution weight values of the tags of each category and the weight value corresponding to the tag of each category in each content to be screened. When the contents in the content set are screened in this way, a single traversal is enough to determine whether each content to be screened is target content, and no nested traversal is required. The present application can therefore save the computing resources consumed in screening the content to be screened and reduce the time the screening takes.
In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content meeting a second preset condition may further be screened out of the remaining contents to be screened in the content set, where the second preset condition is that the target distribution weight value corresponding to the tag of at least one category in the current content to be screened is not zero.
Specifically, the preset number is the predetermined number of target contents to be screened out of the content set. For example, if the content set contains 10 video manuscripts and only 4 target contents have been screened out, then video manuscripts in which the target Quota value corresponding to the tag of at least one category is not zero are screened out of the remaining 6 video manuscripts in the content set as target content.
For example, assume the remaining 6 video manuscripts, sorted by score from high to low, are video manuscript 1, video manuscript 2, video manuscript 3, video manuscript 4, video manuscript 5 and video manuscript 6. The target Quota value corresponding to tag a in video manuscript 1 is 0.2. The target Quota value corresponding to tag b in video manuscript 2 is 0.3. The target Quota value corresponding to tag b in video manuscript 3 is 0.4. The target Quota values corresponding to the tags of all categories in video manuscript 4, video manuscript 5 and video manuscript 6 are all 0. In the screening processing operation, video manuscript 1, video manuscript 2 and video manuscript 3 can then all be taken as target content. Of course, if only one more video manuscript is needed as target content, only video manuscript 1, which has the highest score, is taken; if only two more are needed, video manuscript 1 and video manuscript 2, which have the highest scores, are taken.
In this embodiment, when the preset number of target contents has not been obtained, target content meeting the second preset condition is screened out of the remaining contents to be screened in the content set, which improves the tag coverage of content screening (the ratio of the number of tags appearing in the screening result set to the total number of tags in the original content set).
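The tag coverage ratio defined in parentheses above can be computed as in the following sketch. The item identifiers and tags are hypothetical illustration data, and `tag_coverage` is a name chosen for this example.

```python
def tag_coverage(all_items, selected_ids):
    """Ratio of distinct tags in the selected items to distinct tags overall.

    `all_items` maps an item id to the collection of its tags.
    """
    all_tags = {tag for tags in all_items.values() for tag in tags}
    hit_tags = {tag for item_id in selected_ids for tag in all_items[item_id]}
    return len(hit_tags) / len(all_tags) if all_tags else 0.0


items = {"v1": {"a", "b"}, "v2": {"b", "c"}, "v3": {"d"}}
coverage = tag_coverage(items, ["v1", "v2"])
print(coverage)
```

Adding an item whose tags are not yet represented in the result raises this ratio, which is the effect the second preset condition aims for.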
In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content meeting a third preset condition may also be screened out of the remaining contents to be screened in the content set, where the third preset condition is that the current content to be screened has a preset mark.
Specifically, the preset mark is a mark indicating that a video manuscript to be screened is a low-quality video manuscript. A video manuscript may be marked as low quality when its tags are poorly correlated with the tags of other, high-quality video manuscripts.
In this embodiment, selecting low-quality video manuscripts as target content can improve the diversity of the screened content.
In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content meeting a fourth preset condition may also be screened out of the remaining contents to be screened in the content set, where the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other contents to be screened.
Specifically, when the number of target contents obtained by screening is less than the preset number, target content can be screened out of the remaining video manuscripts in descending order of score. Taking video manuscript 1 through video manuscript 6 above as the remaining video manuscripts: if one more video manuscript is needed as target content, video manuscript 1 is screened out; similarly, if two more video manuscripts are needed, video manuscript 1 and video manuscript 2 are screened out.
In this embodiment, selecting video manuscripts with higher scores as target content can improve the score priority rate (the proportion of manuscripts with top scores or rankings in the round preceding the current screening that enter the screening result).
In an exemplary embodiment, when the number of target contents obtained by screening is less than the preset number, target content meeting a fifth preset condition may also be screened out of the remaining contents to be screened in the content set, where the fifth preset condition is that the target distribution weight values corresponding to the tags of all categories of the current content to be screened, content A, are zero, but the total count of the tags of each category in content A (assume tag a and tag b) does not exceed a preset threshold. For example, with a preset threshold of 5, if all target contents screened out so far contain tag a 4 times and tag b 3 times, content A can be taken as target content; if they contain tag a 5 times and tag b 6 times, content A cannot be taken as target content.
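The threshold check in the fifth preset condition might be sketched as below. The text does not settle the boundary case in which a count equals the threshold; this sketch assumes a tag still has room only while its count among the already selected items is strictly below the threshold, which agrees with both examples above. The name `within_tag_cap` is hypothetical.

```python
from collections import Counter


def within_tag_cap(tags, selected_tag_counts, cap):
    """True if every tag of the candidate occurs fewer than `cap` times
    among the items already selected (assumed strict comparison)."""
    return all(selected_tag_counts[tag] < cap for tag in tags)


# First example: tag a seen 4 times, tag b seen 3 times, threshold 5.
allowed = within_tag_cap(["a", "b"], Counter(a=4, b=3), cap=5)
# Second example: tag a seen 5 times, tag b seen 6 times, threshold 5.
blocked = within_tag_cap(["a", "b"], Counter(a=5, b=6), cap=5)
print(allowed, blocked)
```

A `Counter` returns 0 for tags it has never seen, so a candidate carrying a brand-new tag passes the check without special handling.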
To make the technical solution of the present application easier to understand, it is described below with reference to a specific application scenario.
Assume that 5 video manuscripts are to be screened out of 10 video manuscripts as target content, and that the 10 video manuscripts, sorted by score (Score) from high to low, are as detailed in the following table:
Id (identification information) | Tags | Score
id_0 | tag_0, tag_1 | 0.95
id_1 | tag_2, tag_3 | 0.9
id_2 | tag_4, tag_5 | 0.85
id_3 | tag_0, tag_2 | 0.8
id_4 | tag_1, tag_4 | 0.75
id_5 | tag_3, tag_5 | 0.7
id_6 | tag_0, tag_6 | 0.65
id_7 | tag_0, tag_7 | 0.6
id_8 | tag_0, tag_6 | 0.55
id_9 | tag_6, tag_7 | 0.5
If the tags of each category in each of the 10 video manuscripts equally share that video manuscript's total weight of 1, that is, every tag in the 10 video manuscripts has a weight value of 0.5, then the Quota value of the tags of each category, calculated from the tags of each category in each content to be screened and their corresponding weight values, is as shown in the following table:
Tag | Quota
tag_0 | 2.5
tag_1 | 1
tag_2 | 1
tag_3 | 1
tag_4 | 1
tag_5 | 1
tag_6 | 1.5
tag_7 | 1
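The Quota table can be reproduced from the ten-item table with a few lines of code. The rows below are copied from the example; the function name `tag_quotas` and the tuple layout are assumptions of this sketch.

```python
from collections import defaultdict

# (id, tags, score) rows from the example table, already sorted by score.
ROWS = [
    ("id_0", ["tag_0", "tag_1"], 0.95),
    ("id_1", ["tag_2", "tag_3"], 0.9),
    ("id_2", ["tag_4", "tag_5"], 0.85),
    ("id_3", ["tag_0", "tag_2"], 0.8),
    ("id_4", ["tag_1", "tag_4"], 0.75),
    ("id_5", ["tag_3", "tag_5"], 0.7),
    ("id_6", ["tag_0", "tag_6"], 0.65),
    ("id_7", ["tag_0", "tag_7"], 0.6),
    ("id_8", ["tag_0", "tag_6"], 0.55),
    ("id_9", ["tag_6", "tag_7"], 0.5),
]


def tag_quotas(rows):
    """Equal-split each item's weight of 1 over its tags, then sum per tag."""
    quota = defaultdict(float)
    for _item_id, tags, _score in rows:
        share = 1.0 / len(tags)
        for tag in tags:
            quota[tag] += share
    return dict(quota)


quotas = tag_quotas(ROWS)
for tag in sorted(quotas):
    print(tag, quotas[tag])
```

The Quota values sum to 10, the number of items, because every item contributes a total weight of exactly 1.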
After the Quota values of the tags of each category are obtained, assume that the tag distribution weight adjustment function scales every Quota value down proportionally by a factor of 2. The resulting target Quota values are shown in the following table:
Tag | Target Quota
tag_0 | 1.25
tag_1 | 0.5
tag_2 | 0.5
tag_3 | 0.5
tag_4 | 0.5
tag_5 | 0.5
tag_6 | 0.75
tag_7 | 0.5
After the target Quota values are obtained, the screening processing operation can be performed on the video manuscripts in turn, following their order among the 10 video manuscripts. First, the video manuscript id_0 includes tag_0 with a weight value of 0.5 and tag_1 with a weight value of 0.5, and the current target Quota values corresponding to tag_0 and tag_1 are both greater than or equal to 0.5. The video manuscript whose identification information is id_0 can therefore be screened out as target content. The difference between the target Quota value 1.25 of tag_0 and the weight value 0.5 of tag_0 in id_0, 1.25 - 0.5 = 0.75, becomes the updated target Quota value of tag_0; similarly, the difference between the target Quota value 0.5 of tag_1 and the weight value 0.5 of tag_1 in id_0, 0.5 - 0.5 = 0, becomes the updated target Quota value of tag_1. After the update, the target Quota values are as shown in the following table:
Tag | Target Quota
tag_0 | 0.75
tag_1 | 0
tag_2 | 0.5
tag_3 | 0.5
tag_4 | 0.5
tag_5 | 0.5
tag_6 | 0.75
tag_7 | 0.5
Similarly, the video manuscripts id_1 and id_2 can be screened out as target content. After the screening processing operation has been performed on id_1 and id_2, the target Quota values are as shown in the following table:
Tag | Target Quota
tag_0 | 0.75
tag_1 | 0
tag_2 | 0
tag_3 | 0
tag_4 | 0
tag_5 | 0
tag_6 | 0.75
tag_7 | 0.5
Next, the screening processing operation is performed on the video manuscript id_3. This video manuscript includes tag_0 with a weight value of 0.5 and tag_2 with a weight value of 0.5, but the current target Quota value corresponding to tag_2 is 0, which is less than 0.5, so the video manuscript whose identification information is id_3 cannot be screened out as target content. Similarly, the video manuscripts id_4 and id_5 cannot be screened out as target content because they lack sufficient target Quota values.
After that, the screening processing operation is performed on the video manuscript id_6. This video manuscript includes tag_0 with a weight value of 0.5 and tag_6 with a weight value of 0.5, and the current target Quota values corresponding to tag_0 and tag_6 are both greater than 0.5, so the video manuscript whose identification information is id_6 can be screened out as target content. The difference between the target Quota value 0.75 of tag_0 and the weight value 0.5 of tag_0 in id_6, 0.75 - 0.5 = 0.25, becomes the updated target Quota value of tag_0; similarly, the difference between the target Quota value 0.75 of tag_6 and the weight value 0.5 of tag_6 in id_6, 0.75 - 0.5 = 0.25, becomes the updated target Quota value of tag_6. After the update, the target Quota values are as shown in the following table:
Tag | Target Quota
tag_0 | 0.25
tag_1 | 0
tag_2 | 0
tag_3 | 0
tag_4 | 0
tag_5 | 0
tag_6 | 0.25
tag_7 | 0.5
Finally, the screening processing operation is performed in turn on the video manuscripts id_7, id_8 and id_9. None of them has sufficient target Quota values, so none can be screened out as target content.
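The whole walk-through above can be replayed as a single pass over the sorted items. The data is copied from the example tables; the function name `single_pass` and the tuple layout are assumptions of this sketch, and every tag is given the fixed weight 0.5 used in the example.

```python
# (id, tags) in score order, as in the example table.
ITEMS = [
    ("id_0", ("tag_0", "tag_1")), ("id_1", ("tag_2", "tag_3")),
    ("id_2", ("tag_4", "tag_5")), ("id_3", ("tag_0", "tag_2")),
    ("id_4", ("tag_1", "tag_4")), ("id_5", ("tag_3", "tag_5")),
    ("id_6", ("tag_0", "tag_6")), ("id_7", ("tag_0", "tag_7")),
    ("id_8", ("tag_0", "tag_6")), ("id_9", ("tag_6", "tag_7")),
]
WEIGHT = 0.5  # each of an item's two tags gets half of the total weight 1
TARGET = {
    "tag_0": 1.25, "tag_1": 0.5, "tag_2": 0.5, "tag_3": 0.5,
    "tag_4": 0.5, "tag_5": 0.5, "tag_6": 0.75, "tag_7": 0.5,
}


def single_pass(items, target, weight):
    """Visit each item once; select it if all its tags have enough quota."""
    remaining = dict(target)  # work on a copy of the target Quota table
    selected = []
    for item_id, tags in items:
        if all(remaining[tag] >= weight for tag in tags):
            selected.append(item_id)
            for tag in tags:
                remaining[tag] -= weight
    return selected, remaining


selected, remaining = single_pass(ITEMS, TARGET, WEIGHT)
print(selected)
print(remaining)
```

The pass selects four items and leaves unused target Quota on tag_0, tag_6 and tag_7, which is why a follow-up rule is needed to reach five items.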
After the screening operation has been completed for all video manuscripts, only the video manuscripts {id_0, id_1, id_2, id_6} have been screened out as target content, whereas the screening goal is 5 video manuscripts. Therefore, in one embodiment, video manuscripts in which the target Quota value corresponding to the tag of at least one category is not 0 can further be screened out of the remaining video manuscripts id_3, id_4, id_5, id_7, id_8 and id_9 as target content. In this embodiment, the video manuscripts id_3 and id_7 each have at least one category of tag whose target Quota value is not 0. However, the target Quota values corresponding to both tag categories in the video manuscript id_7 are non-zero, whereas only one tag category in the video manuscript id_3 has a non-zero target Quota value. To obtain a better tag distribution, the video manuscript id_7 can therefore be screened out as target content.
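One way to express the preference for id_7 over id_3 is to rank the leftover items by how many of their tags still have non-zero target Quota, breaking ties by score. That ranking rule is an assumption of this sketch (the text only states the outcome), and the names `nonzero_tags` and `pick_by_coverage` are hypothetical.

```python
# Target Quota left after the first pass of the example.
REMAINING_QUOTA = {
    "tag_0": 0.25, "tag_1": 0.0, "tag_2": 0.0, "tag_3": 0.0,
    "tag_4": 0.0, "tag_5": 0.0, "tag_6": 0.25, "tag_7": 0.5,
}
# (id, tags, score) of the items not selected in the first pass.
LEFTOVER = [
    ("id_3", ("tag_0", "tag_2"), 0.8),
    ("id_4", ("tag_1", "tag_4"), 0.75),
    ("id_5", ("tag_3", "tag_5"), 0.7),
    ("id_7", ("tag_0", "tag_7"), 0.6),
    ("id_8", ("tag_0", "tag_6"), 0.55),
    ("id_9", ("tag_6", "tag_7"), 0.5),
]


def nonzero_tags(tags, quota):
    """Count the tags of one item that still have target Quota left."""
    return sum(1 for tag in tags if quota.get(tag, 0.0) > 0)


def pick_by_coverage(leftover, quota):
    """Among items meeting the second preset condition, prefer the one with
    the most tags that still have quota; break ties by higher score."""
    candidates = [row for row in leftover if nonzero_tags(row[1], quota) > 0]
    return max(
        candidates,
        key=lambda row: (nonzero_tags(row[1], quota), row[2]),
        default=None,
    )


choice = pick_by_coverage(LEFTOVER, REMAINING_QUOTA)
print(choice[0])
```

Under this assumed ranking, id_3 loses to id_7 despite its higher score, because only one of its two tags still has target Quota left.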
In another embodiment, a video manuscript whose score is greater than the scores of the other contents to be screened can instead be screened out of the remaining video manuscripts id_3, id_4, id_5, id_7, id_8 and id_9 as target content. In this embodiment, the video manuscript id_3 has the highest score among them, so the video manuscript id_3 can be screened out as target content.
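The score-based top-up is a plain sort, sketched below with the scores from the example table. The function name `top_up_by_score` and the `needed` parameter are assumptions made for illustration.

```python
def top_up_by_score(leftover, needed):
    """Return the ids of the `needed` highest-scored leftover items."""
    ranked = sorted(leftover, key=lambda row: row[1], reverse=True)
    return [item_id for item_id, _score in ranked[:needed]]


# (id, score) of the items not selected in the first pass.
LEFTOVER_SCORES = [
    ("id_3", 0.8), ("id_4", 0.75), ("id_5", 0.7),
    ("id_7", 0.6), ("id_8", 0.55), ("id_9", 0.5),
]
picked = top_up_by_score(LEFTOVER_SCORES, needed=1)
print(picked)
```

This rule ignores the tags entirely, so it favors the score priority rate over tag coverage.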
Referring to FIG. 5, a program module diagram of an embodiment of the content screening apparatus 50 of the present application is shown.
In this embodiment, the content screening apparatus 50 includes a series of computer-readable instructions stored in a memory. When the computer-readable instructions are executed by a processor, the content screening function of the embodiments of the present application can be implemented. In some embodiments, the content screening apparatus 50 may be divided into one or more modules based on the specific operations implemented by the respective parts of the computer-readable instructions. For example, in FIG. 5, the content screening apparatus 50 may be divided into an obtaining module 51, a first calculation module 52, a second calculation module 53, and a screening module 54, where:
The obtaining module 51 is configured to obtain a content set to be screened, the content set including a plurality of contents to be screened, each of which has identification information, tags of at least one category, and a score, where the contents to be screened are sorted in advance by score in the content set.
The first calculation module 52 is configured to calculate the distribution weight value of the tags of each category contained in the content set according to the tags of each category in each content to be screened and the weight values corresponding to those tags.
In an exemplary embodiment, the first calculation module 52 is further configured to obtain the weight value of the tag of a current category in each content to be screened, the tag of the current category being one of all the category tags contained in the content set, and to take the sum of all the obtained weight values as the distribution weight value of the tag of the current category.
The second calculation module 53 is configured to calculate the target distribution weight value of the tags of each category according to each distribution weight value and a preset tag distribution weight adjustment function.
The screening module 54 is configured to sequentially screen out from the content set the target content that meets the first preset condition, according to the target distribution weight values of the tags of each category and the weight value corresponding to the tag of each category in each content to be screened.
In an exemplary embodiment, the screening module 54 is further configured to perform a screening processing operation on each content to be screened in turn, following the order of the contents to be screened in the content set, where the screening processing operation includes: obtaining a first weight value corresponding to the tag of each category in the current content to be screened; determining whether the first target distribution weight value corresponding to each category tag in the current content to be screened is greater than or equal to the corresponding first weight value; and if so, taking the current content to be screened as target content and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
In an exemplary embodiment, the content screening apparatus 50 further includes a third calculation module.
The third calculation module is configured to calculate the weight value corresponding to each category of tag in each content item to be screened.
In an exemplary embodiment, the screening module 54 is further configured to, when the number of target content items obtained by screening is less than a preset number, screen out, from the remaining content items to be screened in the content set, target content that meets a second preset condition, wherein the second preset condition is that the target distribution weight value corresponding to at least one category of tag in the current content item to be screened is not zero.
In an exemplary embodiment, the screening module 54 is further configured to, when the number of target content items obtained by screening is less than the preset number, screen out, from the remaining content items to be screened in the content set, target content that meets a third preset condition, wherein the third preset condition is that the current content item to be screened has a preset mark.
In an exemplary embodiment, the screening module 54 is further configured to, when the number of target content items obtained by screening is less than the preset number, screen out, from the remaining content items to be screened in the content set, target content that meets a fourth preset condition, wherein the fourth preset condition is that the score of the current content item to be screened is greater than the scores of the other content items to be screened.
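As a rough, non-authoritative sketch, the three supplementary conditions above could be expressed as interchangeable predicates applied to the content items that were not selected in the first pass. All names here (`fill_shortfall`, `nonzero_target`, `has_preset_mark`, `highest_score_first`, the `marked` field) and the sample data are hypothetical. The sketch treats the three conditions as alternatives and does not update the target distribution weight values during the supplementary pass, since the text does not say whether they are updated.

```python
def fill_shortfall(selected, leftover, remaining, preset_number, condition):
    """Top up `selected` from `leftover` (kept in score order) until
    `preset_number` identifiers have been collected."""
    result = list(selected)
    for item in leftover:
        if len(result) >= preset_number:
            break
        if condition(item, remaining):
            result.append(item["id"])
    return result


def nonzero_target(item, remaining):
    # Second preset condition: at least one tag category of the item
    # still has a non-zero target distribution weight value.
    return any(remaining.get(cat, 0.0) != 0 for cat in item["tag_weights"])


def has_preset_mark(item, remaining):
    # Third preset condition; the "marked" field is a hypothetical name.
    return bool(item.get("marked", False))


def highest_score_first(item, remaining):
    # Fourth preset condition: `leftover` is already sorted by score, so
    # accepting items in order takes the highest-scoring ones first.
    return True


# Hypothetical leftovers and remaining target values from a first pass.
leftover = [
    {"id": "b", "score": 0.8, "tag_weights": {"game": 1.0}},
    {"id": "d", "score": 0.6, "tag_weights": {"music": 1.0}, "marked": True},
]
remaining = {"game": 0.5, "music": 0.0}

by_target = fill_shortfall(["a", "c"], leftover, remaining, 3, nonzero_target)
by_mark = fill_shortfall(["a", "c"], leftover, remaining, 3, has_preset_mark)
by_score = fill_shortfall(["a", "c"], leftover, remaining, 3, highest_score_first)
```

Passing the condition as a function keeps the supplementary pass identical for all three embodiments; only the predicate changes.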
In the embodiments of the present application, a content set to be screened is acquired, the content set including a plurality of content items to be screened, each of which has identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set; the distribution weight value of each category of tag contained in the content set is calculated according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag; the target distribution weight value of each category of tag is calculated according to each distribution weight value and a preset tag distribution weight adjustment function; and target content that meets the first preset condition is sequentially screened out from the content set according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened. When the content in the content set is screened in this way, each content item to be screened needs to be traversed only once to determine whether it is target content, and no nested traversal is required. The present application can therefore save the computing resources consumed in screening the content and reduce the time the screening takes.
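The two calculation steps summarized above can be illustrated with a minimal Python sketch. The distribution weight value of a category is taken as the sum of that category's weight values over all content items, as recited in claim 2. The adjustment function is not specified in this passage, so the halving function used below is purely a placeholder assumption, as are the names `distribution_weights` and `target_weights` and the sample data.

```python
from collections import defaultdict


def distribution_weights(items):
    """Sum, per tag category, the weight values found across all items."""
    totals = defaultdict(float)
    for item in items:
        for cat, w in item["tag_weights"].items():
            totals[cat] += w
    return dict(totals)


def target_weights(dist, adjust):
    """Apply a preset adjustment function to every distribution weight value."""
    return {cat: adjust(value) for cat, value in dist.items()}


# Hypothetical content set, sorted by score in descending order.
items = [
    {"id": "a", "score": 0.9, "tag_weights": {"game": 0.5, "music": 0.5}},
    {"id": "b", "score": 0.8, "tag_weights": {"game": 1.0}},
    {"id": "c", "score": 0.7, "tag_weights": {"music": 1.0}},
]

dist = distribution_weights(items)
# Placeholder adjustment function (an assumption, not the application's).
targets = target_weights(dist, lambda value: 0.5 * value)
print(dist)
print(targets)
```

Both steps make one pass over the content set, so the resulting target values can be prepared before the single screening traversal begins.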
FIG. 6 schematically shows the hardware architecture of a computer device 6 suitable for implementing the content screening method according to an embodiment of the present application. In this embodiment, the computer device 6 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including a standalone server or a server cluster composed of multiple servers). As shown in FIG. 6, the computer device 6 includes at least, but is not limited to, a memory 120, a processor 121, and a network interface 122, which can be communicatively linked to one another through a system bus. Wherein:
The memory 120 includes at least one type of computer-readable storage medium, which may be volatile or non-volatile. The computer-readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 120 may be an internal storage module of the computer device 6, such as a hard disk or internal memory of the computer device 6. In other embodiments, the memory 120 may be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 6. Of course, the memory 120 may also include both an internal storage module of the computer device 6 and an external storage device thereof. In this embodiment, the memory 120 is generally used to store the operating system and various application software installed on the computer device 6, such as the program code of the content screening method. In addition, the memory 120 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 121 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 121 is generally used to control the overall operation of the computer device 6, for example to perform control and processing related to data interaction or communication of the computer device 6. In this embodiment, the processor 121 is configured to run the program code stored in the memory 120 or to process data.
The network interface 122 may include a wireless network interface or a wired network interface, and is generally used to establish a communication link between the computer device 6 and other computer devices. For example, the network interface 122 is used to connect the computer device 6 to an external terminal through a network and to establish a data transmission channel and a communication link between the computer device 6 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be noted that FIG. 6 shows only a computer device having components 120 to 122; it should be understood, however, that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the content screening method stored in the memory 120 may be divided into one or more program modules and executed by one or more processors (the processor 121 in this embodiment) to carry out the present application.
An embodiment of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
acquiring a content set to be screened, the content set including a plurality of content items to be screened, each content item to be screened having identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set;
calculating a distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag;
calculating a target distribution weight value of each category of tag according to each distribution weight value and a preset tag distribution weight adjustment function; and
sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened.
In this embodiment, the computer-readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or internal memory of the computer device. In other embodiments, the computer-readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device. Of course, the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device thereof. In this embodiment, the computer-readable storage medium is generally used to store the operating system and various application software installed on the computer device, such as the program code of the content screening method of the embodiments. In addition, the computer-readable storage medium may be used to temporarily store various types of data that have been output or are to be output.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over at least two network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present application. Those of ordinary skill in the art can understand and implement them without creative effort.
From the description of the above embodiments, those of ordinary skill in the art can clearly understand that each embodiment may be implemented by software plus a general-purpose hardware platform, or, of course, by hardware. Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be carried out by computer-readable instructions directing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A content screening method, comprising:
    acquiring a content set to be screened, the content set including a plurality of content items to be screened, each content item to be screened having identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set;
    calculating a distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag;
    calculating a target distribution weight value of each category of tag according to each distribution weight value and a preset tag distribution weight adjustment function; and
    sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened.
  2. The content screening method according to claim 1, wherein calculating the distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag comprises:
    obtaining the weight values of a tag of a current category in each content item to be screened, the tag of the current category being one category of tag among all category tags contained in the content set; and
    taking the sum of all the obtained weight values as the distribution weight value of the tag of the current category.
  3. The content screening method according to claim 1 or 2, wherein sequentially screening out, from the content set, target content that meets the first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened comprises:
    performing a screening operation on each content item to be screened in turn, following the order of the content items to be screened in the content set, wherein the screening operation comprises:
    obtaining a first weight value corresponding to each category of tag in the current content item to be screened;
    determining whether a first target distribution weight value corresponding to the category tag in the current content item to be screened is greater than or equal to the first weight value; and
    if so, taking the current content item to be screened as target content, and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
  4. The content screening method according to claim 3, further comprising:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a second preset condition, wherein the second preset condition is that the target distribution weight value corresponding to at least one category of tag in the current content item to be screened is not zero.
  5. The content screening method according to claim 3, further comprising:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a third preset condition, wherein the third preset condition is that the current content item to be screened has a preset mark.
  6. The content screening method according to claim 3, further comprising:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a fourth preset condition, wherein the fourth preset condition is that the score of the current content item to be screened is greater than the scores of the other content items to be screened.
  7. The content screening method according to claim 1, 2, 4, 5 or 6, further comprising, before the step of calculating the distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag:
    calculating the weight value corresponding to each category of tag in each content item to be screened.
  8. A content screening apparatus, comprising:
    an acquisition module, configured to acquire a content set to be screened, the content set including a plurality of content items to be screened, each content item to be screened having identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set;
    a first calculation module, configured to calculate a distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag;
    a second calculation module, configured to calculate a target distribution weight value of each category of tag according to each distribution weight value and a preset tag distribution weight adjustment function; and
    a screening module, configured to sequentially screen out, from the content set, target content that meets a first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring a content set to be screened, the content set including a plurality of content items to be screened, each content item to be screened having identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set;
    calculating a distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag;
    calculating a target distribution weight value of each category of tag according to each distribution weight value and a preset tag distribution weight adjustment function; and
    sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened.
  10. The computer device according to claim 9, wherein calculating the distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag comprises:
    obtaining the weight values of a tag of a current category in each content item to be screened, the tag of the current category being one category of tag among all category tags contained in the content set; and
    taking the sum of all the obtained weight values as the distribution weight value of the tag of the current category.
  11. The computer device according to claim 9 or 10, wherein sequentially screening out, from the content set, target content that meets the first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened comprises:
    performing a screening operation on each content item to be screened in turn, following the order of the content items to be screened in the content set, wherein the screening operation comprises:
    obtaining a first weight value corresponding to each category of tag in the current content item to be screened;
    determining whether a first target distribution weight value corresponding to the category tag in the current content item to be screened is greater than or equal to the first weight value; and
    if so, taking the current content item to be screened as target content, and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
  12. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further implements the following step:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a second preset condition, wherein the second preset condition is that the target distribution weight value corresponding to at least one category of tag in the current content item to be screened is not zero.
  13. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further implements the following step:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a third preset condition, wherein the third preset condition is that the current content item to be screened has a preset mark.
  14. The computer device according to claim 11, wherein the processor, when executing the computer-readable instructions, further implements the following step:
    when the number of target content items obtained by screening is less than a preset number, screening out, from the remaining content items to be screened in the content set, target content that meets a fourth preset condition, wherein the fourth preset condition is that the score of the current content item to be screened is greater than the scores of the other content items to be screened.
  15. A computer-readable storage medium having computer-readable instructions stored thereon, characterized in that the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring a content set to be screened, the content set including a plurality of content items to be screened, each content item to be screened having identification information, tags of at least one category, and a score, wherein the plurality of content items to be screened are sorted in advance by score in the content set;
    calculating a distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag;
    calculating a target distribution weight value of each category of tag according to each distribution weight value and a preset tag distribution weight adjustment function; and
    sequentially screening out, from the content set, target content that meets a first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened.
  16. The computer-readable storage medium according to claim 15, wherein calculating the distribution weight value of each category of tag contained in the content set according to each category of tag in each content item to be screened and the weight value corresponding to each category of tag comprises:
    obtaining the weight values of a tag of a current category in each content item to be screened, the tag of the current category being one category of tag among all category tags contained in the content set; and
    taking the sum of all the obtained weight values as the distribution weight value of the tag of the current category.
  17. The computer-readable storage medium according to claim 15 or 16, wherein sequentially screening out, from the content set, target content that meets the first preset condition according to the target distribution weight value of each category of tag and the weight value corresponding to each category of tag in each content item to be screened comprises:
    performing a screening operation on each content item to be screened in turn, following the order of the content items to be screened in the content set, wherein the screening operation comprises:
    obtaining a first weight value corresponding to each category of tag in the current content item to be screened;
    determining whether a first target distribution weight value corresponding to the category tag in the current content item to be screened is greater than or equal to the first weight value; and
    if so, taking the current content item to be screened as target content, and updating the first target distribution weight value to the difference between the first target distribution weight value and the first weight value.
  18. 根据权利要求17所述的计算机可读存储介质,所述计算机可读指令被处理器执行时还实现以下步骤:The computer-readable storage medium of claim 17, the computer-readable instructions further implementing the following steps when executed by the processor:
    在筛选得到的目标内容的数量小于预设数量时,从所述内容集的剩余待筛选内容中筛选出符合第二预设条件的目标内容,其中,所述第二预设条件为当前待筛选内容中的至少一类别的标签对应的目标分布比重值不为零。When the number of target contents obtained by screening is less than the preset number, select target contents that meet a second preset condition from the remaining contents to be screened in the content set, where the second preset condition is the current to-be-screened content The target distribution weight value corresponding to the label of at least one category in the content is not zero.
  19. The computer-readable storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    when the number of pieces of target content obtained by screening is less than a preset number, screening out, from the remaining content to be screened in the content set, target content that meets a third preset condition, wherein the third preset condition is that the current content to be screened has a preset mark.
  20. The computer-readable storage medium according to claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    when the number of pieces of target content obtained by screening is less than a preset number, screening out, from the remaining content to be screened in the content set, target content that meets a fourth preset condition, wherein the fourth preset condition is that the score of the current content to be screened is greater than the scores of the other content to be screened.
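Claims 18 to 20 each describe an alternative way of topping up the result when the first pass yields fewer pieces of target content than the preset number. The Python sketch below is one possible reading and not the applicant's implementation: `Candidate`, `has_preset_mark`, `score`, `backfill` and the string names of the conditions are all hypothetical. It also assumes that the remaining target distribution weight values are not updated again during the top-up, which the claims do not specify.

```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for a piece of content left over after the first pass."""
    content_id: str
    label_weights: Dict[str, float]  # category label -> weight value
    has_preset_mark: bool            # used by the third preset condition
    score: float                     # used by the fourth preset condition


def backfill(
    targets: List[Candidate],
    leftovers: List[Candidate],
    remaining: Dict[str, float],
    preset_number: int,
    condition: str,
) -> List[Candidate]:
    """Top up `targets` from `leftovers` until `preset_number` is reached."""
    if condition == "second":
        # At least one category of the item still has a non-zero target value.
        pool = [
            c for c in leftovers
            if any(remaining.get(cat, 0.0) != 0 for cat in c.label_weights)
        ]
    elif condition == "third":
        # The item carries the preset mark.
        pool = [c for c in leftovers if c.has_preset_mark]
    elif condition == "fourth":
        # Prefer the items whose score is higher than that of the others.
        pool = sorted(leftovers, key=lambda c: c.score, reverse=True)
    else:
        raise ValueError(f"unknown condition: {condition}")

    result = list(targets)
    for candidate in pool:
        if len(result) >= preset_number:
            break
        result.append(candidate)
    return result
```

The three branches are kept separate because the claims present them as independent alternatives, each depending directly on claim 17, rather than as a chain of fallbacks.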
PCT/CN2021/103571 2020-09-04 2021-06-30 Content screening method and device WO2022048289A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/024,485 US20230418890A1 (en) 2020-09-04 2021-06-30 Content screening method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010920038.6 2020-09-04
CN202010920038.6A CN112417202B (en) 2020-09-04 2020-09-04 Content screening method and device

Publications (1)

Publication Number Publication Date
WO2022048289A1 (en) 2022-03-10

Family

ID=74854298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103571 WO2022048289A1 (en) 2020-09-04 2021-06-30 Content screening method and device

Country Status (3)

Country Link
US (1) US20230418890A1 (en)
CN (1) CN112417202B (en)
WO (1) WO2022048289A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996165A (en) * 2022-08-01 2022-09-02 飞狐信息技术(天津)有限公司 Business data auditing method and device, storage medium and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417202B (en) * 2020-09-04 2023-06-30 上海哔哩哔哩科技有限公司 Content screening method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090043781A1 (en) * 2007-08-10 2009-02-12 Yahoo! Inc., A Delaware Corporation Method and System for Providing Content According to Personal Preference
CN103164463A (en) * 2011-12-16 2013-06-19 国际商业机器公司 Method and device for recommending labels
CN107426328A (en) * 2017-08-08 2017-12-01 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN110659388A (en) * 2019-10-10 2020-01-07 北京奇艺世纪科技有限公司 To-be-recommended information screening method and device, electronic equipment and storage medium
CN112417202A (en) * 2020-09-04 2021-02-26 上海哔哩哔哩科技有限公司 Content screening method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8122031B1 (en) * 2009-06-11 2012-02-21 Google Inc. User label and user category based content classification
US9519685B1 (en) * 2012-08-30 2016-12-13 deviantArt, Inc. Tag selection, clustering, and recommendation for content hosting services
US9535996B1 (en) * 2012-08-30 2017-01-03 deviantArt, Inc. Selecting content objects for recommendation based on content object collections
CN104021163B (en) * 2014-05-28 2017-10-24 深圳市盛讯达科技股份有限公司 Products Show system and method
CN107391680A (en) * 2017-07-24 2017-11-24 北京京东尚科信息技术有限公司 Content recommendation method, device and equipment
CN110019945A (en) * 2017-12-28 2019-07-16 飞狐信息技术(天津)有限公司 Video recommendation method and device, storage medium and electronic equipment
CN108268619B (en) * 2018-01-08 2020-06-30 阿里巴巴集团控股有限公司 Content recommendation method and device
CN109543111B (en) * 2018-11-28 2021-09-21 广州虎牙信息科技有限公司 Recommendation information screening method and device, storage medium and server

Also Published As

Publication number Publication date
CN112417202A (en) 2021-02-26
US20230418890A1 (en) 2023-12-28
CN112417202B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
WO2022048289A1 (en) Content screening method and device
US9087108B2 (en) Determination of category information using multiple stages
CN108427695A (en) Method and application server are recommended by enterprise
WO2012092196A1 (en) Recommendation of search keywords based on indication of user intention
US10748166B2 (en) Method and system for mining churn factor causing user churn for network application
CN109165975B (en) Label recommending method, device, computer equipment and storage medium
CN110992124B (en) House source recommendation method and house source recommendation system
CN111538901A (en) Article recommendation method and device, server and storage medium
CN109241451B (en) Content combination recommendation method and device and readable storage medium
US20170351739A1 (en) Method and apparatus for identifying timeliness-oriented demands, an apparatus and non-volatile computer storage medium
CN107967280B (en) Method and system for recommending songs by tag
CN113536104A (en) Information recommendation method, device, equipment and storage medium
CN109558384A (en) Log classification method, device, electronic equipment and storage medium
CN111241381A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN110442623B (en) Big data mining method and device and data mining server
CN111198961A (en) Commodity searching method and device and server
CN111400516B (en) Label determining method, electronic device and storage medium
CN114443943A (en) Information scheduling method, device and equipment and computer readable storage medium
CN112256844A (en) Text classification method and device
CN112243225A (en) Building indoor user identification method, device, equipment and storage medium
CN114281677A (en) Test case management method, device, equipment and medium based on multi-label system
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
CN110275986B (en) Video recommendation method based on collaborative filtering, server and computer storage medium
CN113849745A (en) Object recommendation method, device, equipment and storage medium
CN112529646A (en) Commodity classification method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21863357

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21863357

Country of ref document: EP

Kind code of ref document: A1