WO2017166512A1 - Video classification model training method and video classification method - Google Patents

Video classification model training method and video classification method

Publication number
WO2017166512A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
category
attribute
classified
text content
Prior art date
Application number
PCT/CN2016/089246
Other languages
French (fr)
Chinese (zh)
Inventor
张立宁
余婧
Original Assignee
乐视控股(北京)有限公司
乐视云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司 and 乐视云计算有限公司
Publication of WO2017166512A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Definitions

  • the present disclosure belongs to the field of Internet technologies, and in particular, to a training method and a video classification method for a video classification model.
  • the classified storage of video plays an important role in the management of video and the recommendation of interest.
  • some professional category video playing websites (for example, an educational platform for playing teaching videos) have their own video management system to classify and store the videos on the website.
  • because the capability of such a category video playing website is limited, it does not have long-range transcoding capability.
  • when it wants to upload a video, it needs to use the long-range transcoding function provided by a video service provider (such as LeEco Cloud Platform) to generate an ID for the video, and the ID is then distributed to the video service provider's CDN platform.
  • when it needs to play the video, it only needs to obtain the network address of the video from the video service provider's CDN platform.
  • because the ID generally consists of a string of meaningless letters and numbers (the ID of each video is unique), for the video service provider the content tag of a video stored in its cloud platform is only a string of meaningless letters and numbers. It is therefore very difficult for the video service provider to classify this type of video in its cloud platform.
  • the purpose of the present disclosure is to enable a video service provider (for example, LeEco Cloud Platform) to accurately classify the videos stored in the cloud platform server cluster that it has built.
  • the present disclosure provides a training method for a video classification model, including the following steps:
  • a Bayesian model is established, and a set of attribute words and an existing category label of each video in the domain video set are input to a Bayesian model to train the Bayesian model to obtain a video classification model.
  • the training method of the video classification model after the step of acquiring the text content of each video in the video collection of a certain domain and the existing category label, further includes:
  • a category directory of the video collection of the domain is established according to the existing category label.
  • the training method of the video classification model, wherein the input parameter of the video classification model is an attribute word and the output parameter is a plurality of category probability values; each category probability value indicates the probability that the attribute word belongs to a category in the category directory.
  • the training method of the video classification model wherein the step of acquiring the text content and the category label of each video in the video collection of a certain domain comprises:
  • the text content and category label of the current video are extracted from each video playing page.
  • the training method of the video classification model wherein the step of segmenting the text content of each video to obtain a set of attribute words of each video comprises:
  • the first level keyword set is filtered to obtain a set of attribute words.
  • the training method of the video classification model wherein the text content includes a title and/or a content introduction of a current video.
  • the training method of the video classification model wherein the Bayesian model is a naive Bayesian model.
  • a video classification method comprising the following steps:
  • in the step of obtaining a category probability value of each attribute word of the video to be classified, each attribute word has at least one category probability value.
  • the step of classifying the video to be classified according to the category probability value of each attribute word includes the following steps:
  • the one with the largest value is selected as the optimal category probability value of the attribute word
  • a computer storage medium is further provided, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the training method of a video classification model provided by the present invention can be implemented.
  • a computer storage medium is further provided, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the video classification method provided by the present invention can be implemented.
  • the present disclosure can classify videos efficiently, simply, and with high accuracy.
  • FIG. 1 is a flow chart showing the steps of a training method of the video classification model of the present disclosure
  • FIG. 2 is a flow chart showing steps of acquiring text content and category tags of a video in a training method of the video classification model of the present disclosure
  • FIG. 3 is a flow chart showing the steps of segmenting the text content of each video in the training method of the video classification model of the present disclosure
  • FIG. 5 is a flow chart showing the steps of classifying a video to be classified according to a category probability value of each attribute word in the video classification method of the present disclosure.
  • FIG. 1 is a flow chart showing the steps of a training method of the video classification model of the present disclosure.
  • a training method for a video classification model includes the following steps:
  • in step S1, the text content and the existing category label of each video in the video set of a certain domain are obtained.
  • the video playing page on the website includes text content that is written in natural language and describes the content of the video, the text content including the title and/or content introduction of the current video.
  • the domain may be education, news, entertainment, and so on.
  • these professional category video playing websites generally establish their own set of category directories, wherein the set of category directories includes multiple category names; each video is assigned to the corresponding category name, and that category name is used as the category label of the video.
  • the existing category label described in the present disclosure refers to the category label of the video in the professional category video playing website.
  • after the step of obtaining the text content and the existing category label of each video in the video set of a certain domain, the method further includes a step of establishing a category directory of the video set of the domain according to the existing category labels.
  • the videos in a video set of a certain domain stored in the video service provider's cloud platform (for example, LeEco Cloud Platform) do not come from just one video playing website; they may come from a massive number of video playing websites. Because the existing category directory of any single video playing website may not be comprehensive and may not cover all the videos in the video set of the domain, the present disclosure needs to re-establish the category directory of the video set of the domain based on the existing category labels.
  • the present disclosure takes the field of education as an example; the category names in the re-established category directory of the video set of the education domain mainly include: pre-school, elementary school, junior high school, senior high school entrance examination, high school, college entrance examination, university, study abroad, civil servant, judicial, IT, finance and accounting, international study tours, management, life skills, sports, summer camps, interests, arts, language training, pregnancy and baby counseling, vocational skills, and others.
  • in step S2, the text content of each video is segmented to obtain a set of attribute words for each video.
  • the text content of each video can be segmented by using a word segmentation algorithm in the prior art to obtain a set of attribute words for each video.
  • the set of attribute words of each video includes at least one attribute word.
  • Step S3 a Bayesian model is established, and the attribute word set of each video in the domain video set and the existing category label are input to the Bayesian model to train the Bayesian model to obtain a video classification model.
  • the Bayesian model is a naive Bayesian model.
  • the input parameter of the video classification model is an attribute word, and the output parameter is a plurality of category probability values.
  • each category probability value indicates a probability that the attribute word belongs to a category in the category catalog.
  • FIG. 2 is a flow chart showing the steps of acquiring text content and category tags of a video in a training method of the video classification model of the present disclosure.
  • the step of acquiring the text content and the category label of each video in the video collection of a certain domain includes:
  • Step S11 Obtain a network address of each video in a video collection of a certain domain stored in the cloud server.
  • prior to step S1, a professional category video playing website that uses the long-range transcoding service provided by the cloud platform server cluster uses the long-range transcoding function provided by the video service provider (for example, LeEco Cloud Platform) to generate an ID for each video on the website.
  • the ID of the video is then distributed to one or more servers (i.e., cloud servers) in the CDN platform of the video service provider, and the cloud server stores the video.
  • because the video service provider usually provides long-range transcoding services for a large number of video playing websites, the video service provider's cloud servers store a large number of videos, the ID of each video, and the network address of each video. Therefore, in step S11, only the network address of the video needs to be acquired.
  • Step S12 Obtain the playing page of each video by using a web crawling algorithm according to the network address of the video.
  • the web crawling algorithm refers to an algorithm based on the prior art web crawler.
  • a web crawler is a program that automatically extracts web pages; it downloads web pages from the World Wide Web for a search engine and is an important component of a search engine.
  • a traditional crawler starts from the URLs of one or several initial web pages and obtains the URLs found on those initial pages.
  • new URLs are continuously extracted from the current page and put into the queue until a certain stop condition of the system is satisfied.
  • in step S13, the text content and the category label of the current video are extracted from each video playing page.
  • the video playing page on the website includes text content that is written in natural language and describes the content of the video, the text content including the title and/or content introduction of the current video.
  • these professional category video playing websites generally establish their own set of category directories, wherein the set of category directories includes multiple category names; each video is assigned to the corresponding category name, and that category name is used as the category label of the video.
  • the existing category label described in the present disclosure refers to the category label of the video in the professional category video playing website.
  • FIG. 3 is a flow chart showing the steps of word segmentation of the text content of each video in the training method of the video classification model of the present disclosure.
  • step S2 the text content of each video is segmented, and the step of obtaining the attribute word set of each video includes:
  • Step S21 performing word segmentation on the text content to obtain a word segmentation result, performing part-of-speech tagging on each word in the word segmentation result according to a part-of-speech tagging algorithm, and screening the words in the word segmentation result according to the tagging result to obtain a first-level keyword set.
  • the first-level keyword set includes multiple first-level keywords.
  • the method further includes storing a word-segmentation part-of-speech table in the cloud server, and updating the table from time to time.
  • Step S22 filtering the first-level keyword set according to the stop word table to obtain an attribute word set.
  • the attribute word set includes a plurality of attribute words.
  • the method further includes storing the stop word table in the cloud server, and updating the stop word table from time to time.
  • the stop word table is a stop word table from the prior art. Filtering the first-level keyword set means removing the stop words from the first-level keyword set.
  • FIG. 4 is a flow chart of the steps of the video classification method of the present disclosure.
  • a video classification method includes the following steps:
  • in step S01, the text content of the video to be classified is obtained.
  • the video to be classified is a new video uploaded to the cloud server.
  • Step S02 Perform word segmentation on the text content of the video to be classified to obtain a set of attribute words of the video to be classified.
  • Step S03 Enter each attribute word in the attribute word set of the video to be classified into the video classification model according to any one of claims 1-4 to obtain a category probability value of each attribute word of the video to be classified.
  • each attribute word has at least one category probability value.
  • FIG. 5 is a flow chart showing the steps of classifying a video to be classified according to a category probability value of each attribute word in the video classification method of the present disclosure.
  • the step of classifying the video to be classified according to the category probability value of each attribute word includes the following steps:
  • Step S031 Select the largest of the plurality of category probability values of each attribute word as the optimal category probability value of the attribute word.
  • Step S032 Perform a product operation on the optimal category probability values of the attribute words in the attribute word set of the video to be classified to obtain a category probability of the video to be classified.
  • Step S033 Determine, according to the category probability of the video to be classified, a category label of the to-be-categorized video in the category directory.
  • the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the training method of the video classification model provided by the embodiment shown in FIG. 1 to FIG.
  • the embodiment of the present invention further provides a computer storage medium, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the video classification method provided by the embodiment shown in FIG. 4-5 can be implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A training method of a video classification model and a video classification method on the basis of the trained video classification model, the training method of the video classification model comprising: obtaining text content and existing category labels for each video in a video set of a certain domain (S1); segmenting words of the text content of each video to obtain a set of attribute words for each video (S2); and establishing a Bayesian model, and inputting the set of attribute words and the existing category labels of each video in the video set of the certain domain into the Bayesian model to train the Bayesian model, so as to obtain a video classification model (S3). The video classification method comprises: segmenting words of the text content of a to-be-classified video, to obtain a set of attribute words of the to-be-classified video (S02); inputting each attribute word in the set of attribute words into a video classification model to determine a category label of the to-be-classified video in the category directory. With this method, it is possible to achieve the classification of video efficiently, easily and with high accuracy.

Description

Training method for video classification model and video classification method
The present application claims priority to Chinese Patent Application No. 2016102024974, entitled "Training Method for Video Classification Model and Video Classification Method", filed with the Chinese Patent Office on March 31, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure belongs to the field of Internet technologies, and in particular relates to a training method for a video classification model and a video classification method.
Background
In a big data environment, the classified storage of videos plays a very important role in video management and interest-based recommendation. In the prior art, some professional category video playing websites (for example, an educational platform that plays teaching videos) have their own video management systems to classify and store the videos on the website. However, because the capability of such a website is limited, it does not have long-range transcoding capability. When it wants to upload a video, it needs to use the long-range transcoding function provided by a video service provider (for example, LeEco Cloud Platform) to generate an ID for the video, and the ID is then distributed to the video service provider's CDN platform. When it needs to play the video, it only needs to obtain the network address of the video from the video service provider's CDN platform. Because an ID generally consists of a string of meaningless letters and numbers (the ID of each video is unique), for the video service provider the content tag of a video stored in its cloud platform is merely a string of meaningless letters and numbers. It is therefore very difficult for the video service provider to classify this type of video in its cloud platform.
Summary of the Invention
The purpose of the present disclosure is to enable a video service provider (for example, LeEco Cloud Platform) to accurately classify the videos stored in the cloud platform server cluster that it has built.
To achieve this purpose, the present disclosure provides a training method for a video classification model, including the following steps:
obtaining the text content and the existing category label of each video in a video set of a certain domain;
performing word segmentation on the text content of each video to obtain a set of attribute words for each video;
establishing a Bayesian model, and inputting the set of attribute words and the existing category label of each video in the video set of the domain into the Bayesian model to train the Bayesian model, so as to obtain a video classification model.
Further, in the training method for the video classification model, after the step of obtaining the text content and the existing category label of each video in the video set of a certain domain, the method further includes:
establishing a category directory of the video set of the domain according to the existing category labels.
Further, in the training method for the video classification model, the input parameter of the video classification model is an attribute word, and the output parameter is a plurality of category probability values, where each category probability value represents the probability that the attribute word belongs to a category in the category directory.
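As a minimal sketch of the training step, assuming each video is given as a pair of attribute words and an existing category label, the following Python code counts word occurrences per category and applies Bayes' rule to return, for one attribute word, a probability for every category in the category directory. The function name, the Laplace smoothing constant `alpha`, and the sample data are illustrative assumptions; the disclosure does not specify them.

```python
from collections import Counter, defaultdict


def train_word_category_model(videos, alpha=1.0):
    """Train a word-level naive Bayes model.

    videos: iterable of (attribute_words, category_label) pairs.
    Returns (categories, predict), where predict(word) maps each
    category to P(category | word).
    """
    category_counts = Counter()          # number of videos per category
    word_counts = defaultdict(Counter)   # word_counts[category][word]
    vocabulary = set()
    for words, label in videos:
        category_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)

    categories = sorted(category_counts)
    total_videos = sum(category_counts.values())

    def predict(word):
        # Bayes' rule: P(c | w) is proportional to P(w | c) * P(c).
        scores = {}
        for c in categories:
            prior = category_counts[c] / total_videos
            total_words = sum(word_counts[c].values())
            # Laplace smoothing (assumption); the extra +1 slot reserves
            # probability mass for words never seen during training.
            likelihood = (word_counts[c][word] + alpha) / (
                total_words + alpha * (len(vocabulary) + 1)
            )
            scores[c] = prior * likelihood
        norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}

    return categories, predict


# Hypothetical training data: (attribute words, existing category label).
sample_videos = [
    (["algebra", "exam"], "high school"),
    (["algebra", "homework"], "high school"),
    (["python", "programming"], "IT"),
]
categories, predict = train_word_category_model(sample_videos)
```

Calling `predict("algebra")` on this sample data returns one probability per category, and the values sum to 1, which matches the input and output parameters described above.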
Further, in the training method for the video classification model, the step of obtaining the text content and the category label of each video in the video set of a certain domain includes:
obtaining the network address of each video in the video set of the domain stored in the cloud server;
obtaining the playing page of each video by using a web crawling algorithm according to the network address of the video;
extracting the text content and the category label of the current video from each video playing page.
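The page-fetching step can be sketched as a queue-based crawl that starts from a set of seed addresses. This is an illustrative sketch rather than the disclosed implementation: `fetch_links` is a hypothetical callable standing in for downloading a page and extracting the URLs on it, and `max_pages` is an assumed stop condition.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100):
    """Breadth-first crawl starting from seed_urls.

    fetch_links(url) is a hypothetical callable that returns the URLs
    found on the page at `url`. Returns pages in visiting order.
    """
    queue = deque(dict.fromkeys(seed_urls))  # de-duplicated, order kept
    seen = set(queue)
    visited = []
    # Stop condition (assumption): queue exhausted or page budget reached.
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:      # enqueue each new URL only once
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory link graph used instead of real network access.
site = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
pages = crawl(["a"], lambda url: site.get(url, []))
```

Passing the link extractor as a parameter keeps the traversal logic separate from network access, so the same loop can be exercised against an in-memory graph as shown.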
Further, in the training method for the video classification model, the step of performing word segmentation on the text content of each video to obtain a set of attribute words for each video includes:
performing word segmentation on the text content to obtain a word segmentation result;
performing part-of-speech tagging on each word in the word segmentation result according to a part-of-speech tagging algorithm, and screening the words in the word segmentation result according to the tagging result to obtain a first-level keyword set;
filtering the first-level keyword set according to a stop word table to obtain a set of attribute words.
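The three sub-steps above can be sketched as a small pipeline. The sketch makes several assumptions that are not in the disclosure: a regular-expression tokenizer stands in for a real word segmentation algorithm (Chinese text would need a dedicated segmenter), `POS_TABLE` and `STOP_WORDS` are hypothetical toy tables, and `KEPT_TAGS` is an assumed choice of which parts of speech survive the screening.

```python
import re

# Hypothetical stand-ins for the part-of-speech table and stop word table.
POS_TABLE = {
    "introduction": "n", "to": "p", "linear": "adj",
    "algebra": "n", "video": "n",
}
STOP_WORDS = {"introduction", "video"}
KEPT_TAGS = {"n", "v", "adj"}  # assumption: tags kept by the screening


def extract_attribute_words(text):
    """Return the attribute words of one video's text content."""
    # Word segmentation (stand-in tokenizer on lowercase text).
    tokens = re.findall(r"\w+", text.lower())
    # Part-of-speech tagging by table lookup; unknown words get "unk".
    tagged = [(tok, POS_TABLE.get(tok, "unk")) for tok in tokens]
    # Screening by tag yields the first-level keyword set.
    first_level = [tok for tok, tag in tagged if tag in KEPT_TAGS]
    # Stop word filtering yields the attribute words.
    return [tok for tok in first_level if tok not in STOP_WORDS]


words = extract_attribute_words("Introduction to linear algebra video")
```

With these toy tables, the preposition is dropped by the part-of-speech screening and the two stop words are dropped by the final filter.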
Further, in the training method for the video classification model, the text content includes the title and/or content introduction of the current video.
Further, in the training method for the video classification model, the Bayesian model is a naive Bayesian model.
According to another aspect of the present disclosure, a video classification method is also provided, including the following steps:
obtaining the text content of a video to be classified;
performing word segmentation on the text content of the video to be classified to obtain a set of attribute words of the video to be classified;
inputting each attribute word in the set of attribute words of the video to be classified into the video classification model according to any one of claims 1-4 to obtain the category probability values of each attribute word of the video to be classified;
determining the category label of the video to be classified in the category directory according to the category probability values of each attribute word.
Further, in the video classification method, in the step of obtaining the category probability values of each attribute word of the video to be classified, each attribute word has at least one category probability value.
Further, in the video classification method, the step of classifying the video to be classified according to the category probability values of each attribute word includes the following steps:
selecting, from the plurality of category probability values of each attribute word, the one with the largest value as the optimal category probability value of the attribute word;
multiplying the optimal category probability values of the attribute words in the set of attribute words of the video to be classified to obtain the category probability of the video to be classified;
determining the category label of the video to be classified in the category directory according to the category probability of the video to be classified.
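The following sketch follows the three classification steps under stated assumptions. The first two steps are implemented as written: take the largest category probability of each attribute word, then multiply those maxima. The text does not spell out how the resulting product maps to a single label, so the last step here uses an assumed rule (the category that is optimal for the most attribute words). `predict` is a hypothetical model interface that returns a dictionary of category probabilities for one word.

```python
from collections import Counter
from math import prod


def classify(attribute_words, predict):
    """Classify one video from its attribute words.

    predict(word) is a hypothetical callable returning
    {category: probability} for that word.
    Returns (label, video_probability).
    """
    if not attribute_words:
        raise ValueError("at least one attribute word is required")
    # Keep, for each word, the category with the largest probability.
    best = []
    for word in attribute_words:
        probs = predict(word)
        category = max(probs, key=probs.get)
        best.append((category, probs[category]))
    # Multiply the per-word optimal probability values.
    video_probability = prod(p for _, p in best)
    # Assumption: the label is the category chosen by the most words.
    votes = Counter(category for category, _ in best)
    label = votes.most_common(1)[0][0]
    return label, video_probability


# Hypothetical per-word category probabilities.
table = {
    "algebra": {"high school": 0.8, "IT": 0.2},
    "exam": {"high school": 0.6, "IT": 0.4},
    "python": {"high school": 0.1, "IT": 0.9},
}
label, probability = classify(["algebra", "exam", "python"], table.__getitem__)
```

A conventional naive Bayes classifier would instead compute one product per category over all words and pick the largest; that is a different rule from the per-word maximum described here.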
According to yet another aspect of the present disclosure, a computer storage medium is also provided, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the training method for a video classification model provided by the present invention can be implemented.
According to yet another aspect of the present disclosure, a computer storage medium is also provided, wherein the computer storage medium can store a program, and when the program is executed, some or all of the steps in each implementation of the video classification method provided by the present invention can be implemented.
The present disclosure can classify videos efficiently, simply, and with high accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief Description of the Drawings
FIG. 1 is a flow chart showing the steps of the training method for the video classification model of the present disclosure;
FIG. 2 is a flow chart showing the steps of obtaining the text content and category label of a video in the training method for the video classification model of the present disclosure;
FIG. 3 is a flow chart showing the steps of performing word segmentation on the text content of each video in the training method for the video classification model of the present disclosure;
FIG. 4 is a flow chart showing the steps of the video classification method of the present disclosure;
FIG. 5 is a flow chart showing the steps of classifying a video to be classified according to the category probability values of each attribute word in the video classification method of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
FIG. 1 is a flowchart of the steps of the training method for a video classification model of the present disclosure.
As shown in FIG. 1, a training method for a video classification model includes the following steps:
Step S1: acquire the text content and the existing category label of each video in a video collection of a given domain.
On some specialized category-based video websites (for example, an education platform that plays instructional videos), each video play page contains text content, written in natural language, that describes the video. The text content includes the title and/or the synopsis of the current video. The domain may be education, news, entertainment, or the like. In addition, to make videos easier to manage, such websites generally maintain their own category catalog. The catalog contains multiple category names; each video is assigned to the corresponding category name, and that category name serves as the video's category label. In the present disclosure, the existing category label refers to the category label that a video has on such a specialized video website.
After the step of acquiring the text content and the existing category label of each video in the video collection of the domain, the method further includes a step of establishing a category catalog for the video collection of the domain according to the existing category labels. It should be noted that the videos in a domain's video collection stored on a video service provider's cloud platform (for example, the LeEco cloud platform, 乐视云平台) do not come from a single video website and may come from a very large number of video websites. Because the existing category catalog of any one video website may be incomplete and may not cover all videos in the domain's video collection, the present disclosure re-establishes the category catalog of the domain's video collection on the basis of the existing category labels.
Specifically, taking the education domain as an example, the category names in the re-established category catalog of the education video collection mainly include: preschool, primary school, primary-to-junior-high transition, junior high school, senior high school entrance examination, senior high school, college entrance examination, university, study abroad, civil service, judicial, IT, finance and economics, international study tours, management, life skills, sports, summer camp, hobbies, arts, language training, maternity and infant guidance, vocational skills, and other.
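The re-establishment of the category catalog from existing labels can be sketched as follows. This is a minimal illustration that assumes each video record is a dictionary with a `label` field and that the catalog is simply the set of distinct labels in first-seen order; the function name, the record layout, and the sample data are hypothetical and do not come from the disclosure.

```python
def build_category_catalog(videos):
    """Collect the distinct existing category labels, keeping first-seen order."""
    catalog = []
    seen = set()
    for video in videos:
        label = video["label"].strip()
        if label and label not in seen:
            seen.add(label)
            catalog.append(label)
    return catalog


# Hypothetical records gathered from several video websites.
videos = [
    {"title": "Fractions for beginners", "label": "primary school"},
    {"title": "Exam essay writing", "label": "college entrance examination"},
    {"title": "Long division", "label": "primary school"},
    {"title": "Interview skills", "label": "vocational skills"},
]

catalog = build_category_catalog(videos)
print(catalog)
```

A real catalog would probably also need to reconcile labels that differ in wording between websites; the disclosure does not describe that step, so it is left out here.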
Step S2: segment the text content of each video into words to obtain an attribute word set for each video.
In this step, a prior-art word segmentation algorithm may be used to segment the text content of each video and obtain the attribute word set of each video. The attribute word set of each video includes at least one attribute word.
Step S3: establish a Bayesian model, and input the attribute word set and the existing category label of each video in the domain's video collection into the Bayesian model to train it, thereby obtaining the video classification model.
The Bayesian model is a naive Bayesian model. The input parameter of the video classification model is an attribute word, and the output parameter is a plurality of category probability values, where each category probability value represents the probability that the attribute word belongs to a particular category in the category catalog.
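One way to read step S3 is as a word-level naive Bayes estimate: for an attribute word w and category c, the output is P(c | w), proportional to P(c) · P(w | c). The sketch below follows that reading. The Laplace smoothing constant `alpha`, the function names, and the sample data are assumptions made for illustration; the disclosure does not specify how the probabilities are estimated.

```python
from collections import Counter, defaultdict


def train_word_category_model(samples, alpha=1.0):
    """Train on (attribute_words, label) pairs.

    Returns a function mapping an attribute word to {category: P(category | word)}.
    alpha is a Laplace smoothing constant (an assumption, not from the disclosure).
    """
    videos_per_category = Counter()
    word_counts = defaultdict(Counter)  # category -> word -> count
    vocabulary = set()
    for words, label in samples:
        videos_per_category[label] += 1
        for word in words:
            word_counts[label][word] += 1
            vocabulary.add(word)

    total_videos = sum(videos_per_category.values())
    categories = sorted(videos_per_category)

    def category_probabilities(word):
        scores = {}
        for category in categories:
            prior = videos_per_category[category] / total_videos
            total_words = sum(word_counts[category].values())
            likelihood = (word_counts[category][word] + alpha) / (
                total_words + alpha * len(vocabulary)
            )
            scores[category] = prior * likelihood
        norm = sum(scores.values())
        return {category: score / norm for category, score in scores.items()}

    return category_probabilities


# Hypothetical training data: attribute word sets with existing category labels.
samples = [
    (["fractions", "arithmetic"], "primary school"),
    (["arithmetic", "division"], "primary school"),
    (["interview", "resume"], "vocational skills"),
]
model = train_word_category_model(samples)
probs = model("fractions")
print(probs)
```

The returned dictionary has one probability per category in the catalog, matching the description of the model's output as a plurality of category probability values.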
FIG. 2 is a flowchart of the steps of acquiring the text content and category label of a video in the training method for a video classification model of the present disclosure.
As shown in FIG. 2, the step of acquiring the text content and category label of each video in the video collection of a given domain includes:
Step S11: acquire the network address of each video in the video collection of the domain stored on the cloud server.
Before step S1, a specialized category-based video website that uses the long-range transcoding service provided by the cloud platform server cluster uses the long-range transcoding function provided by the video service provider (for example, the LeEco cloud platform) to generate an ID for each video on the website. The ID is then distributed to one or more servers (that is, cloud servers) in the video service provider's CDN platform, and the cloud servers store the video. It should be noted that, because a video service provider usually provides long-range transcoding services to a large number of video websites, its cloud servers store a very large number of videos together with the ID and the network address of each video. Therefore, in step S11 it is only necessary to retrieve the network address of each video.
Step S12: according to the network address of each video, obtain the play page of each video by means of a web crawling algorithm.
The web crawling algorithm refers to an algorithm based on a prior-art web crawler. A web crawler is a program that automatically fetches web pages; it downloads web pages from the World Wide Web for a search engine and is an important component of a search engine. A conventional crawler starts from the URLs of one or more seed pages, obtains the URLs on those pages, and, while crawling, continually extracts new URLs from the current page and places them in a queue until a stop condition of the system is met.
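The queue-based crawl described above can be sketched as a breadth-first traversal. To keep the example self-contained, a small in-memory link graph stands in for real HTTP fetching; `fetch_links`, `max_pages` (used here as the stop condition), and the sample URLs are assumptions for illustration only.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs.

    fetch_links(url) returns the URLs found on that page.
    Crawling stops when the queue is empty or max_pages pages were visited.
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # enqueue each URL at most once
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph standing in for pages downloaded from the web.
LINKS = {
    "site/a": ["site/b", "site/c"],
    "site/b": ["site/c", "site/d"],
    "site/c": ["site/a"],
    "site/d": [],
}

pages = crawl(["site/a"], lambda url: LINKS.get(url, []), max_pages=3)
print(pages)
```

In the setting of step S12 the seed list would be the video network addresses obtained in step S11, so the crawler mostly needs to fetch those pages rather than discover new ones.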
Step S13: extract the text content and the category label of the current video from each video play page.
On some specialized category-based video websites (for example, an education platform that plays instructional videos), each video play page contains text content, written in natural language, that describes the video. The text content includes the title and/or the synopsis of the current video. In addition, to make videos easier to manage, such websites generally maintain their own category catalog. The catalog contains multiple category names; each video is assigned to the corresponding category name, and that category name serves as the video's category label. In the present disclosure, the existing category label refers to the category label that a video has on such a specialized video website.
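Extracting the title, synopsis, and category label from a play page can be sketched with Python's standard `html.parser` module. The page markup and the CSS class names (`video-title`, `video-intro`, `video-category`) are hypothetical; real play pages differ from site to site, and the disclosure does not describe their structure.

```python
from html.parser import HTMLParser


class PlayPageParser(HTMLParser):
    """Pull title, synopsis, and category label out of a play page.

    The class names below are assumed for illustration only.
    """

    FIELDS = {
        "video-title": "title",
        "video-intro": "intro",
        "video-category": "label",
    }

    def __init__(self):
        super().__init__()
        self.result = {"title": "", "intro": "", "label": ""}
        self._current = None  # field being filled, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        self._current = self.FIELDS.get(css_class)

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current:
            self.result[self._current] += data.strip()


PAGE = """
<html><body>
  <h1 class="video-title">Fractions for beginners</h1>
  <p class="video-intro">An introduction to adding fractions.</p>
  <a class="video-category">primary school</a>
</body></html>
"""

parser = PlayPageParser()
parser.feed(PAGE)
record = parser.result
text_content = record["title"] + " " + record["intro"]
print(record["label"], "|", text_content)
```

The concatenated `text_content` corresponds to the title and/or synopsis that the later segmentation step consumes, and `record["label"]` is the existing category label.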
FIG. 3 is a flowchart of the steps of segmenting the text content of each video in the training method for a video classification model of the present disclosure.
As shown in FIG. 3, step S2, segmenting the text content of each video to obtain the attribute word set of each video, includes:
Step S21: segment the text content to obtain a segmentation result; tag each word in the segmentation result with its part of speech according to a part-of-speech tagging algorithm; and filter the words in the segmentation result according to the tagging result to obtain a first-level keyword set. The first-level keyword set contains multiple first-level keywords.
Because the text content is written in natural language, it contains many words, some of which are not needed. Keywords therefore have to be extracted from the text content with a predetermined algorithm so that the unneeded words are filtered out. In this step, the text content is segmented solely according to the parts of speech of the words in a word-segmentation part-of-speech table: on the one hand, the words are separated; on the other hand, structural particles, modal particles, and similar words such as 的, 呢, and 啊 are filtered out. In addition, before this step, the method further includes storing the word-segmentation part-of-speech table on the cloud server and updating it continually.
Step S22: filter the first-level keyword set according to a stop word list to obtain the attribute word set. The attribute word set contains multiple attribute words.
Before this step, the method further includes storing the stop word list on the cloud server and updating it continually. The stop word list is a prior-art stop word list. Filtering the first-level keyword set means removing the stop words from the first-level keyword set.
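Steps S21 and S22 can be sketched as two filters applied to already segmented, part-of-speech-tagged tokens. The segmentation and tagging themselves are left to a prior-art tool, as in the disclosure. The tag codes (`n`, `v`, `a`, `u`, `y`, `r`), the set of tags that are kept, the stop word list, and the removal of duplicate words are all assumptions made for this illustration.

```python
# Parts of speech kept as first-level keywords (assumed tag set:
# n = noun, v = verb, a = adjective; u = particle, y = modal particle, r = pronoun).
KEEP_POS = {"n", "v", "a"}

# Hypothetical stop word list; a real one would be loaded from the cloud server.
STOP_WORDS = {"video", "watch"}


def first_level_keywords(tagged_tokens, keep_pos=KEEP_POS):
    """Step S21 (filtering part): keep only words whose tag is in keep_pos."""
    return [word for word, pos in tagged_tokens if pos in keep_pos]


def attribute_words(tagged_tokens, stop_words=STOP_WORDS):
    """Step S22: drop stop words; duplicates are also dropped (an assumption)."""
    result = []
    for word in first_level_keywords(tagged_tokens):
        if word not in stop_words and word not in result:
            result.append(word)
    return result


# Tokens as a segmenter and tagger might emit them for one video description.
tagged = [
    ("watch", "v"), ("this", "r"), ("video", "n"), ("of", "u"),
    ("fractions", "n"), ("ah", "y"), ("arithmetic", "n"), ("fractions", "n"),
]

words = attribute_words(tagged)
print(words)
```

Keeping the two filters separate mirrors the disclosure, where the part-of-speech table and the stop word list are stored and updated independently.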
FIG. 4 is a flowchart of the steps of the video classification method of the present disclosure.
As shown in FIG. 4, a video classification method includes the following steps:
Step S01: acquire the text content of the video to be classified.
The video to be classified is a new video, that is, one newly uploaded to the cloud server.
Step S02: segment the text content of the video to be classified to obtain the attribute word set of the video to be classified.
Step S03: input each attribute word in the attribute word set of the video to be classified into the video classification model according to any one of claims 1-4 to obtain the category probability values of each attribute word of the video to be classified.
The category label of the video to be classified in the category catalog is then determined according to the category probability values of each attribute word. In the step of obtaining the category probability values of each attribute word of the video to be classified, each attribute word has at least one category probability value.
FIG. 5 is a flowchart of the steps of classifying the video to be classified according to the category probability values of each attribute word in the video classification method of the present disclosure.
As shown in FIG. 5, the step of classifying the video to be classified according to the category probability values of each attribute word includes the following steps:
Step S031: from the plurality of category probability values of each attribute word, select the largest one as the optimal category probability value of that attribute word.
Step S032: multiply together the optimal category probability values of the attribute words in the attribute word set of the video to be classified to obtain the category probability of the video to be classified.
Step S033: determine the category label of the video to be classified in the category catalog according to the category probability of the video to be classified.
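Steps S031 to S033 can be sketched as follows. Steps S031 and S032 are implemented as stated. The text does not spell out how the label in step S033 follows from the product, so the sketch makes an explicit assumption: the label is the category that is optimal for the most attribute words, with ties broken by the summed optimal probabilities, and the product is reported alongside it as the video's category probability. The function name and the sample probabilities are hypothetical.

```python
import math
from collections import Counter


def classify(word_probabilities):
    """word_probabilities maps each attribute word to {category: probability}."""
    # S031: for each attribute word, keep its largest category probability.
    best = {
        word: max(probs.items(), key=lambda item: item[1])
        for word, probs in word_probabilities.items()
    }

    # S032: multiply the optimal probabilities of all attribute words.
    video_probability = math.prod(prob for _, prob in best.values())

    # S033 (assumption): majority vote over the optimal categories,
    # ties broken by the summed optimal probability per category.
    votes = Counter(category for category, _ in best.values())
    mass = Counter()
    for category, prob in best.values():
        mass[category] += prob
    label = max(votes, key=lambda c: (votes[c], mass[c]))

    return label, video_probability, best


# Hypothetical model outputs for the attribute words of one new video.
word_probabilities = {
    "fractions": {"primary school": 0.8, "vocational skills": 0.2},
    "arithmetic": {"primary school": 0.75, "vocational skills": 0.25},
    "resume": {"primary school": 0.1, "vocational skills": 0.9},
}

label, video_probability, best = classify(word_probabilities)
print(label, video_probability)
```

For long attribute word sets the product becomes very small; summing logarithms instead of multiplying is a common way to avoid floating-point underflow, although the disclosure itself describes a plain product.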
An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program which, when executed, implements some or all of the steps in each implementation of the training method for a video classification model provided by the embodiments shown in FIG. 1 to FIG. 3.
An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a program which, when executed, implements some or all of the steps in each implementation of the video classification method provided by the embodiments shown in FIG. 4 and FIG. 5.
It should be understood that the above specific embodiments of the present disclosure are intended only to illustrate or explain the principles of the present disclosure and do not limit it. Therefore, any modification, equivalent substitution, improvement, or the like made without departing from the spirit and scope of the present disclosure shall fall within the protection scope of the present disclosure. Furthermore, the appended claims of the present disclosure are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or within equivalents of such scope and boundaries.

Claims (10)

  1. A training method for a video classification model, comprising:
    acquiring the text content and the existing category label of each video in a video collection of a given domain;
    segmenting the text content of each video to obtain an attribute word set for each video;
    establishing a Bayesian model, and inputting the attribute word set and the existing category label of each video in the video collection of the domain into the Bayesian model to train the Bayesian model, thereby obtaining a video classification model.
  2. The method according to claim 1, wherein, after acquiring the text content and the existing category label of each video in the video collection of the domain, the method further comprises:
    establishing a category catalog of the video collection of the domain according to the existing category labels.
  3. The method according to claim 2, wherein the input parameter of the video classification model is an attribute word and the output parameter is a plurality of category probability values, each category probability value representing the probability that the attribute word belongs to a particular category in the category catalog.
  4. The method according to any one of claims 1-3, wherein acquiring the text content and the category label of each video in the video collection of the domain comprises:
    acquiring the network address of each video in the video collection of the domain stored on a cloud server;
    obtaining the play page of each video according to the network address of the video;
    extracting the text content and the category label of the current video from each video play page.
  5. The method according to any one of claims 1-3, wherein segmenting the text content of each video to obtain the attribute word set of each video comprises:
    segmenting the text content to obtain a segmentation result;
    tagging each word in the segmentation result with its part of speech according to a part-of-speech tagging algorithm, and filtering the words in the segmentation result according to the tagging result to obtain a first-level keyword set;
    filtering the first-level keyword set according to a stop word list to obtain the attribute word set.
  6. The method according to any one of claims 1-3, wherein the text content comprises the title and/or the synopsis of the current video.
  7. The method according to any one of claims 1-3, wherein the Bayesian model is a naive Bayesian model.
  8. A video classification method, comprising:
    acquiring the text content of a video to be classified;
    segmenting the text content of the video to be classified to obtain an attribute word set of the video to be classified;
    inputting each attribute word in the attribute word set of the video to be classified into the video classification model according to any one of claims 1-4 to obtain the category probability values of each attribute word of the video to be classified;
    determining the category label of the video to be classified in the category catalog according to the category probability values of each attribute word.
  9. The video classification method according to claim 8, wherein each attribute word has at least one category probability value.
  10. The video classification method according to claim 9, wherein classifying the video to be classified according to the category probability values of each attribute word comprises:
    selecting, from the plurality of category probability values of each attribute word, the largest one as the optimal category probability value of that attribute word;
    multiplying together the optimal category probability values of the attribute words in the attribute word set of the video to be classified to obtain the category probability of the video to be classified;
    determining the category label of the video to be classified in the category catalog according to the category probability of the video to be classified.
PCT/CN2016/089246 2016-03-31 2016-07-07 Video classification model training method and video classification method WO2017166512A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610202497.4 2016-03-31
CN201610202497.4A CN105913072A (en) 2016-03-31 2016-03-31 Training method of video classification model and video classification method

Publications (1)

Publication Number Publication Date
WO2017166512A1 true WO2017166512A1 (en) 2017-10-05

Family

ID=56745144

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/089246 WO2017166512A1 (en) 2016-03-31 2016-07-07 Video classification model training method and video classification method

Country Status (2)

Country Link
CN (1) CN105913072A (en)
WO (1) WO2017166512A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488489A (en) * 2020-03-26 2020-08-04 腾讯科技(深圳)有限公司 Video file classification method, device, medium and electronic equipment
CN111753790A (en) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN111950360A (en) * 2020-07-06 2020-11-17 北京奇艺世纪科技有限公司 Method and device for identifying infringing user
CN112270192A (en) * 2020-11-23 2021-01-26 科大国创云网科技有限公司 Semantic recognition method and system based on filtering of part of speech and stop words
CN112749299A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Method and device for determining video type, electronic equipment and readable storage medium
CN113536778A (en) * 2020-04-14 2021-10-22 北京沃东天骏信息技术有限公司 Title generation method and device and computer readable storage medium

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194419A (en) * 2017-05-10 2017-09-22 百度在线网络技术(北京)有限公司 Video classification methods and device, computer equipment and computer-readable recording medium
CN108932252A (en) * 2017-05-25 2018-12-04 合网络技术(北京)有限公司 Video aggregation method and device
CN108959323B (en) * 2017-05-25 2021-12-07 腾讯科技(深圳)有限公司 Video classification method and device
CN108563722B (en) * 2018-04-03 2021-04-02 有米科技股份有限公司 Industry classification method, system, computer device and storage medium for text information
CN108804598A (en) * 2018-05-29 2018-11-13 王妃 Cloud atlas distributed video sorting technique
CN108960316B (en) * 2018-06-27 2020-10-30 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN108965920A (en) * 2018-08-08 2018-12-07 北京未来媒体科技股份有限公司 A kind of video content demolition method and device
CN111104545A (en) * 2018-10-26 2020-05-05 阿里巴巴集团控股有限公司 Background music configuration method and equipment, client device and electronic equipment
CN111131899A (en) * 2018-10-31 2020-05-08 中国移动通信集团浙江有限公司 Multi-site video playing record integration method and device
CN110110143B (en) * 2019-04-15 2021-08-03 厦门网宿有限公司 Video classification method and device
CN110851607A (en) * 2019-11-19 2020-02-28 中国银行股份有限公司 Training method and device for information classification model
CN111935499B (en) * 2020-08-17 2021-04-20 深圳市前海多晟科技股份有限公司 Ultrahigh-definition video gateway system based on distributed storage technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060173916A1 (en) * 2004-12-22 2006-08-03 Verbeck Sibley Timothy J R Method and system for automatically generating a personalized sequence of rich media
CN102184262A (en) * 2011-06-15 2011-09-14 悠易互通(北京)广告有限公司 Web-based text classification mining system and web-based text classification mining method
CN103955703A (en) * 2014-04-25 2014-07-30 杭州电子科技大学 Medical image disease classification method based on naive Bayes
CN104219575A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Related video recommending method and system
CN104834640A (en) * 2014-02-10 2015-08-12 腾讯科技(深圳)有限公司 Webpage identification method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201835A (en) * 2007-12-21 2008-06-18 四川大学 Emergency ganged warning-information automatic sorting system
CN104199933B (en) * 2014-09-04 2017-07-07 华中科技大学 The football video event detection and semanteme marking method of a kind of multimodal information fusion

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749299A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Method and device for determining video type, electronic equipment and readable storage medium
CN111488489A (en) * 2020-03-26 2020-08-04 腾讯科技(深圳)有限公司 Video file classification method, device, medium and electronic equipment
CN111488489B (en) * 2020-03-26 2023-10-24 腾讯科技(深圳)有限公司 Video file classification method, device, medium and electronic equipment
CN113536778A (en) * 2020-04-14 2021-10-22 北京沃东天骏信息技术有限公司 Title generation method and device and computer readable storage medium
CN111753790A (en) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN111753790B (en) * 2020-07-01 2023-12-12 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN111950360A (en) * 2020-07-06 2020-11-17 北京奇艺世纪科技有限公司 Method and device for identifying infringing user
CN111950360B (en) * 2020-07-06 2023-08-18 北京奇艺世纪科技有限公司 Method and device for identifying infringement user
CN112270192A (en) * 2020-11-23 2021-01-26 科大国创云网科技有限公司 Semantic recognition method and system based on filtering of part of speech and stop words
CN112270192B (en) * 2020-11-23 2023-12-19 科大国创云网科技有限公司 Semantic recognition method and system based on part of speech and deactivated word filtering

Also Published As

Publication number Publication date
CN105913072A (en) 2016-08-31

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16896284

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16896284

Country of ref document: EP

Kind code of ref document: A1