CN110879974B - Video classification method and device

Video classification method and device

Info

Publication number
CN110879974B
Authority
CN
China
Prior art keywords
video
classified
vector
classification
visual
Prior art date
Legal status
Active
Application number
CN201911058829.6A
Other languages
Chinese (zh)
Other versions
CN110879974A (en)
Inventor
邓积杰 (Deng Jijie)
何楠 (He Nan)
林星 (Lin Xing)
白兴安 (Bai Xing'an)
徐扬 (Xu Yang)
Current Assignee
Beijing Weiboyi Technology Co ltd
Original Assignee
Beijing Weiboyi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Weiboyi Technology Co ltd
Priority to CN201911058829.6A
Publication of CN110879974A
Application granted
Publication of CN110879974B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/47 Detecting features for summarising video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/75 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content

Abstract

The invention discloses a video classification method and device, relating to the field of data processing, and aims to solve the low efficiency and low accuracy of existing video classification. The technical scheme provided by the embodiments of the invention comprises: obtaining a feature vector of each key frame in a video to be classified according to the key frames in the video; obtaining a visual classification vector of the video from the feature vectors of its key frames; obtaining a text classification vector of the video from the text contained in its image frames; and substituting the visual classification vector and the text classification vector into a preset classification model to obtain the category of the video. The scheme can be applied to fields such as targeted video pushing.

Description

Video classification method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a video classification method and apparatus.
Background
In recent years, with the rapid development of internet short-video platforms, videos of every kind, such as film, food, science and technology, travel, education and games, have grown explosively. These videos come from a wide range of sources, are cheap to produce, appear in huge daily volumes, and spread extremely quickly, all of which poses great challenges for video classification.
In the prior art, videos are generally classified manually or by extracting keywords from their titles. Manual classification consumes a great deal of manpower and material resources, so its efficiency is low; and because a title may not accurately summarize the content of a video, classification by extracted keywords has low accuracy. Purely visual classification methods, in turn, cannot handle categories that require semantic understanding, such as horoscopes, workplace management and emotion, so their accuracy is also low.
Disclosure of Invention
In view of the above, the main objective of the present invention is to solve the problem of low efficiency and accuracy of the existing video classification method.
In one aspect, a video classification method provided in an embodiment of the present invention includes: acquiring a feature vector of each key frame in the video to be classified according to the key frames in the video to be classified; acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified; acquiring a text classification vector of the video to be classified according to texts contained in image frames in the video to be classified; and substituting the visual classification vector and the text classification vector into a preset classification model to obtain the category of the video to be classified.
On the other hand, an embodiment of the present invention provides a video classification apparatus, including:
the characteristic acquisition module is used for acquiring a characteristic vector of each key frame in the video to be classified according to the key frames in the video to be classified;
the visual classification module is connected with the feature acquisition module and used for acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified;
the text classification module is used for acquiring a text classification vector of the video to be classified according to texts contained in image frames in the video to be classified;
and the category acquisition module is respectively connected with the visual classification module and the text classification module and used for substituting the visual classification vector and the text classification vector into a preset classification model to acquire the category of the video to be classified.
In summary, the video classification method and apparatus provided by the present invention achieve video classification by separately obtaining a visual classification vector and a text classification vector and substituting both into a classification model. Because the visual classification vector and the text classification vector are used jointly as inputs for classification, the accuracy of video classification is improved, solving the low efficiency and accuracy of existing video classification methods. In addition, the visual classification vector is derived from the feature vectors of key frames, and the text classification vector carries deeper semantic information, which further improves classification accuracy.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be derived from them without creative effort.
Fig. 1 is a flowchart of a video classification method according to embodiment 1 of the present invention;
fig. 2 is a flowchart of a video classification method according to embodiment 2 of the present invention;
fig. 3 is a first schematic structural diagram of a video classification apparatus according to embodiment 3 of the present invention;
FIG. 4 is a schematic diagram of a visual classification module of the video classification apparatus shown in FIG. 3;
fig. 5 is a schematic structural diagram of a video classification apparatus according to embodiment 3 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the present invention provides a video classification method, including:
step 101, obtaining a feature vector of each key frame in a video to be classified according to the key frame in the video to be classified.
In this embodiment, the key frames in step 101 are also called I-frames (intra-coded frames): frames in the compressed video that retain complete image data, so that decoding a key frame requires only that frame's own data. Because the similarity between key frames of the video to be classified is low, a handful of key frames can comprehensively represent the video; extracting feature vectors from the key frames therefore improves the accuracy of classifying the video.
Specifically, obtaining the feature vectors through step 101 includes: extracting key frames from the video to be classified according to a preset rule, where the preset rule is one of duration, interval, weight and click rate; and obtaining the feature vector of each key frame in the video to be classified. The feature vector may be obtained in two ways: performing feature extraction on each key frame with a preset image classifier to obtain the feature vector of each key frame; or obtaining feature points of each key frame and deriving the feature vector from those feature points. Methods for determining feature points and feature vectors include: Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), ORB (Oriented FAST and Rotated BRIEF), and neural-network methods such as ResNet (deep residual networks), Xception (depthwise separable convolutions), I3D (Inflated 3D ConvNets), P3D (Pseudo-3D Residual Networks) and TSN (Temporal Segment Networks).
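By way of illustration only, the following Python sketch approximates step 101 under two assumptions the patent leaves open: fixed-interval frame sampling stands in for true I-frame extraction, and a torchvision ResNet-50 with its classification head removed plays the role of the preset image classifier, yielding the (1, 2048) feature vectors used in the worked example of embodiment 3.

```python
# A hedged sketch of step 101: interval sampling as a stand-in for I-frame
# extraction, ResNet-50 features as the preset image classifier's output.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

# ResNet-50 without the final fc layer yields a 2048-d feature per frame.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def key_frame_features(video_path: str, interval_s: float = 1.0) -> torch.Tensor:
    """Sample one frame every interval_s seconds; return (num_frames, 2048)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(fps * interval_s), 1)
    features, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                feat = extractor(preprocess(rgb).unsqueeze(0))  # (1, 2048, 1, 1)
            features.append(feat.flatten())
        idx += 1
    cap.release()
    return torch.stack(features)
```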
Step 102, acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified.
In this embodiment, obtaining the visual classification vector through step 102 includes: combining the feature vectors of all key frames in the video to be classified by rows to obtain a feature map; and fusing each column of the feature map, that is, the values of one feature dimension across all key frames, into a single value to obtain the visual classification vector of the video to be classified. The fusion may compute the average of each column or the maximum of each column; other reductions, such as the minimum, can equally be used and are not detailed here.
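A minimal NumPy sketch of this fusion, assuming the feature map is a (num_key_frames, 2048) matrix with one key-frame vector per row, so that fusing each feature dimension reduces along axis 0:

```python
import numpy as np

def visual_classification_vector(feature_map: np.ndarray, mode: str = "mean") -> np.ndarray:
    """Fuse a (num_key_frames, 2048) feature map into one 2048-d vector."""
    if mode == "mean":
        return feature_map.mean(axis=0)  # average of each feature dimension
    if mode == "max":
        return feature_map.max(axis=0)   # maximum of each feature dimension
    raise ValueError(f"unknown fusion mode: {mode}")

# e.g. four key-frame feature vectors stacked by rows -> one (2048,) vector
fused = visual_classification_vector(np.random.randn(4, 2048), mode="mean")
```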
Step 103, acquiring a text classification vector of the video to be classified according to the text contained in the image frame of the video to be classified.
In this embodiment, obtaining the text classification vector in step 103 may be performed after the visual classification vector is obtained (as shown in fig. 1), before it, or simultaneously with it; the order is not limited here. Obtaining the text classification vector through step 103 includes: extracting image frames from the video to be classified; recognizing the characters in the image frames to obtain the texts they contain; and combining the texts contained in all image frames, then classifying the combined text to obtain the text classification vector of the video to be classified. The texts may be combined by splicing them into one long text. Character recognition may use CRNN (Convolutional Recurrent Neural Network) or CTPN (Connectionist Text Proposal Network); text classification may use TextCNN (a convolutional-neural-network text classifier), FastText (a fast text-classification algorithm) or LSTM (Long Short-Term Memory networks).
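Purely as an illustration of these sub-steps, the sketch below uses pytesseract as a stand-in OCR engine (the embodiment names CRNN and CTPN instead), samples one frame per second, and splices the recognized texts into one long text; the function name video_text and the 1 s interval are assumptions.

```python
import cv2
import pytesseract  # a stand-in OCR engine; the patent names CRNN/CTPN

def video_text(video_path: str, interval_s: float = 1.0) -> str:
    """OCR one frame per interval and splice the results into one long text."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(int(fps * interval_s), 1)
    texts, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            texts.append(pytesseract.image_to_string(frame, lang="chi_sim").strip())
        idx += 1
    cap.release()
    return " ".join(t for t in texts if t)
```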
Step 104, substituting the visual classification vector and the text classification vector into a preset classification model to obtain the category of the video to be classified.
In this embodiment, the preset classification model in step 104 may be generated in advance using, for example, a neural network. Obtaining the category of the video to be classified through step 104 may be: splicing the visual classification vector and the text classification vector into a single row vector, the vector to be classified; and substituting that vector into the preset classification model to obtain the category of the video to be classified.
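For illustration, a sketch of step 104 assuming the preset classification model is a small fully connected network over the spliced row vector (the patent allows any pre-generated model); the 2048, 1024 and 20-category dimensions follow the worked example in embodiment 3.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 20  # embodiment 3 trains on 20 categories

# A hypothetical preset classification model over the spliced vector.
classifier = nn.Sequential(
    nn.Linear(2048 + 1024, 512), nn.ReLU(), nn.Linear(512, NUM_CLASSES),
)

def classify(visual_vec: torch.Tensor, text_vec: torch.Tensor) -> int:
    """Splice the two vectors into one row vector; return the class value."""
    row = torch.cat([visual_vec, text_vec]).unsqueeze(0)  # (1, 3072)
    with torch.no_grad():
        return classifier(row).argmax(dim=1).item()
```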
In summary, the video classification method provided by the present invention achieves video classification by separately obtaining a visual classification vector and a text classification vector and substituting both into a classification model. Because the two vectors are used jointly as inputs for classification, the accuracy of video classification is improved, solving the low efficiency and accuracy of existing video classification methods. In addition, the visual classification vector is derived from the feature vectors of key frames, and the text classification vector carries deeper semantic information, which further improves classification accuracy.
Example 2
As shown in fig. 2, an embodiment of the present invention provides a video classification method, including:
step 201 to step 203, obtaining the visual classification vector and the text classification vector, which is similar to step 101 to step 103 shown in fig. 1 and is not repeated here.
Step 204, a plurality of video samples are obtained, and a visual classification vector, a text classification vector and a category value corresponding to each video sample are obtained.
Step 205, training the initial classifier according to the visual classification vector, the text classification vector and the class value corresponding to each video sample, respectively, to obtain a classification model.
In this embodiment, the initial classifier in step 205 may adopt a convolutional neural network model, or may adopt other models, which is not limited herein.
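A minimal PyTorch sketch of steps 204 and 205 under the assumption, consistent with embodiment 3, that the initial classifier is a small fully connected network over the spliced row vectors and that the class values are the integers 0 to 19; the architecture and hyperparameters are illustrative, not prescribed by the patent.

```python
import torch
import torch.nn as nn

def train_classifier(row_vectors: torch.Tensor,   # (num_samples, 3072), float
                     class_values: torch.Tensor,  # (num_samples,), int64 in 0..19
                     num_classes: int = 20, epochs: int = 100) -> nn.Module:
    """Train the initial classifier on spliced (visual + text) row vectors."""
    model = nn.Sequential(nn.Linear(row_vectors.shape[1], 512), nn.ReLU(),
                          nn.Linear(512, num_classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(row_vectors), class_values)
        loss.backward()
        optimizer.step()
    return model
```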
Step 206, substituting the visual classification vector and the text classification vector into a preset classification model to obtain the category of the video to be classified. The process is similar to step 104 shown in fig. 1 and is not described in detail here.
In summary, the video classification method provided by the present invention achieves video classification by separately obtaining a visual classification vector and a text classification vector and substituting both into a classification model. Because the two vectors are used jointly as inputs for classification, the accuracy of video classification is improved, solving the low efficiency and accuracy of existing video classification methods. In addition, the visual classification vector is derived from the feature vectors of key frames, and the text classification vector carries deeper semantic information, which further improves classification accuracy.
Example 3
As shown in fig. 3, an embodiment of the present invention provides a video classification apparatus, including:
the feature obtaining module 301 is configured to obtain a feature vector of each key frame in the video to be classified according to the key frame in the video to be classified;
the visual classification module 302 is connected with the feature acquisition module and is used for acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified;
the text classification module 303 is configured to obtain a text classification vector of the video to be classified according to a text included in an image frame in the video to be classified;
and the category obtaining module 304 is connected to the visual classification module and the text classification module, respectively, and is configured to substitute the visual classification vector and the text classification vector into a preset classification model to obtain a category of the video to be classified.
In this embodiment, the process of classifying videos through the feature obtaining module 301 to the category obtaining module 304 is similar to that provided in the first embodiment of the present invention, and is not repeated here.
Further, as shown in fig. 4, a visual classification module 302 in the video classification apparatus according to the embodiment of the present invention includes:
the vector combination submodule 3021 is configured to combine feature vectors of all key frames in a video to be classified according to rows to obtain a feature map;
and the vector fusion submodule 3022 is connected to the vector combination submodule and is configured to fuse each column of data in the feature map into a single value, so as to obtain the visual classification vector of the video to be classified.
Further, as shown in fig. 5, the video classification apparatus provided in the embodiment of the present invention further includes:
a sample obtaining module 305, configured to obtain a plurality of video samples, and a visual classification vector, a text classification vector, and a category value corresponding to each video sample;
and the training module 306 is connected to the sample acquisition module and the category acquisition module, and is configured to train the initial classifier according to the visual classification vector, the text classification vector and the category value corresponding to each video sample, so as to obtain a classification model.
When the video classification apparatus provided in this embodiment further includes the sample obtaining module 305 and the training module 306, the process of implementing video classification is similar to that provided in the second embodiment of the present invention, and details are not repeated here.
In summary, the video classification apparatus provided by the present invention achieves video classification by separately obtaining a visual classification vector and a text classification vector and substituting both into a classification model. Because the two vectors are used jointly as inputs for classification, the accuracy of video classification is improved, solving the low efficiency and accuracy of existing video classification methods. In addition, the visual classification vector is derived from the feature vectors of key frames, and the text classification vector carries deeper semantic information, which further improves classification accuracy.
Specifically, the video samples used in training correspond to 20 video categories: dance, music, food, makeup, dance, sports, handcraft, pets, mother and baby, drawing, lifestyle, fashion, games, animation, fitness, emotion, horoscope, travel, digital and home furnishing. The class values of these training categories correspond to the integers 0 through 19. Training the classifier may specifically include:
and extracting N frames (N is more than or equal to 3) of key frames from each video sample, wherein N is 4 as an example. And training the video initial classifier through the extracted key frames and the corresponding class values to obtain a video classification model. The video initial classifier may employ models of Resnet50(Residual Network 50, depth Residual Network 50), Resnet101(Residual Network 101, depth Residual Network 101), Xceptance (depth separable convolution), and the like.
Taking one video sample as an example, the process of obtaining the row vector of each video sample includes:
extracting image frames from the video sample; recognizing the characters in the image frames to obtain the texts they contain (the recognition method may be CRNN, CTPN, or the like); and training the initial text classifier on the texts contained in the image frames and the corresponding class values to obtain the text classification model. The initial text classifier may use models such as TextCNN, FastText or LSTM.
Key frames are acquired from the video sample; extracting the following 4 key frames is taken as an example:
key frame 0, dimension (255, 255, 3);
key frame 1, dimension (255, 255, 3);
key frame 2, dimension (255, 255, 3);
key frame 3, dimension (255, 255, 3).
Each key frame of the video sample is substituted into the video classification model to obtain its feature vector; the dimension of each feature vector is (1, 2048):
[-1.4759, -0.6063, 1.2209, …, 0.3973, -0.1676, 2.7899]
[-0.7009, -0.4696, 1.7640, …, 1.1952, 1.3861, 0.2387]
[1.0831, -1.9600, 0.8904, …, 0.3973, -0.1676, 2.7899]
[0.1322, 0.6038, 2.6935, …, 0.3889, 1.4386, 1.0443]
Combining the 4 feature vectors by rows yields a 4×2048 feature map (rendered in the original publication as an image; its rows are the four vectors above).
Each column of the feature map, i.e., each feature dimension, is then fused into a single value. Taking averaging as the fusion, a visual classification vector of dimension (2048, 1) is obtained: [-0.2404, -0.6080, 1.6422, …, 0.4514, 0.6358, 1.1756].
Image frames are extracted from the video, here at equal intervals of 1 s. Recognizing the characters in these frames yields the text contained in the video: in this example, the transcript of a makeup tutorial in which the blogger introduces two domestic-brand highlighters, describing their colors, prices and how to apply them. This text is input into the text classification model, yielding a text classification vector of dimension (1024, 1): [0.0107, 0.2644, 0.4699, …, 0.5430, 0.8514, 0.6103].
The visual classification vector and the text classification vector are then spliced to obtain a row vector of dimension (3072, 1).
The initial classifier is trained on the row vectors and corresponding class values of all video samples to obtain the classification model.
Finally, all videos can be classified using the video classification model, the text classification model and the classification model; the classification process is similar to that of embodiment 1 and is not repeated here.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method of video classification, comprising:
acquiring a feature vector of each key frame in the video to be classified according to the key frames in the video to be classified;
acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified;
acquiring a text classification vector of the video to be classified according to texts contained in image frames in the video to be classified;
substituting the visual classification vector and the text classification vector into a preset classification model to obtain the category of the video to be classified;
the obtaining of the visual classification vector of the video to be classified includes:
combining the feature vectors of all key frames in the video to be classified according to rows to obtain a feature map;
fusing each column of data in the feature map into a single value to obtain a visual classification vector of the video to be classified;
the fusing each column of data in the feature map into a single value to obtain the visual classification vector of the video to be classified comprises:
calculating the maximum value of each column of data in the feature map to obtain the visual classification vector of the video to be classified.
2. The method of claim 1, wherein before said substituting the visual classification vector and the text classification vector into a preset classification model, the method further comprises:
acquiring a plurality of video samples, and a visual classification vector, a text classification vector and a category value corresponding to each video sample;
and training the initial classifier according to the visual classification vector, the text classification vector and the class value corresponding to each video sample to obtain the classification model.
3. The method according to any one of claims 1 to 2, wherein the obtaining the feature vector of each key frame in the video to be classified comprises:
performing feature extraction on each key frame in the video to be classified with a preset image classifier to obtain the feature vector of each key frame in the video to be classified; or,
obtaining a feature point of each key frame in the video to be classified, and obtaining the feature vector of each key frame in the video to be classified according to the feature points.
4. The method according to any one of claims 1 to 2, wherein the obtaining the feature vector of each key frame in the video to be classified comprises:
extracting key frames from the video to be classified according to a preset rule; the preset rules include: one of duration, weight, interval and click rate;
and acquiring the feature vector of each key frame in the video to be classified.
5. The video classification method according to any one of claims 1 to 2, wherein obtaining the text classification vector of the video to be classified comprises:
extracting image frames from the video to be classified;
identifying characters in the image frame to obtain texts contained in the image frame;
and combining texts contained in all image frames in the video to be classified, and then classifying the texts to obtain a text classification vector of the video to be classified.
6. A video classification apparatus, comprising:
the characteristic acquisition module is used for acquiring a characteristic vector of each key frame in the video to be classified according to the key frames in the video to be classified;
the visual classification module is connected with the feature acquisition module and used for acquiring a visual classification vector of the video to be classified according to the feature vector of each key frame in the video to be classified;
the text classification module is used for acquiring a text classification vector of the video to be classified according to texts contained in image frames in the video to be classified;
the category acquisition module is respectively connected with the visual classification module and the text classification module and used for substituting the visual classification vector and the text classification vector into a preset classification model to acquire the category of the video to be classified;
the visual classification module comprises:
the vector combination submodule is used for combining the feature vectors of all key frames in the video to be classified according to rows to obtain a feature map;
the vector fusion submodule is connected with the vector combination submodule and is used for fusing each column of data in the feature map into a single value to obtain the visual classification vector of the video to be classified;
wherein the fusing each column of data in the feature map into a single value to obtain the visual classification vector of the video to be classified comprises:
calculating the maximum value of each column of data in the feature map to obtain the visual classification vector of the video to be classified.
7. The video classification apparatus according to claim 6, wherein the apparatus further comprises:
the system comprises a sample acquisition module, a classification module and a classification module, wherein the sample acquisition module is used for acquiring a plurality of video samples and a visual classification vector, a text classification vector and a category value corresponding to each video sample;
and the training module is respectively connected with the sample acquisition module and the category acquisition module and is used for training the initial classifier according to the visual classification vector, the text classification vector and the category value corresponding to each video sample to obtain the classification model.
CN201911058829.6A 2019-11-01 2019-11-01 Video classification method and device Active CN110879974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911058829.6A CN110879974B (en) 2019-11-01 2019-11-01 Video classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911058829.6A CN110879974B (en) 2019-11-01 2019-11-01 Video classification method and device

Publications (2)

Publication Number Publication Date
CN110879974A (en) 2020-03-13
CN110879974B (en) 2020-10-13

Family

ID=69728219

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911058829.6A Active CN110879974B (en) 2019-11-01 2019-11-01 Video classification method and device

Country Status (1)

Country Link
CN (1) CN110879974B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488489B * 2020-03-26 2023-10-24 Tencent Technology (Shenzhen) Co., Ltd. Video file classification method, device, medium and electronic equipment
CN111556377A * 2020-04-24 2020-08-18 Zhuhai Hengqin Dianxiang Technology Co., Ltd. Short video labeling method based on machine learning
CN111859024A * 2020-07-15 2020-10-30 Beijing ByteDance Network Technology Co., Ltd. Video classification method and device and electronic equipment
CN114157906B * 2020-09-07 2024-04-02 Beijing Dajia Internet Information Technology Co., Ltd. Video detection method, device, electronic equipment and storage medium
CN113159010B * 2021-03-05 2022-07-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Video classification method, device, equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108241729A * 2017-09-28 2018-07-03 Xinhua Zhiyun Technology Co., Ltd. Method and apparatus for screening videos
CN110019950A * 2019-03-22 2019-07-16 Guangzhou Xinshizhan Investment Consulting Co., Ltd. Video recommendation method and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN102222101A * 2011-06-22 2011-10-19 North China University of Technology Method for video semantic mining
US9652675B2 * 2014-07-23 2017-05-16 Microsoft Technology Licensing, LLC Identifying presentation styles of educational videos
CN104657468B * 2015-02-12 2018-07-31 Institute of Automation, Chinese Academy of Sciences Rapid classification method for videos based on image and text
CN108763325B * 2018-05-04 2019-10-01 Beijing Dajia Internet Information Technology Co., Ltd. A network object processing method and device
CN109660865B * 2018-12-17 2021-09-21 Hangzhou Youzijie Information Technology Co., Ltd. Method and device for automatically labeling videos, medium and electronic equipment
CN110162669B * 2019-04-04 2021-07-02 Tencent Technology (Shenzhen) Co., Ltd. Video classification processing method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN110879974A (en) 2020-03-13

Similar Documents

Publication Publication Date Title
CN110879974B (en) Video classification method and device
Chen et al. Sketchygan: Towards diverse and realistic sketch to image synthesis
Hong et al. Inferring semantic layout for hierarchical text-to-image synthesis
CN110166827B (en) Video clip determination method and device, storage medium and electronic device
Kim et al. Dense relational captioning: Triple-stream networks for relationship-based captioning
Yu et al. Semantic jitter: Dense supervision for visual comparisons via synthetic images
Romero et al. Smit: Stochastic multi-label image-to-image translation
Tian et al. Query-dependent aesthetic model with deep learning for photo quality assessment
CN109635680B (en) Multitask attribute identification method and device, electronic equipment and storage medium
CN110781347A (en) Video processing method, device, equipment and readable storage medium
CN110083729B (en) Image searching method and system
WO2012167568A1 (en) Video advertisement broadcasting method, device and system
Dimitropoulos et al. Classification of multidimensional time-evolving data using histograms of grassmannian points
Huang et al. ReVersion: Diffusion-based relation inversion from images
Wang et al. Early action prediction with generative adversarial networks
CN112102157A (en) Video face changing method, electronic device and computer readable storage medium
Hao et al. Vico: Detail-preserving visual condition for personalized text-to-image generation
Wang et al. From attribute-labels to faces: face generation using a conditional generative adversarial network
WO2023056835A1 (en) Video cover generation method and apparatus, and electronic device and readable medium
CN113515669A (en) Data processing method based on artificial intelligence and related equipment
CN114332466B (en) Continuous learning method, system, equipment and storage medium for image semantic segmentation network
More et al. Seamless nudity censorship: an image-to-image translation approach based on adversarial training
Song et al. Hierarchical LSTMs with adaptive attention for visual captioning
Tzelepis et al. Video aesthetic quality assessment using kernel Support Vector Machine with isotropic Gaussian sample uncertainty (KSVM-IGSU)
CN115222858A (en) Method and equipment for training animation reconstruction network and image reconstruction and video reconstruction thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant