CN110287788A - Video classification method and device
- Publication number
- CN110287788A (CN201910432237.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- feature
- sorted
- text
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The present invention provides a video classification method and device. The method comprises: obtaining an image feature of a video to be classified according to the features of the video frames the video contains; obtaining a speech feature of the video to be classified using the features of the audio data the video contains; obtaining a text feature of the video to be classified according to the features of text information describing the video; and determining the category of the video to be classified using the image feature, the speech feature and the text feature. With the scheme provided by the embodiments of the present invention, a video can be classified using its image, speech and text features. Compared with the manual classification used in the related art, this saves human resources and reduces the cost of classifying videos.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a video classification method and device.
Background art
When providing video services to users, a video service provider often needs to first classify videos according to their content. For example, a video of a football match can be assigned to the "sports" category, while a video of natural scenery can be assigned to the "travel" category. By classifying videos, a video service provider can serve its customers better. For example, displaying videos by category on a web page or in a mobile application (APP) helps users find videos of interest more quickly, and classifying videos in advance also facilitates video search and recommendation.
In the related art, videos are classified by having people watch and review the video content. When videos are classified manually and the number of videos to be classified is large, the video service provider incurs a large labor cost.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a video classification method and device. The technical solution is as follows:
In a first aspect, a video classification method is provided, the method comprising:
Obtaining an image feature of a video to be classified according to the features of the video frames the video contains;
Obtaining a speech feature of the video to be classified using the features of the audio data the video contains;
Obtaining a text feature of the video to be classified according to the features of text information describing the video;
Determining the category of the video to be classified using the image feature, the speech feature and the text feature.
Optionally, the step of obtaining the image feature of the video to be classified according to the features of the video frames the video contains comprises:
Obtaining a high-dimensional image feature of each video frame contained in the video to be classified;
Determining a first-type image feature of the video to be classified from the high-dimensional image features of the video frames, according to the playing order of the video frames;
Clustering the high-dimensional image features of the video frames to obtain a second-type image feature of the video to be classified;
Taking the first-type image feature and the second-type image feature as the image feature.
Optionally, the step of determining the first-type image feature of the video to be classified from the high-dimensional image features of the video frames, according to the playing order of the video frames, comprises:
Inputting the high-dimensional image features of the video frames, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network;
Obtaining the output of the first LSTM neural network as the first-type image feature of the video to be classified.
Optionally, the step of obtaining the speech feature of the video to be classified using the features of the audio data the video contains comprises:
Dividing the audio data contained in the video to be classified into multiple audio frames according to a predetermined frame length and frame shift;
Extracting the feature of each audio frame using a predetermined speech signal processing algorithm;
Obtaining the audio feature of the video to be classified from the extracted features of the audio frames, according to the playing order of the audio frames.
Optionally, the step of obtaining the audio feature of the video to be classified from the extracted features of the audio frames, according to the playing order of the audio frames, comprises:
Inputting the extracted features of the audio frames, in the playing order of the audio frames, into a pre-trained second LSTM neural network;
Obtaining the output of the second LSTM neural network as the audio feature of the video to be classified.
Optionally, the step of obtaining the text feature of the video to be classified according to the features of the text information describing the video comprises:
Segmenting the text information into words to obtain a word segmentation result;
For each word in the word segmentation result, obtaining the word vector corresponding to the word using a neural-network-based word embedding technique;
Determining a first-type text feature of the text information from the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information;
Performing dimensionality reduction on the word vectors of the words in the word segmentation result to obtain a second-type text feature of the text information;
Taking the first-type text feature and the second-type text feature as the text feature.
Optionally, the step of determining the first-type text feature of the text information from the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information, comprises:
Inputting the word vector of each word, in the order in which the words of the word segmentation result appear in the text information, into a pre-trained third LSTM neural network;
Obtaining the output of the third LSTM neural network as the first-type text feature of the text information.
Optionally, the step of determining the category of the video to be classified using the image feature, the speech feature and the text feature comprises:
Inputting the image feature, the speech feature and the text feature into a pre-trained neural network classification model;
Obtaining the output of the neural network classification model as the category of the video to be classified.
In a second aspect, a video classification device is provided, comprising:
An acquisition module, configured to obtain an image feature of a video to be classified according to the features of the video frames the video contains;
An acquisition module, configured to obtain a speech feature of the video to be classified using the features of the audio data the video contains;
An acquisition module, configured to obtain a text feature of the video to be classified according to the features of text information describing the video;
A determining module, configured to determine the category of the video to be classified using the image feature, the speech feature and the text feature.
In a third aspect, a computer device is provided, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform any of the video classification methods described above.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements any of the video classification methods described above.
The video classification method and device provided by the embodiments of the present invention can obtain the image feature of a video to be classified according to the features of the video frames the video contains; obtain the speech feature of the video using the features of the audio data it contains; obtain the text feature of the video according to the features of the text information describing it; and determine the category of the video using the image feature, the speech feature and the text feature. With the scheme provided by the embodiments of the present invention, a video can be classified using its image, speech and text features. Compared with the manual classification used in the related art, this saves human resources and reduces the cost of classifying videos.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a video classification method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for obtaining video image features provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a method for obtaining video speech features provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method for obtaining video text features provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a video classification device provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the drawings.
The flow of the video classification method shown in Fig. 1 is described in detail below with reference to specific embodiments, and may be as follows:
Step 100, obtain the image feature of the video to be classified according to the features of the video frames the video contains.
In implementation, a video is composed of video frames, so the features of the video frames also characterize the video. On this basis, the image feature of the video to be classified can be obtained from the features of the video frames it contains.
In one implementation, referring to Fig. 2, which is a schematic flowchart of a method for obtaining video image features provided by an embodiment of the present invention, the method comprises:
Step 20, obtain the high-dimensional image feature of each video frame contained in the video to be classified.
A high-dimensional image feature is usually extracted from a specific image, for example from a video frame. It is one form of representing image information, and may be a floating-point feature or a high-dimensional binary feature.
In implementation, each video frame can be input into a pre-trained convolutional neural network to obtain the high-dimensional image feature of that frame. Specifically, the convolutional neural network can be ResNet (Residual Neural Network), VGG (Visual Geometry Group Network), or the like.
Step 21, determine the first-type image feature of the video to be classified from the high-dimensional image features of the video frames, according to the playing order of the video frames.
In implementation, the image feature of the video to be classified is obtained from the high-dimensional image features of the video frames in their playing order. Because the playing order of the video frames is fully taken into account, the resulting image feature carries temporal information, which improves the accuracy of video classification.
In one implementation, the high-dimensional image features of the video frames can be input, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network; the output of the first LSTM neural network is then obtained as the first-type image feature of the video to be classified.
Because the high-dimensional image features of the video frames are processed by an LSTM neural network, the resulting first-type image feature carries temporal information. The subsequent classification can therefore take the temporal information in the image feature into account, which improves its accuracy.
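As a rough illustration of why feeding features to an LSTM in playing order yields a feature that depends on that order, the following minimal sketch implements a single-layer LSTM in plain Python. The randomly initialised weights, the dimensions, and the names `init_weights`, `lstm_step` and `encode_sequence` are assumptions made for this example; they stand in for the pre-trained first LSTM neural network and are not taken from the patent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights for the four LSTM gates; illustration only."""
    rng = random.Random(seed)
    weights = {}
    for name in ("i", "f", "o", "g"):
        rows = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
        weights[name] = (rows, [0.0] * hidden_dim)
    return weights


def lstm_step(x, h, c, weights):
    """One LSTM step: combine the current input x with the previous state (h, c)."""
    z = list(x) + list(h)  # concatenated vector [x; h]

    def gate(name, activation):
        rows, bias = weights[name]
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(rows, bias)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_sequence(frame_features, weights, hidden_dim):
    """Feed per-frame features in playing order; the final hidden state is the sequence feature."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frame_features:
        h, c = lstm_step(x, h, c, weights)
    return h


# Usage: three hypothetical 2-dimensional frame features in playing order.
weights = init_weights(input_dim=2, hidden_dim=3, seed=1)
frames = [[0.1, 0.2], [0.9, -0.4], [0.3, 0.7]]
first_type_feature = encode_sequence(frames, weights, hidden_dim=3)
```

Reversing `frames` changes the result, which is the sense in which the output carries temporal information. In practice a trained LSTM from a deep learning framework would replace this hand-written cell.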
Step 22, cluster the high-dimensional image features of the video frames to obtain the second-type image feature of the video to be classified.
Clustering the high-dimensional image features of the video frames contained in the video to be classified groups highly similar features into one class, which yields a second-type image feature that characterizes the video.
Specifically, a predetermined number of high-dimensional image features can be randomly selected as cluster centers, each cluster center representing one cluster;
Then, for each high-dimensional image feature, the similarity between the feature and each cluster center is calculated, and the feature is added to the cluster whose center is most similar to it, which yields the predetermined number of clusters;
Then, for each cluster, the average feature of the cluster is calculated from the high-dimensional image features in the cluster. Specifically, the average feature can be the mean of the high-dimensional image features in the cluster. If the calculated average feature differs from the cluster's center, the cluster center is updated to the calculated average feature;
The process then returns to the step of calculating, for each high-dimensional image feature, the similarity between the feature and each cluster center and adding the feature to the cluster of its most similar center. This repeats until the average feature of every cluster is identical to that cluster's center; the clusters obtained at that point are taken as the clustering result, and this clustering result is the second-type image feature of the video to be classified.
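The clustering procedure described above matches the standard k-means loop, and can be sketched as follows. The text only speaks of "similarity"; using negative squared Euclidean distance as the similarity measure, the `max_iter` safety cap, and the function names are assumptions of this sketch.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(features, k, seed=0, max_iter=100):
    """Cluster feature vectors; returns (cluster centers, clusters)."""
    rng = random.Random(seed)
    # Randomly pick k of the features as the initial cluster centers.
    centers = [list(f) for f in rng.sample(features, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assign every feature to the cluster of its most similar (nearest) center.
        clusters = [[] for _ in range(k)]
        for f in features:
            nearest = min(range(k), key=lambda j: squared_distance(f, centers[j]))
            clusters[nearest].append(f)
        # Recompute each center as the mean of its cluster (keep it if the cluster is empty).
        new_centers = [mean_vector(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:  # every mean equals its center: converged
            break
        centers = new_centers
    return centers, clusters


# Usage: four hypothetical 2-dimensional frame features forming two obvious groups.
frame_features = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, clusters = kmeans(frame_features, k=2)
```

How the clustering result is turned into a single feature vector is not specified in the text; concatenating the returned centers is one possible choice.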
Step 23, take the first-type image feature and the second-type image feature as the image feature.
Step 110, obtain the speech feature of the video to be classified using the features of the audio data the video contains.
In implementation, different videos contain different audio data, so the features of the audio data can be used to obtain features of the video.
In one implementation, referring to Fig. 3, which is a schematic flowchart of a method for obtaining video speech features provided by an embodiment of the present invention, the method comprises:
Step 30, divide the audio data contained in the video to be classified into multiple audio frames according to a predetermined frame length and frame shift.
The frame length is the playing duration of each audio frame obtained by the division.
The frame shift is the difference between the start times of two adjacent audio frames. For example, if two adjacent audio frames are played from 9 min 30 s to 10 min 0 s and from 9 min 40 s to 10 min 10 s respectively, the frame shift is 9 min 40 s minus 9 min 30 s, that is, 10 seconds.
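The division by frame length and frame shift can be sketched as a sliding window over the audio samples. The function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions of this sketch; both lengths are expressed in samples rather than seconds.

```python
def split_into_frames(samples, frame_length, frame_shift):
    """Split a sample sequence into overlapping frames.

    frame_length: number of samples in each frame.
    frame_shift:  number of samples between the starts of adjacent frames.
    A trailing partial frame is dropped (an assumption of this sketch).
    """
    if frame_length <= 0 or frame_shift <= 0:
        raise ValueError("frame_length and frame_shift must be positive")
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += frame_shift
    return frames


# Usage: 10 samples, frames of 4 samples, shifted by 2 samples each time.
frames = split_into_frames(list(range(10)), frame_length=4, frame_shift=2)
# Frames start at samples 0, 2, 4 and 6; adjacent frames overlap by 2 samples.
```

When the frame shift is smaller than the frame length, as in the 30-second frames with a 10-second shift in the example above, adjacent frames overlap.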
Step 31, extract the feature of each audio frame using a predetermined speech signal processing algorithm.
Specifically, the Mel-frequency cepstral coefficient (MFCC) algorithm can be used to extract the feature of each audio frame.
Step 32, obtain the audio feature of the video to be classified from the extracted features of the audio frames, according to the playing order of the audio frames.
In implementation, the audio feature of the video to be classified is obtained from the features of the audio frames in their playing order. Because the playing order of the audio frames is fully taken into account, the resulting audio feature carries temporal information, which improves the accuracy of video classification.
The playing order of the audio frames is the order of their start times.
In one implementation, the extracted features of the audio frames can be input, in the playing order of the audio frames, into a pre-trained second LSTM neural network; the output of the second LSTM neural network is then obtained as the audio feature of the video to be classified.
Because the features of the audio frames are processed by an LSTM neural network, the resulting audio feature carries temporal information. The subsequent classification can therefore take the temporal information in the audio feature into account, which improves its accuracy.
Step 120, obtain the text feature of the video to be classified according to the features of the text information describing the video.
The text information is information that introduces the video to be classified, and usually includes the title and a content summary of the video.
In one implementation, referring to Fig. 4, which is a schematic flowchart of a method for obtaining video text features provided by an embodiment of the present invention, the method comprises:
Step 40, segment the text information into words to obtain a word segmentation result.
In implementation, the text information can be segmented using the forward maximum matching algorithm, the reverse maximum matching algorithm, or the bidirectional maximum matching algorithm.
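Of the three segmentation algorithms named above, forward maximum matching is the simplest to sketch: starting from the left, repeatedly take the longest dictionary word that matches at the current position. The dictionary, the example text, the function name, and the fallback to a single character when nothing matches are assumptions of this sketch.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Greedy left-to-right segmentation using the longest dictionary match."""
    if max_word_len is None:
        max_word_len = max([len(w) for w in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Usage with a tiny hypothetical dictionary.
dictionary = {"足球", "比赛", "足球比赛", "视频"}
words = forward_max_match("足球比赛视频", dictionary)
# words is ["足球比赛", "视频"]: the longer match "足球比赛" wins over "足球".
```

Reverse maximum matching applies the same idea scanning from the end of the text, and the bidirectional variant compares the two results.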
Step 41, for each word in the word segmentation result, obtain the word vector corresponding to the word using a neural-network-based word embedding technique.
Specifically, the word vector of each word can be constructed from the number of occurrences of the word in the text information and a pre-established neural-network-based probabilistic model.
Step 42, determine the first-type text feature of the text information from the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information.
In implementation, the text feature of the video to be classified is obtained from the word vectors of the words in the order in which the words appear in the text information. Because the order of the word vectors is fully taken into account, the resulting text feature carries sequential information, which improves the accuracy of video classification.
In one implementation, the word vector of each word can be input, in the order in which the words of the word segmentation result appear in the text information, into a pre-trained third LSTM neural network; the output of the third LSTM neural network is then obtained as the first-type text feature of the text information.
Because the word vectors are processed by an LSTM neural network, the resulting first-type text feature carries sequential information. The subsequent classification can therefore take the sequential information in the text feature into account, which improves its accuracy.
Step 43, perform dimensionality reduction on the word vectors of the words in the word segmentation result to obtain the second-type text feature of the text information.
Dimensionality reduction lowers the dimension of each word vector, which improves the efficiency of the subsequent video classification.
Specifically, the vector of locally aggregated descriptors (VLAD) algorithm can be used to process the word vectors and reduce their dimension.
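The text names the locally aggregated descriptor algorithm but does not say how it is applied to word vectors. The sketch below follows the usual VLAD formulation as an assumption: each vector is assigned to its nearest codebook center, the residuals to that center are summed per center, and the concatenated sums are L2-normalised. The codebook `centers` (for example obtained by clustering word vectors beforehand) and the function name are hypothetical.

```python
import math


def vlad(vectors, centers):
    """Aggregate a variable number of vectors into one fixed-length descriptor."""
    k, d = len(centers), len(centers[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for v in vectors:
        # Index of the nearest codebook center (squared Euclidean distance).
        j = min(range(k),
                key=lambda idx: sum((a - b) ** 2 for a, b in zip(v, centers[idx])))
        # Accumulate the residual between the vector and its center.
        for t in range(d):
            residual_sums[j][t] += v[t] - centers[j][t]
    flat = [x for row in residual_sums for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat


# Usage: three hypothetical 2-dimensional word vectors and a 2-center codebook.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
centers = [[0.0, 0.0], [10.0, 10.0]]
second_type_feature = vlad(word_vectors, centers)  # always has length 2 * 2 = 4
```

Under this reading, the descriptor has a fixed length of (number of centers) × (vector dimension) no matter how many words the text contains, which is what makes it compact for texts with many words.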
Step 44, take the first-type text feature and the second-type text feature as the text feature.
Step 130, determine the category of the video to be classified using the image feature, the speech feature and the text feature.
The image feature, speech feature and text feature of a video together characterize the video. In one implementation, a video library can be established in advance that contains videos of each category together with the image feature, speech feature and text feature of each video in the library. Accordingly, when a video needs to be classified, its image feature, speech feature and text feature are matched against those of the videos in the library to find the video whose features are most similar to those of the video to be classified. The category of the video found is the category of the video to be classified.
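The library-matching implementation can be sketched as a nearest-neighbour lookup. The text does not specify the similarity measure or how the three features are combined; summing the cosine similarities of the image, speech and text features is an assumption of this sketch, as are the data layout and function names.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_library(query, library):
    """Return the category of the library video most similar to the query.

    query:   dict with "image", "speech" and "text" feature vectors.
    library: list of (category, features) pairs with the same feature keys.
    """
    def score(entry):
        _, features = entry
        return sum(cosine_similarity(query[key], features[key])
                   for key in ("image", "speech", "text"))

    best_category, _ = max(library, key=score)
    return best_category


# Usage with a hypothetical two-video library.
library = [
    ("sports", {"image": [1.0, 0.0], "speech": [1.0, 0.0], "text": [1.0, 0.0]}),
    ("travel", {"image": [0.0, 1.0], "speech": [0.0, 1.0], "text": [0.0, 1.0]}),
]
query = {"image": [0.9, 0.1], "speech": [0.8, 0.3], "text": [0.2, 0.9]}
category = classify_by_library(query, library)  # "sports"
```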
In another implementation, the categories of videos can be used as supervision information and the image features, speech features and text features of the videos as samples to train a neural network, which yields a neural network classification model. Accordingly, when a video needs to be classified, its image feature, speech feature and text feature can be input into the pre-trained neural network classification model; the output of the model is then obtained as the category of the video to be classified.
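The inference side of the neural network implementation can be sketched as follows. The text does not describe the architecture of the classification model; concatenating the three features and applying a single linear layer with softmax is an assumption of this sketch, and the weights shown are hypothetical placeholders for trained parameters.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, speech_feat, text_feat, weights, biases, labels):
    """Fuse the three features by concatenation and score each category."""
    fused = list(image_feat) + list(speech_feat) + list(text_feat)
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Usage: one-dimensional features and hypothetical weights for two categories.
labels = ["sports", "travel"]
weights = [[1.0, 0.0, 0.0],   # "sports" row: responds to the image feature
           [0.0, 0.0, 1.0]]   # "travel" row: responds to the text feature
biases = [0.0, 0.0]
category, probs = classify([2.0], [0.5], [0.1], weights, biases, labels)
```

In training, the weights would be fitted so that the predicted category matches the labelled category of each sample video.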
The video classification method provided by the embodiments of the present invention can obtain the image feature of a video to be classified according to the features of the video frames the video contains; obtain the speech feature of the video using the features of the audio data it contains; obtain the text feature of the video according to the features of the text information describing it; and determine the category of the video using the image feature, the speech feature and the text feature. With the scheme provided by the embodiments of the present invention, a video can be classified using its image, speech and text features. Compared with the manual classification used in the related art, this saves human resources and reduces the cost of classifying videos. In addition, the temporal information of each feature of the video can be fully taken into account during classification, which improves the accuracy of video classification.
Based on the same technical idea, an embodiment of the present invention also provides a video classification device. As shown in Fig. 5, the device comprises:
An acquisition module 500, configured to obtain an image feature of a video to be classified according to the features of the video frames the video contains;
An acquisition module 510, configured to obtain a speech feature of the video to be classified using the features of the audio data the video contains;
An acquisition module 520, configured to obtain a text feature of the video to be classified according to the features of text information describing the video;
A determining module 530, configured to determine the category of the video to be classified using the image feature, the speech feature and the text feature.
Optionally, the acquisition module 500, comprising:
Acquisition submodule, for obtaining the image high dimensional feature for each video frame that the video to be sorted includes;
First determines that submodule utilizes the image higher-dimension of each video frame for the playing sequence according to the video frame
Feature determines the first kind characteristics of image of the video to be sorted;
First obtains submodule, clusters for the image high dimensional feature to each video frame, obtains described to be sorted
Second class characteristics of image of video;
First is used as submodule, for using the first kind characteristics of image and the second class characteristics of image as described image spy
Sign.
Optionally, the first determining submodule is specifically configured to:
input the high-dimensional image features of the video frames, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network;
obtain the output result of the first LSTM neural network as the first-class image feature of the video to be classified.
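To make the role of the playing order concrete, here is a minimal sketch of a single-layer LSTM forward pass that consumes per-frame feature vectors in order and returns its final hidden state. The class `TinyLSTM` is hypothetical and uses random, untrained weights; treating the final hidden state as the output result is likewise an assumption of this sketch.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM forward pass with random (untrained) weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        rows = 4 * hidden_size  # input, forget, cell-candidate and output gates
        cols = input_size + hidden_size
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.bias = [0.0] * rows

    def forward(self, sequence: Sequence[Vector]) -> Vector:
        """Feed the vectors in the given order and return the final hidden state."""
        n = self.hidden_size
        h = [0.0] * n  # hidden state
        c = [0.0] * n  # cell state
        for x in sequence:
            xh = list(x) + h
            z = [sum(w * v for w, v in zip(row, xh)) + b
                 for row, b in zip(self.weights, self.bias)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


lstm = TinyLSTM(input_size=3, hidden_size=4)
frame_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
first_class_image_feature = lstm.forward(frame_features)
```

Because the cell state is updated step by step, feeding the same frames in a different order generally produces a different output, which is how a recurrent network can carry timing information that a clustering-based descriptor discards.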
Optionally, the obtaining module 510 includes:
a dividing submodule, configured to divide the audio data that the video to be classified includes into multiple audio frames according to a predetermined frame length and frame shift;
an extracting submodule, configured to extract the feature of each audio frame using a predetermined speech signal processing algorithm;
a second obtaining submodule, configured to obtain, according to the playing order of the audio frames, the audio feature of the video to be classified using the extracted features of the audio frames.
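The framing step can be sketched as follows. This is a minimal illustration under assumptions: frame length and frame shift are expressed in samples, and short-time energy plus zero-crossing rate stand in for the unspecified "predetermined speech signal processing algorithm". The function names are hypothetical.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_length: int,
                      frame_shift: int) -> List[List[float]]:
    """Cut samples into frames of frame_length that start every frame_shift samples."""
    if frame_length <= 0 or frame_shift <= 0:
        raise ValueError("frame_length and frame_shift must be positive")
    frames = []
    start = 0
    while start + frame_length <= len(samples):  # drop a trailing partial frame
        frames.append(list(samples[start:start + frame_length]))
        start += frame_shift
    return frames


def frame_feature(frame: Sequence[float]) -> List[float]:
    """Two classic per-frame descriptors: short-time energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [energy, crossings / max(len(frame) - 1, 1)]


samples = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
audio_frames = split_into_frames(samples, frame_length=4, frame_shift=2)
audio_frame_features = [frame_feature(f) for f in audio_frames]  # kept in playing order
```

When the frame shift is smaller than the frame length, consecutive frames overlap, so a sound that straddles a frame boundary still appears whole in at least one frame. The resulting list preserves the playing order and can be passed to a sequence model as described next.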
Optionally, the second obtaining submodule is specifically configured to:
input the extracted features of the audio frames, in the playing order of the audio frames, into a pre-trained second long short-term memory (LSTM) neural network;
obtain the output result of the second LSTM neural network as the audio feature of the video to be classified.
Optionally, the obtaining module 520 includes:
a word segmentation submodule, configured to perform word segmentation on the text information to obtain a word segmentation result;
a constructing submodule, configured to obtain, for each word included in the word segmentation result, the word vector corresponding to that word using a neural-network-based word embedding technique;
a second determining submodule, configured to determine the first-class text feature of the text information using the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information;
a third obtaining submodule, configured to perform dimension reduction on the word vectors of the words included in the word segmentation result to obtain the second-class text feature of the text information;
a second designating submodule, configured to use the first-class text feature and the second-class text feature as the text feature.
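The text branch can be sketched in the same spirit. Everything here is an illustrative assumption: whitespace splitting stands in for a real word segmenter, the `EMBEDDINGS` table stands in for a trained word-embedding model, and mean pooling over the words is only one possible reading of the dimension-reduction step, which this passage does not specify.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical embedding table; in practice it would come from a trained embedding model.
EMBEDDINGS: Dict[str, Vector] = {
    "cat": [1.0, 0.0, 0.0, 0.0],
    "plays": [0.0, 1.0, 0.0, 0.0],
    "piano": [0.0, 0.0, 1.0, 1.0],
}
UNKNOWN: Vector = [0.0, 0.0, 0.0, 0.0]


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def word_vectors(words: List[str]) -> List[Vector]:
    """Look up one vector per word, preserving the order of appearance."""
    return [EMBEDDINGS.get(w, UNKNOWN) for w in words]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Collapse a variable-length list of word vectors into one fixed-size vector."""
    if not vectors:
        return list(UNKNOWN)
    return [sum(col) / len(vectors) for col in zip(*vectors)]


words = segment("Cat plays piano")
ordered_vectors = word_vectors(words)                 # ordered input for a sequence model
second_class_text_feature = mean_pool(ordered_vectors)  # order-independent summary
```

The ordered list of word vectors is what a sequence model would consume to produce the first-class text feature, while the pooled vector gives a fixed-size summary regardless of how many words the description contains.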
Optionally, the second determining submodule is specifically configured to:
input the word vectors of the words, in the order in which the words of the word segmentation result appear in the text information, into a pre-trained third long short-term memory (LSTM) neural network;
obtain the output result of the third LSTM neural network as the first-class text feature of the text information.
Optionally, the determining module 530 is specifically configured to:
input the image feature, the speech feature, and the text feature into a pre-trained neural network classification model;
obtain the output result of the neural network classification model as the category to which the video to be classified belongs.
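As a rough picture of the final decision step, the sketch below concatenates the three feature vectors, applies one linear layer followed by softmax, and returns the highest-scoring label. Fusion by concatenation, the single layer, the function `classify`, and the hand-picked weights are all assumptions for illustration; the structure of the classification model is not specified in this passage.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image: Vector, speech: Vector, text: Vector,
             weights: Sequence[Vector], bias: Vector, labels: Sequence[str]) -> str:
    """Concatenate the three features, apply one linear layer, return the top label."""
    fused = image + speech + text
    scores = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, bias)]
    probabilities = softmax(scores)
    best = max(range(len(labels)), key=lambda k: probabilities[k])
    return labels[best]


# Hand-picked toy parameters; a real model would learn them from labelled videos.
labels = ["sports", "music"]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.0, 0.0]
predicted = classify([0.2], [0.9], [0.4], weights, bias, labels)
```

With these toy parameters the "music" score depends on the speech and text features and the "sports" score on the image feature, which shows how each modality can contribute separately to the final category.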
The video classification device provided by the embodiment of the present invention can obtain the image feature of the video to be classified according to the features of the video frames that the video to be classified includes; obtain the speech feature of the video to be classified using the features of the audio data that the video to be classified includes; obtain the text feature of the video to be classified according to the features of the text information used to describe the video to be classified; and determine the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature. With the scheme provided by the embodiment of the present invention, a video can be classified using its image features, speech features, and text features. Compared with the related-art approach of classifying videos manually, this saves human resources and reduces the cost of classifying videos. Moreover, when a video is classified, the timing information of each of its features can be fully taken into account, which improves the accuracy of video classification.
Fig. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. The computer device 600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 622 (for example, one or more processors), a memory 632, and one or more storage media 630 (for example, one or more mass storage devices) storing application programs 642 or data 644. The memory 632 and the storage medium 630 may provide transient or persistent storage. A program stored in the storage medium 630 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the computer device 600. Further, the central processing unit 622 may be configured to communicate with the storage medium 630 and to execute, on the computer device 600, the series of instruction operations in the storage medium 630.
The computer device 600 may also include one or more power supplies 624, one or more wired or wireless network interfaces 650, one or more input/output interfaces 658, one or more keyboards 654, and/or one or more operating systems 641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The computer device 600 may include a memory and one or more computer programs, where the one or more computer programs are stored in the memory and are configured to be executed by one or more processors to implement the above video classification method. Specifically, the method includes:
obtaining the image feature of the video to be classified according to the features of the video frames that the video to be classified includes;
obtaining the speech feature of the video to be classified using the features of the audio data that the video to be classified includes;
obtaining the text feature of the video to be classified according to the features of the text information used to describe the video to be classified;
determining the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
It should be understood that the division into the functional modules described above is only an example of how the video classification device, computer device, and computer-readable storage medium provided by the above embodiments perform video classification. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or some of the functions described above. In addition, the video classification device, computer device, and computer-readable storage medium provided by the above embodiments belong to the same concept as the video classification method embodiments; the specific implementation process is described in the method embodiments and is not repeated here.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (11)
1. A video classification method, characterized in that the method comprises:
obtaining the image feature of a video to be classified according to the features of the video frames that the video to be classified includes;
obtaining the speech feature of the video to be classified using the features of the audio data that the video to be classified includes;
obtaining the text feature of the video to be classified according to the features of the text information used to describe the video to be classified;
determining the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature.
2. The method according to claim 1, characterized in that the step of obtaining the image feature of the video to be classified according to the features of the video frames that the video to be classified includes comprises:
acquiring the high-dimensional image feature of each video frame that the video to be classified includes;
determining, according to the playing order of the video frames, the first-class image feature of the video to be classified using the high-dimensional image features of the video frames;
clustering the high-dimensional image features of the video frames to obtain the second-class image feature of the video to be classified;
using the first-class image feature and the second-class image feature as the image feature.
3. The method according to claim 2, characterized in that the step of determining, according to the playing order of the video frames, the first-class image feature of the video to be classified using the high-dimensional image features of the video frames comprises:
inputting the high-dimensional image features of the video frames, in the playing order of the video frames, into a pre-trained first long short-term memory (LSTM) neural network;
obtaining the output result of the first LSTM neural network as the first-class image feature of the video to be classified.
4. The method according to claim 1, characterized in that the step of obtaining the speech feature of the video to be classified using the features of the audio data that the video to be classified includes comprises:
dividing the audio data that the video to be classified includes into multiple audio frames according to a predetermined frame length and frame shift;
extracting the feature of each audio frame using a predetermined speech signal processing algorithm;
obtaining, according to the playing order of the audio frames, the audio feature of the video to be classified using the extracted features of the audio frames.
5. The method according to claim 4, characterized in that the step of obtaining, according to the playing order of the audio frames, the audio feature of the video to be classified using the extracted features of the audio frames comprises:
inputting the extracted features of the audio frames, in the playing order of the audio frames, into a pre-trained second long short-term memory (LSTM) neural network;
obtaining the output result of the second LSTM neural network as the audio feature of the video to be classified.
6. The method according to claim 1, characterized in that the step of obtaining the text feature of the video to be classified according to the features of the text information used to describe the video to be classified comprises:
performing word segmentation on the text information to obtain a word segmentation result;
obtaining, for each word included in the word segmentation result, the word vector corresponding to that word using a neural-network-based word embedding technique;
determining the first-class text feature of the text information using the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information;
performing dimension reduction on the word vectors of the words included in the word segmentation result to obtain the second-class text feature of the text information;
using the first-class text feature and the second-class text feature as the text feature.
7. The method according to claim 6, characterized in that the step of determining the first-class text feature of the text information using the word vectors of the words, according to the order in which the words of the word segmentation result appear in the text information, comprises:
inputting the word vectors of the words, in the order in which the words of the word segmentation result appear in the text information, into a pre-trained third long short-term memory (LSTM) neural network;
obtaining the output result of the third LSTM neural network as the first-class text feature of the text information.
8. The method according to any one of claims 1 to 7, characterized in that the step of determining the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature comprises:
inputting the image feature, the speech feature, and the text feature into a pre-trained neural network classification model;
obtaining the output result of the neural network classification model as the category to which the video to be classified belongs.
9. A video classification device, characterized in that the device comprises:
an obtaining module, configured to obtain the image feature of a video to be classified according to the features of the video frames that the video to be classified includes;
an obtaining module, configured to obtain the speech feature of the video to be classified using the features of the audio data that the video to be classified includes;
an obtaining module, configured to obtain the text feature of the video to be classified according to the features of the text information used to describe the video to be classified;
a determining module, configured to determine the category to which the video to be classified belongs using the image feature, the speech feature, and the text feature.
10. A computer device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the video classification method according to any one of claims 1 to 8.
11. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the video classification method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910432237.XA CN110287788A (en) | 2019-05-23 | 2019-05-23 | A kind of video classification methods and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287788A true CN110287788A (en) | 2019-09-27 |
Family
ID=68002274
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910432237.XA Pending CN110287788A (en) | 2019-05-23 | 2019-05-23 | A kind of video classification methods and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287788A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106779073A (en) * | 2016-12-27 | 2017-05-31 | 西安石油大学 | Media information sorting technique and device based on deep neural network |
CN109753985A (en) * | 2017-11-07 | 2019-05-14 | 北京京东尚科信息技术有限公司 | Video classification methods and device |
CN108647571A (en) * | 2018-03-30 | 2018-10-12 | 国信优易数据有限公司 | Video actions disaggregated model training method, device and video actions sorting technique |
CN109710800A (en) * | 2018-11-08 | 2019-05-03 | 北京奇艺世纪科技有限公司 | Model generating method, video classification methods, device, terminal and storage medium |
CN109359636A (en) * | 2018-12-14 | 2019-02-19 | 腾讯科技(深圳)有限公司 | Video classification methods, device and server |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021088510A1 (en) * | 2019-11-05 | 2021-05-14 | 腾讯科技(深圳)有限公司 | Video classification method and apparatus, computer, and readable storage medium |
CN110866563A (en) * | 2019-11-20 | 2020-03-06 | 咪咕文化科技有限公司 | Similar video detection and recommendation method, electronic device and storage medium |
CN110866563B (en) * | 2019-11-20 | 2022-04-29 | 咪咕文化科技有限公司 | Similar video detection and recommendation method, electronic device and storage medium |
CN111222011A (en) * | 2020-01-06 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Video vector determination method and device |
CN111222011B (en) * | 2020-01-06 | 2023-11-14 | 腾讯科技(深圳)有限公司 | Video vector determining method and device |
CN111209970A (en) * | 2020-01-08 | 2020-05-29 | Oppo(重庆)智能科技有限公司 | Video classification method and device, storage medium and server |
CN111209970B (en) * | 2020-01-08 | 2023-04-25 | Oppo(重庆)智能科技有限公司 | Video classification method, device, storage medium and server |
CN112464857A (en) * | 2020-12-07 | 2021-03-09 | 深圳市欢太科技有限公司 | Video classification model training and video classification method, device, medium and equipment |
CN115131709A (en) * | 2022-06-30 | 2022-09-30 | 北京百度网讯科技有限公司 | Video category prediction method, and training method and device of video category prediction model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190927 |